[Corpora-List] ACL/IJCNLP-2009 Workshop - Call for Participation - The People's Web meets NLP: Collaboratively Constructed Semantic Resources
Torsten Zesch
zesch at tk.informatik.tu-darmstadt.de
Fri Jul 24 09:11:22 UTC 2009
ACL/IJCNLP-2009 Workshop
"The People's Web meets NLP:
Collaboratively Constructed Semantic Resources"
Co-located with the Joint Conference of the 47th Annual Meeting of the
Association for Computational Linguistics and the 4th International
Joint Conference on Natural Language Processing of the Asian
Federation of Natural Language Processing
Singapore
August 7th, 2009
http://www.ukp.tu-darmstadt.de/acl-ijcnlp-2009-workshop/
LIST OF ACCEPTED PAPERS
* A Novel Approach to Automatic Gazetteer Generation using Wikipedia
Ziqi Zhang and Jose Iria
* Named Entity Recognition in Wikipedia
Dominic Balasuriya, Nicky Ringland, Joel Nothman, Tara Murphy and
James R. Curran
* Wiktionary for Natural Language Processing: Methodology and Limitations
Emmanuel Navarro, Franck Sajous, Bruno Gaume, Laurent Prévot,
ShuKai Hsieh, Ivy Kuo, Pierre Magistry and Chu-Ren Huang
* Using the Wiktionary Graph Structure for Synonym Detection
Timothy Weale, Chris Brew and Eric Fosler-Lussier
* Automatic Content-Based Categorization of Wikipedia Articles
Zeno Gantner and Lars Schmidt-Thieme
* Evaluating a Statistical CCG Parser on Wikipedia
Matthew Honnibal, Joel Nothman and James R. Curran
* Construction of Disambiguated Folksonomy Ontologies Using Wikipedia
Noriko Tomuro and Andriy Shepitsen
* Acquiring High Quality Non-Expert Knowledge from On-Demand Workforce
Donghui Feng, Sveva Besana and Remi Zajac
* Constructing an Anaphorically Annotated Corpus with Non-Experts:
Assessing the Quality of Collaborative Annotations
Jon Chamberlain, Udo Kruschwitz and Massimo Poesio
INVITED TALK
Speaker: Rada Mihalcea, University of North Texas
Title: Large Scale Semantic Annotations Using Encyclopedic Knowledge
Abstract:
Wikipedia is an online encyclopedia that has grown into one of the
largest repositories of encyclopedic knowledge, with millions of
articles available in a large number of languages. In fact, Wikipedia
editions exist for more than 200 languages, with the number of entries
ranging from a few pages to more than one million articles per
language.
In this talk, I will describe the use of Wikipedia as a source of
linguistic evidence for large scale semantic annotations. In particular,
I will show how this online encyclopedia can be used to achieve
state-of-the-art results on two text processing tasks: automatic keyword
extraction and word sense disambiguation. I will also show how the two
methods can be combined into a system able to automatically enrich a text
with links to encyclopedic knowledge. Given an input document, the system
identifies the important concepts in the text and automatically links
these concepts to the corresponding Wikipedia pages. Evaluations of the
system showed that the automatic annotations are reliable and hardly
distinguishable from manual annotations. Additionally, an evaluation of
the system in an educational environment showed that having
encyclopedic knowledge within easy reach can both improve the quality
of the knowledge a learner acquires and reduce the time needed to
acquire it.
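As a toy illustration of the linking step in such a system, the sketch
below scores phrases with the "keyphraseness" measure of [3] (how often
a phrase occurs as a link anchor in Wikipedia relative to how often it
occurs at all) and links high-scoring phrases to article titles. All
counts, the threshold, and the anchor-to-title mapping are made-up
example data, and the word sense disambiguation step is omitted.

    # Toy keyphraseness statistics (made-up numbers): how often a phrase
    # appears as a link anchor vs. how often it appears at all.
    anchor_counts = {"natural language processing": 900,
                     "encyclopedia": 40,
                     "text": 5}
    total_counts = {"natural language processing": 1000,
                    "encyclopedia": 800,
                    "text": 90000}

    # Hypothetical mapping from anchor text to the article it usually
    # links to; the real mapping is ambiguous and needs disambiguation.
    anchor_to_title = {"natural language processing":
                           "Natural_language_processing",
                       "encyclopedia": "Encyclopedia"}

    def keyphraseness(phrase):
        return anchor_counts.get(phrase, 0) / total_counts.get(phrase, 1)

    def wikify(phrases, threshold=0.05):
        """Link phrases whose keyphraseness reaches the threshold."""
        return {p: "http://en.wikipedia.org/wiki/" + anchor_to_title[p]
                for p in phrases
                if p in anchor_to_title and keyphraseness(p) >= threshold}

    print(wikify(["natural language processing", "encyclopedia", "text"]))
    # Links 'natural language processing' (0.9) and 'encyclopedia' (0.05);
    # 'text' occurs far more often than it is linked, so it is skipped.

In the actual system described in [3], candidate phrases are extracted
from the input document itself, and ambiguous anchors are resolved with
a word sense disambiguation step.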
Short bio:
Rada Mihalcea is an Associate Professor of Computer Science at the
University of North Texas. Her research interests are in lexical semantics,
graph-based algorithms for natural language processing, and multilingual
natural language processing. During 2004-2007, she served as president
of the ACL Special Interest Group on the Lexicon (SIGLEX), and she
serves or has served on the editorial boards of Computational
Linguistics, Language Resources and Evaluation, Natural Language
Engineering, and Research on Language and Computation. She is the
recipient of a National Science Foundation CAREER award.
INTRODUCTION
In recent years, online resources collaboratively constructed by ordinary
users on the Web have considerably influenced the NLP community. In many
works, they have been used as a substitute for conventional semantic
resources and as semantically structured corpora with great success.
While conventional resources such as WordNet are developed by trained
linguists [1], online semantic resources can now be automatically
extracted from the content collaboratively created by the users [2].
Thereby, the knowledge acquisition bottlenecks and coverage problems
pertinent to conventional lexical semantic resources can be overcome.
The resource that has gained the greatest popularity in this respect
so far is Wikipedia. However, other resources recently taken up by the
NLP community, such as folksonomies, the collaboratively constructed
multilingual dictionary Wiktionary, and Q&A sites like WikiAnswers or
Yahoo! Answers, are also very promising. Moreover, new wiki-based
platforms such as Citizendium or Knol have recently emerged that offer
features distinct from Wikipedia and hold great potential for use
in NLP.
The benefits of using Web-based resources come with new challenges,
such as interoperability with existing resources and the quality of
the represented knowledge. Because collaboratively created resources
lack editorial control, they are typically incomplete. For
interoperability with conventional resources, mappings between the
resources have to be investigated. The quality of collaboratively
constructed resources is often questioned, and information extraction
remains a complicated task because the content is incomplete and only
semi-structured. The research community has therefore begun to develop
and provide tools for accessing collaboratively constructed
resources [2,5].
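To give a flavour of why such extraction is brittle, the following toy
Python sketch pulls synonyms out of a simplified, invented fragment of
English Wiktionary markup. Real entries vary in heading levels,
templates, and layout across entries and languages, which is exactly
why dedicated tools such as those referenced in [2] are needed.

    import re

    # A simplified, made-up excerpt of an English Wiktionary entry;
    # real markup is far more varied than this.
    entry = ("===Noun===\n"
             "# A reference work with articles on a range of topics.\n"
             "====Synonyms====\n"
             "* [[encyclopaedia]]\n"
             "* [[cyclopedia|cyclopaedia]]\n")

    def extract_synonyms(wikitext):
        """Collect [[link]] targets listed under a Synonyms heading."""
        match = re.search(r"====Synonyms====\n((?:\*.*\n?)*)", wikitext)
        if not match:
            return []
        # Matches [[target]] and [[target|display]]; keep the target.
        return re.findall(r"\[\[([^\]|]+)", match.group(1))

    print(extract_synonyms(entry))  # ['encyclopaedia', 'cyclopedia']

Even this tiny example silently depends on the exact heading level and
list markers; a change in either breaks the pattern, mirroring the
robustness problems real extractors face.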
The challenges listed above also present an opportunity for NLP
techniques to improve the quality of Web-based semantic resources.
Researchers have therefore proposed techniques for link prediction [3]
and information extraction [4] that can in turn be used to guide the
"crowds" toward constructing resources better suited for use in NLP.
[1] Christiane Fellbaum (ed.)
WordNet: An Electronic Lexical Database.
MIT Press, 1998.
[2] Torsten Zesch, Christof Mueller and Iryna Gurevych
Extracting Lexical Semantic Knowledge from Wikipedia and Wiktionary
Proceedings of the Conference on Language Resources and Evaluation
(LREC), 2008.
http://www.ukp.tu-darmstadt.de/software/jwpl/
http://www.ukp.tu-darmstadt.de/software/jwktl/
[3] Rada Mihalcea and Andras Csomai
Wikify!: Linking Documents to Encyclopedic Knowledge.
Proceedings of the Sixteenth ACM Conference on Information and
Knowledge Management, CIKM 2007.
[4] Daniel S. Weld et al.
Intelligence in Wikipedia.
Twenty-Third Conference on Artificial Intelligence (AAAI), 2008.
[5] Kotaro Nakayama et al.
Wikipedia Mining - Wikipedia as a Corpus for Knowledge Extraction.
Proceedings of the Annual Wikipedia Conference (Wikimania), 2008.
http://wikipedia-lab.org/en/index.php
TOPICS
The workshop will bring together researchers from both worlds: those
using collaboratively created resources in NLP applications such as
information retrieval, named entity recognition, or keyword extraction,
and those using NLP applications to improve these resources or to
extract different types of semantic information from them. Ideally,
this will turn into a feedback loop in which NLP techniques improved
by collaboratively constructed resources are in turn used to improve
the resources.
ORGANIZERS
Iryna Gurevych
Torsten Zesch
Ubiquitous Knowledge Processing Lab
Technical University of Darmstadt, Germany
PROGRAM COMMITTEE
Delphine Bernhard Technische Universitaet Darmstadt
Paul Buitelaar DERI, National University of Ireland, Galway
Razvan Bunescu University of Texas at Austin
Pablo Castells Universidad Autonoma de Madrid
Philipp Cimiano Karlsruhe University
Irene Cramer Dortmund University of Technology
Andras Csomai Google Inc.
Ernesto De Luca University of Magdeburg
Roxana Girju University of Illinois at Urbana-Champaign
Andreas Hotho University of Kassel
Graeme Hirst University of Toronto
Ed Hovy University of Southern California
Jussi Karlgren Swedish Institute of Computer Science
Boris Katz Massachusetts Institute of Technology
Adam Kilgarriff Lexical Computing Ltd
Chin-Yew Lin Microsoft Research
James Martin University of Colorado Boulder
Olena Medelyan University of Waikato
David Milne University of Waikato
Saif Mohammad University of Maryland
Dan Moldovan University of Texas at Dallas
Kotaro Nakayama University of Tokyo
Ani Nenkova University of Pennsylvania
Guenter Neumann DFKI Saarbruecken
Maarten de Rijke University of Amsterdam
Magnus Sahlgren Swedish Institute of Computer Science
Manfred Stede Potsdam University
Benno Stein Bauhaus University Weimar
Tonio Wandmacher University of Osnabrueck
Rene Witte Concordia University Montreal
Hans-Peter Zorn European Media Lab, Heidelberg
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora