[Corpora-List] ACL/IJCNLP-2009 Workshop - Call for Participation - The People's Web meets NLP: Collaboratively Constructed Semantic Resources
Torsten Zesch
zesch at tk.informatik.tu-darmstadt.de
Fri Jul 24 09:11:22 UTC 2009
ACL/IJCNLP-2009 Workshop
"The People's Web meets NLP:
Collaboratively Constructed Semantic Resources"
Co-located with the Joint Conference of the 47th Annual Meeting of the
Association for Computational Linguistics and the 4th International
Joint Conference on Natural Language Processing of the Asian
Federation of Natural Language Processing
Singapore
August 7th, 2009
http://www.ukp.tu-darmstadt.de/acl-ijcnlp-2009-workshop/
LIST OF ACCEPTED PAPERS
* A Novel Approach to Automatic Gazetteer Generation using Wikipedia
Ziqi Zhang and Jose Iria
* Named Entity Recognition in Wikipedia
Dominic Balasuriya, Nicky Ringland, Joel Nothman, Tara Murphy and
James R. Curran
* Wiktionary for Natural Language Processing: Methodology and Limitations
Emmanuel Navarro, Franck Sajous, Bruno Gaume, Laurent Prévot,
ShuKai Hsieh, Ivy Kuo, Pierre Magistry and Chu-Ren Huang
* Using the Wiktionary Graph Structure for Synonym Detection
Timothy Weale, Chris Brew and Eric Fosler-Lussier
* Automatic Content-Based Categorization of Wikipedia Articles
Zeno Gantner and Lars Schmidt-Thieme
* Evaluating a Statistical CCG Parser on Wikipedia
Matthew Honnibal, Joel Nothman and James R. Curran
* Construction of Disambiguated Folksonomy Ontologies Using Wikipedia
Noriko Tomuro and Andriy Shepitsen
* Acquiring High Quality Non-Expert Knowledge from On-Demand Workforce
Donghui Feng, Sveva Besana and Remi Zajac
* Constructing an Anaphorically Annotated Corpus with Non-Experts:
Assessing the Quality of Collaborative Annotations
Jon Chamberlain, Udo Kruschwitz and Massimo Poesio
INVITED TALK
Speaker: Rada Mihalcea, University of North Texas
Title: Large Scale Semantic Annotations Using Encyclopedic Knowledge
Abstract:
Wikipedia is an online encyclopedia that has grown into one of the
largest repositories of encyclopedic knowledge, with millions of
articles available in a large number of languages. In fact, Wikipedia
editions exist for more than 200 languages, with the number of entries
ranging from a few pages to more than one million articles per
language.
In this talk, I will describe the use of Wikipedia as a source of
linguistic evidence for large scale semantic annotations. In particular,
I will show how this online encyclopedia can be used to achieve
state-of-the-art results on two text processing tasks: automatic keyword
extraction and word sense disambiguation. I will also show how the two
methods can be combined into a system able to automatically enrich a text
with links to encyclopedic knowledge. Given an input document, the system
identifies the important concepts in the text and automatically links
these concepts to the corresponding Wikipedia pages. Evaluations of the
system showed that the automatic annotations are reliable and hardly
distinguishable from manual annotations. Additionally, an evaluation of
the system in an educational environment showed that having
encyclopedic knowledge within easy reach can both improve the quality
of the knowledge a learner acquires and reduce the time needed to
acquire it.
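As a toy illustration of the linking step in such a system, the sketch
below scores phrases with the "keyphraseness" measure of [3] (how often
a phrase occurs as a link anchor in Wikipedia relative to how often it
occurs at all) and links high-scoring phrases to article titles. All
counts, the threshold, and the anchor-to-title mapping are made-up
example data, and the word sense disambiguation step is omitted.

    # Toy keyphraseness statistics (made-up numbers): how often a phrase
    # appears as a link anchor vs. how often it appears at all.
    anchor_counts = {"natural language processing": 900,
                     "encyclopedia": 40,
                     "text": 5}
    total_counts = {"natural language processing": 1000,
                    "encyclopedia": 800,
                    "text": 90000}

    # Hypothetical mapping from anchor text to the article it usually
    # links to; the real mapping is ambiguous and needs disambiguation.
    anchor_to_title = {"natural language processing":
                           "Natural_language_processing",
                       "encyclopedia": "Encyclopedia"}

    def keyphraseness(phrase):
        return anchor_counts.get(phrase, 0) / total_counts.get(phrase, 1)

    def wikify(phrases, threshold=0.05):
        """Link phrases whose keyphraseness reaches the threshold."""
        return {p: "http://en.wikipedia.org/wiki/" + anchor_to_title[p]
                for p in phrases
                if p in anchor_to_title and keyphraseness(p) >= threshold}

    print(wikify(["natural language processing", "encyclopedia", "text"]))
    # Links 'natural language processing' (0.9) and 'encyclopedia' (0.05);
    # 'text' occurs far more often than it is linked, so it is skipped.

In the actual system described in [3], candidate phrases are extracted
from the input document itself, and ambiguous anchors are resolved with
a word sense disambiguation step.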
Short bio:
Rada Mihalcea is an Associate Professor of Computer Science at the
University of North Texas. Her research interests are in lexical semantics,
graph-based algorithms for natural language processing, and multilingual
natural language processing. During 2004-2007, she served as president
of the ACL Special Interest Group on the Lexicon (SIGLEX), and she
serves or has served on the editorial boards of Computational
Linguistics, Language Resources and Evaluation, Natural Language
Engineering, and Research on Language and Computation. She is the
recipient of a National Science Foundation CAREER award.
INTRODUCTION
In recent years, online resources collaboratively constructed by ordinary
users on the Web have considerably influenced the NLP community. In many
works, they have been used as a substitute for conventional semantic
resources and as semantically structured corpora with great success.
While conventional resources such as WordNet are developed by trained
linguists [1], online semantic resources can now be automatically
extracted from the content collaboratively created by the users [2].
Thereby, the knowledge acquisition bottlenecks and coverage problems
pertinent to conventional lexical semantic resources can be overcome.
The resource that has gained the greatest popularity in this respect
so far is Wikipedia. However, other resources recently taken up by the
NLP community, such as folksonomies, the collaboratively constructed
multilingual dictionary Wiktionary, and Q&A sites like WikiAnswers or
Yahoo! Answers, are also very promising. Moreover, new wiki-based
platforms such as Citizendium or Knol have recently emerged that offer
features distinct from Wikipedia and hold great potential for use
in NLP.
The benefits of using Web-based resources come with new challenges,
such as interoperability with existing resources and the quality of
the represented knowledge. Because collaboratively created resources
lack editorial control, they are typically incomplete. For
interoperability with conventional resources, mappings between the
resources have to be investigated. The quality of collaboratively
constructed resources is often questioned, and information extraction
remains a complicated task because the content is incomplete and only
semi-structured. The research community has therefore begun to develop
and provide tools for accessing collaboratively constructed
resources [2,5].
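To give a flavour of why such extraction is brittle, the following toy
Python sketch pulls synonyms out of a simplified, invented fragment of
English Wiktionary markup. Real entries vary in heading levels,
templates, and layout across entries and languages, which is exactly
why dedicated tools such as those referenced in [2] are needed.

    import re

    # A simplified, made-up excerpt of an English Wiktionary entry;
    # real markup is far more varied than this.
    entry = ("===Noun===\n"
             "# A reference work with articles on a range of topics.\n"
             "====Synonyms====\n"
             "* [[encyclopaedia]]\n"
             "* [[cyclopedia|cyclopaedia]]\n")

    def extract_synonyms(wikitext):
        """Collect [[link]] targets listed under a Synonyms heading."""
        match = re.search(r"====Synonyms====\n((?:\*.*\n?)*)", wikitext)
        if not match:
            return []
        # Matches [[target]] and [[target|display]]; keep the target.
        return re.findall(r"\[\[([^\]|]+)", match.group(1))

    print(extract_synonyms(entry))  # ['encyclopaedia', 'cyclopedia']

Even this tiny example silently depends on the exact heading level and
list markers; a change in either breaks the pattern, mirroring the
robustness problems real extractors face.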
The challenges listed above also present an opportunity for NLP
techniques to improve the quality of Web-based semantic resources.
Researchers have therefore proposed techniques for link prediction [3]
and information extraction [4] that can in turn be used to guide the
"crowds" toward constructing resources better suited for use in NLP.
[1] Christiane Fellbaum (ed.)
WordNet: An Electronic Lexical Database.
MIT Press, 1998.
[2] Torsten Zesch, Christof Mueller and Iryna Gurevych
Extracting Lexical Semantic Knowledge from Wikipedia and Wiktionary
Proceedings of the Conference on Language Resources and Evaluation
(LREC), 2008.
http://www.ukp.tu-darmstadt.de/software/jwpl/
http://www.ukp.tu-darmstadt.de/software/jwktl/
[3] Rada Mihalcea and Andras Csomai
Wikify!: Linking Documents to Encyclopedic Knowledge.
Proceedings of the Sixteenth ACM Conference on Information and
Knowledge Management, CIKM 2007.
[4] Daniel S. Weld et al.
Intelligence in Wikipedia.
Twenty-Third Conference on Artificial Intelligence (AAAI), 2008.
[5] Kotaro Nakayama et al.
Wikipedia Mining - Wikipedia as a Corpus for Knowledge Extraction.
Proceedings of the Annual Wikipedia Conference (Wikimania), 2008.
http://wikipedia-lab.org/en/index.php
TOPICS
The workshop will bring together researchers from both worlds: those
using collaboratively created resources in NLP applications such as
information retrieval, named entity recognition, or keyword extraction,
and those using NLP applications to improve these resources or to
extract different types of semantic information from them. Ideally,
this will turn into a feedback loop in which NLP techniques improved
by collaboratively constructed resources are in turn used to improve
the resources.
ORGANIZERS
Iryna Gurevych
Torsten Zesch
Ubiquitous Knowledge Processing Lab
Technical University of Darmstadt, Germany
PROGRAM COMMITTEE
Delphine Bernhard Technische Universitaet Darmstadt
Paul Buitelaar DERI, National University of Ireland, Galway
Razvan Bunescu University of Texas at Austin
Pablo Castells Universidad Autonoma de Madrid
Philipp Cimiano Karlsruhe University
Irene Cramer Dortmund University of Technology
Andras Csomai Google Inc.
Ernesto De Luca University of Magdeburg
Roxana Girju University of Illinois at Urbana-Champaign
Andreas Hotho University of Kassel
Graeme Hirst University of Toronto
Ed Hovy University of Southern California
Jussi Karlgren Swedish Institute of Computer Science
Boris Katz Massachusetts Institute of Technology
Adam Kilgarriff Lexical Computing Ltd
Chin-Yew Lin Microsoft Research
James Martin University of Colorado Boulder
Olena Medelyan University of Waikato
David Milne University of Waikato
Saif Mohammad University of Maryland
Dan Moldovan University of Texas at Dallas
Kotaro Nakayama University of Tokyo
Ani Nenkova University of Pennsylvania
Guenter Neumann DFKI Saarbruecken
Maarten de Rijke University of Amsterdam
Magnus Sahlgren Swedish Institute of Computer Science
Manfred Stede Potsdam University
Benno Stein Bauhaus University Weimar
Tonio Wandmacher University of Osnabrueck
Rene Witte Concordia University Montreal
Hans-Peter Zorn European Media Lab, Heidelberg
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora