[Corpora-List] CoNLL 2012 Shared Task -- Announcement

Erik Tjong Kim Sang erikt at xs4all.nl
Thu Nov 3 07:36:14 UTC 2011


-----------------------------------------------------------------------
CoNLL 2012 Shared Task -- Announcement
======================================

Modeling Multilingual Unrestricted Coreference in OntoNotes
-----------------------------------------------------------

CoNLL-2012, to be held jointly with EMNLP in conjunction with ACL
(Jeju, Korea, 12-14 July 2012), will continue the tradition of
including a shared task for natural language learning systems. The
2012 shared task will be modeling multilingual coreference. The
importance of coreference resolution for the entity/event detection
task, namely identifying all mentions of entities and events in text
and clustering them into equivalence classes, has been well recognized
in the natural language processing community. Automatic identification
of coreferring entities and events in text has been an uphill battle
for several decades, partly because it can require world knowledge
which is not well-defined and partly owing to the lack of substantial
annotated data.
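
As a rough illustration of what "clustering mentions into equivalence
classes" means, the sketch below groups mentions with a union-find
structure, given pairwise coreference links. It is not the shared
task's data format, scorer or any participant's method; the function
name cluster_mentions and the toy mentions and links are hypothetical,
and in a real system the links would come from a learned model.

```python
# Illustrative sketch only: names and data here are hypothetical and do
# not reflect the CoNLL-2012 data format or scorer.
from collections import defaultdict


def cluster_mentions(mentions, links):
    """Group mentions into equivalence classes from pairwise links."""
    # Each mention starts out as the representative of its own class.
    parent = {m: m for m in mentions}

    def find(m):
        # Walk up to the class representative, shortening the path as we go.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    # Merge the classes of every linked pair of mentions.
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect mentions under their representative, in input order.
    classes = defaultdict(list)
    for m in mentions:
        classes[find(m)].append(m)
    return list(classes.values())


mentions = ["Mary", "she", "the report", "it", "John"]
links = [("Mary", "she"), ("the report", "it")]
chains = cluster_mentions(mentions, links)
```

Here "John" has no links and ends up in a class of its own, while the
other four mentions form two chains of two mentions each.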

The OntoNotes project (http://www.bbn.com/ontonotes/) -- a
collaborative effort between BBN Technologies, University of Colorado,
University of Southern California (ISI), University of Pennsylvania
and Brandeis University -- created a large-scale, accurate
multilingual corpus for general anaphoric coreference that covers
entities and events not limited to noun phrases or a limited set of
entity types. The Linguistic Data Consortium (LDC) has agreed to make
it freely available to the research community. The coreference layer
in OntoNotes constitutes one part of a multi-layer, integrated
annotation of shallow semantic structure in text with high
inter-annotator agreement. In addition to coreference, this data is
also tagged with syntactic trees, high-coverage verb propositions and
some noun propositions, partial verb and noun word senses, and a rich
set of named entity types.

Modeling multilingual unrestricted coreference in the OntoNotes data
is the shared task for CoNLL-2012. This is an extension of the
CoNLL-2011 shared task and will involve automatic anaphoric mention
detection and coreference resolution across three languages --
English, Chinese and Arabic -- using the OntoNotes v5.0 corpus, given
predicted information on the syntax, proposition, word sense and named
entity layers. The training data will contain both gold standard and
predicted annotations, but only predicted annotations will be provided
with the test material. The English and Chinese portions each
comprise roughly one million words from newswire, magazine articles,
broadcast news, broadcast conversations, web data and conversational
speech. The English corpus also contains a further 200k words of the
English translation of the New Testament. The Arabic portion is
smaller, comprising 300k words of newswire articles.

More information about the task will soon be available at
http://conll.cemantix.org/2012



Organizers
----------

Sameer Pradhan (Chair), Raytheon BBN Technologies, Cambridge, MA
Alessandro Moschitti, University of Trento, Italy
Nianwen Xue, Brandeis University, Waltham, MA



Advisory Committee
------------------

Mitchell Marcus, University of Pennsylvania, Philadelphia, PA
Martha Palmer, University of Colorado, Boulder, CO
Lance Ramshaw, Raytheon BBN Technologies, Cambridge, MA
Ralph Weischedel, Raytheon BBN Technologies, Cambridge, MA



Contact
-------

Questions about the CoNLL-2012 shared task can be sent to
conll-2012-st at cemantix.org



Important Dates
---------------

  January 15: Trial datasets (plus documentation and scorer) available
  January 29: Task registration deadline (including corpora license forms)
 February  2: Training and development sets available
    April  1: Test set available
    April 12: Systems' outputs due
-----------------------------------------------------------------------
