[Corpora-List] CoNLL 2011 Shared Task -- Call for Participation

Erik Tjong Kim Sang erikt at xs4all.nl
Mon Dec 6 15:58:06 UTC 2010


-------------------------------------------------------------------------------
CoNLL 2011 Shared Task -- Call for Participation
================================================

Task: Modeling Unrestricted Coreference in OntoNotes
----------------------------------------------------

The importance of coreference resolution for the entity/event
detection task, namely identifying all mentions of entities and events
in text and clustering them into equivalence classes, has been well
recognized in the natural language processing community.  Automatic
identification of coreferring entities and events in text has been an
uphill battle for several decades, partly because it can require world
knowledge which is not well-defined and partly owing to the lack of
substantial annotated data.
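
As an illustration only of what "clustering them into equivalence
classes" means, the following minimal sketch groups mentions that are
connected by pairwise coreference links.  It is not the official data
format or scorer: the mention tuples, the links, and the
cluster_mentions helper are all hypothetical.

```python
from collections import defaultdict


def cluster_mentions(mentions, links):
    """Group mentions into equivalence classes from pairwise links.

    Hypothetical helper: a plain union-find over hashable mentions.
    """
    parent = {m: m for m in mentions}

    def find(m):
        # Walk up to the class representative, shortening the path as we go.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two classes

    groups = defaultdict(list)
    for m in mentions:
        groups[find(m)].append(m)
    return list(groups.values())


# Made-up mentions as (start_token, end_token, text); note that one of
# them is an event expressed by a verb rather than a noun phrase.
mentions = [
    (0, 0, "Sales"),
    (1, 1, "grew"),
    (7, 8, "the growth"),
    (12, 12, "they"),
    (15, 15, "London"),
]
links = [
    (mentions[0], mentions[3]),  # "Sales" <-> "they"
    (mentions[1], mentions[2]),  # "grew"  <-> "the growth"
]
clusters = cluster_mentions(mentions, links)
```

Here clusters holds three classes: the entity {"Sales", "they"}, the
event {"grew", "the growth"}, and the unlinked mention "London" on its
own.  Because links are merged transitively, linking A to B and B to C
places all three mentions in a single class.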

The OntoNotes project (http://www.bbn.com/ontonotes/), a
collaborative effort among BBN Technologies, University of Colorado,
University of Southern California (ISI), University of Pennsylvania
and Brandeis University, has created a large-scale, accurate corpus
for general anaphoric coreference that covers entities and events not
limited to noun phrases or a limited set of entity types. The
Linguistic Data Consortium (LDC) has agreed to make it freely
available to the research community. The coreference layer in
OntoNotes constitutes one part of a multi-layer, integrated annotation
of shallow semantic structure in text with high inter-annotator
agreement. In addition to coreference, this data is also tagged with
syntactic trees, high-coverage verb and some noun propositions, partial
verb and noun word senses, and 18 named entity types. It provides a
good opportunity for performing joint inference over a substantial set
of data.

The task will be automatic anaphoric mention detection and coreference
resolution using the English-language portion of the OntoNotes 4.0
data, given predicted information on the other layers.  The training
data will contain both gold standard and predicted annotations, but
only predicted annotations will be provided with the test
material. The corpus comprises a little over one million words from
newswire (~450k), magazine articles (~150k), broadcast news (~200k),
broadcast conversations (~200k), and web data (~200k).  The test set
will comprise parts of all the genres so as to evaluate out-of-domain
effects. In its current year, OntoNotes is annotating
conversational speech transcripts from CallHome; while these will not
be part of release 4.0, they could be used as an additional genre in
the test set.  More information about the task can be found at:
http://conll.bbn.com

The task will have the two customary CoNLL challenges, "open" and
"closed".  The former will allow for almost unrestricted use of
external resources to complement the provided data, while the latter
will be restricted to the official training data with a limited,
pre-specified set of additional resources, including WordNet, and a
pre-computed list of number and gender information.

CoNLL 2011 will be held in conjunction with HLT/ACL in Portland,
Oregon, USA, June 19-24, 2011.


Important Dates
---------------

January 15: Trial datasets (plus documentation and scorer) available
January 21: Task registration deadline (including corpora license forms)
February 1: Training and development sets available
   April 1: Test set available
   April 8: Systems' outputs due
  April 15: Deadline for paper submission
  April 29: Notification of acceptance
     May 6: Deadline for camera ready papers
June 23-24: CoNLL conference, Portland, Oregon


In order to receive future calls and other information about the
shared task, participants should register their intent to participate,
in either or both of the two tracks, by sending an e-mail to
conll-2011-st at bbn.com. Although the deadline for registration is not
until January 21, 2011, we recommend that participants register as early
as possible, in order not to miss any information.


Organizers
----------

Sameer Pradhan (Chair), Raytheon BBN Technologies, Cambridge, MA
Mitchell Marcus, University of Pennsylvania, Philadelphia, PA
Martha Palmer, University of Colorado, Boulder, CO
Lance Ramshaw, Raytheon BBN Technologies, Cambridge, MA
Ralph Weischedel, Raytheon BBN Technologies, Cambridge, MA
Nianwen Xue, Brandeis University, Waltham, MA


Contact
-------

Questions about the CoNLL 2011 shared task can be sent to
conll-2011-st at bbn.com

More information about the task can be found at: http://conll.bbn.com
-------------------------------------------------------------------------------
