Resources: Syntactic annotations and co-reference annotations now available for the OANC

Thierry Hamon thierry.hamon at UNIV-PARIS13.FR
Wed Nov 10 19:57:58 UTC 2010


Date: Wed, 10 Nov 2010 10:12:29 -0500
From: Nancy Ide <ide at cs.vassar.edu>
Message-Id: <93C5BB33-71D5-4138-843F-21150E0202E8 at cs.vassar.edu>
X-url: http://www.anc.org:8080/ANC2Go
X-url: http://www.anc.org/annotations.html
X-url: http://www.anc.org/contribute.html
X-url: http://www.aclweb.org/anthology-new/N/N04/N04-1043.pdf.

 *******************************************************************
   Three syntactic annotations of 11 million words of the Open ANC
 *******************************************************************

The American National Corpus (ANC) project has received a contribution
of three syntactic parses for 11 million of the 15 million words of
the Open American National Corpus, which are now freely available for
download from the ANC website. The annotations were automatically
produced using the Charniak & Johnson (2005) parser, the MaltParser
(Nivre et al., 2007), and the LHT dependency converter (Johansson &
Nugues, 2007). The annotations were contributed by Rasul Kalajahi.

The download contains the input to and output from each parser, in
Penn Treebank and CoNLL formats.  The ANC project is in the process of
generating a version of these annotations in standoff GrAF format so
that they may be combined with other OANC annotations using the ANC2Go
web application (http://www.anc.org:8080/ANC2Go) or the stand-alone
ANCTool.
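
For readers who have not worked with CoNLL-style dependency output, the
following is a minimal Python sketch of how such a file might be read.
It assumes the tab-separated, ten-column CoNLL-X layout (ID, FORM,
LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL) with a
blank line between sentences; the announcement does not say which
CoNLL variant the download uses, so check the column layout against
the actual files. The names Token, read_conll, and SAMPLE are
illustrative only, and the sample sentence is made up rather than
taken from the OANC.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Token(NamedTuple):
    index: int    # 1-based position in the sentence (ID column)
    form: str     # surface word form (FORM column)
    postag: str   # fine-grained part-of-speech tag (POSTAG column)
    head: int     # index of the syntactic head, 0 for the root (HEAD column)
    deprel: str   # dependency relation to the head (DEPREL column)


def read_conll(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield one list of tokens per sentence (assumes CoNLL-X columns)."""
    sentence: List[Token] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        cols = line.split("\t")
        sentence.append(
            Token(int(cols[0]), cols[1], cols[4], int(cols[6]), cols[7])
        )
    if sentence:
        # Handle a final sentence that is not followed by a blank line.
        yield sentence


# Made-up sample in the assumed layout; not taken from the OANC download.
SAMPLE = (
    "1\tThe\tthe\tDT\tDT\t_\t2\tNMOD\t_\t_\n"
    "2\tcorpus\tcorpus\tNN\tNN\t_\t3\tSBJ\t_\t_\n"
    "3\tgrows\tgrow\tVBZ\tVBZ\t_\t0\tROOT\t_\t_\n"
    "\n"
)

sentences = list(read_conll(SAMPLE.splitlines()))
for token in sentences[0]:
    print(token.index, token.form, token.postag, token.head, token.deprel)
```

To read a real file, the same function can be given an open file
object in place of SAMPLE.splitlines().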

************************************************************************
Manually-generated coreference annotations of 128K words of the Open
ANC
************************************************************************

Shane Bergsma of the University of Alberta has annotated a subset of
the Slate journal data for coreference (anaphora). The annotations
consist of pronoun-antecedent pairs in 118 documents (128717 words)
from the Slate data of the ANC/OANC. The data include a test set and a
training set; there are 1398 labeled pronouns in 78 documents in the
training set and 1381 labeled pronouns in 40 documents in the test
set.

At present these annotations are provided as a separate corpus in the
standoff XCES format used for the ANC First and Second releases and
the current version of the OANC (a release of the OANC in GrAF format,
which will supersede the current XCES format, will be available at the
end of this month). A GrAF version of the coreference annotations is
also being produced.

All annotations of the OANC are available at
http://www.anc.org/annotations.html

------------------------------------------------------------------------
The ANC welcomes contributions of annotations, texts, and derived
data, which we release for free download by the community from our
website. ANC, OANC, and MASC data and annotations are or will also be
available through the Linguistic Data Consortium. To contribute, send
email to anc at anc.org or consult http://www.anc.org/contribute.html.

 =======================================================================
THE ANC PROJECT IS COMMITTED TO OPEN DATA FOR LANGUAGE RESEARCH,
DEVELOPMENT, AND EDUCATION. ALL CONTRIBUTIONS OF BOTH DATA AND
ANNOTATIONS SHOULD BE UNENCUMBERED BY LICENSING RESTRICTIONS. ALL
CONTRIBUTIONS ARE MADE FREELY AVAILABLE FOR USE BY THE COMMUNITY.
 =======================================================================

-------------------------------------------------------------------------
Message distributed by the Langage Naturel list <LN at cines.fr>
Information, subscription : http://www.atala.org/article.php3?id_article=48
English version       : 
Archives                 : http://listserv.linguistlist.org/archives/ln.html
                                http://liste.cines.fr/info/ln

The LN list is sponsored by ATALA (Association pour le Traitement
Automatique des Langues)
Information and membership : http://www.atala.org/
-------------------------------------------------------------------------


