[Corpora-List] BBN Named Entity annotation of 15 million word OANC now available

Nancy Ide ide at cs.vassar.edu
Tue Nov 9 17:03:22 UTC 2010


                *******************************************************************
                BBN Named Entity annotation of the 15 million word Open ANC
                *******************************************************************
                                            http://www.anc.org

The American National Corpus (ANC) project has received a contribution of named entity annotation 
for the entire 15 million words of the Open American National Corpus, which is now freely 
available for download from the ANC website. The annotations were automatically produced 
by the BBN named entity tagger (see http://www.ldc.upenn.edu/acl/W/W03/W03-1506.pdf) 
and contributed by Sameer Pradhan. The download contains the OANC texts, respecting the 
OANC directory structure, with inline annotations in an XML-like format. 
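
As a rough illustration of how such inline annotations might be read, here is a minimal Python sketch. The announcement does not specify the tag format, so the tag shape used here (an uppercase element name such as ENAMEX with a TYPE attribute) is an assumption, as are the function names and the sample string. Check the downloaded files before relying on it.

```python
import re

# ASSUMPTION: inline tags look like <ENAMEX TYPE="PERSON">text</ENAMEX>.
# The actual tag and attribute names in the OANC download may differ.
TAG = re.compile(
    r'<(?P<tag>[A-Z]+)\s+TYPE="(?P<type>[^"]+)">(?P<text>.*?)</(?P=tag)>',
    re.DOTALL,
)

def extract_entities(annotated):
    """Return (type, text) pairs for each non-nested inline tag."""
    return [(m.group("type"), m.group("text")) for m in TAG.finditer(annotated)]

def strip_tags(annotated):
    """Return the text with the assumed inline tags removed."""
    return TAG.sub(lambda m: m.group("text"), annotated)

# Hypothetical sample line, not taken from the corpus.
sample = (
    '<ENAMEX TYPE="PERSON">Sameer Pradhan</ENAMEX> contributed annotations '
    'to the <ENAMEX TYPE="ORGANIZATION">ANC</ENAMEX>.'
)
entities = extract_entities(sample)
plain = strip_tags(sample)
```

A regular expression is used rather than an XML parser because the format is described only as "XML-like" and may not be well-formed XML. This sketch does not handle nested tags.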

The ANC project is in the process of generating a version of these annotations in standoff GrAF
format so that they may be combined with other OANC annotations using the ANC2Go web
application (http://www.anc.org:8080/ANC2Go) or the stand-alone ANCTool.

The ANC welcomes contributions of both annotations and texts, which we release for
free download by the community from our website. ANC, OANC, and MASC data and annotations are,
or will be, also distributed through the Linguistic Data Consortium. To contribute, send email to
anc at anc.org or consult http://www.anc.org/contribute.html.

==============================================================================
THE ANC PROJECT IS COMMITTED TO OPEN DATA FOR LANGUAGE RESEARCH, DEVELOPMENT, 
AND EDUCATION. ALL CONTRIBUTIONS OF BOTH DATA AND ANNOTATIONS SHOULD BE
UNENCUMBERED BY LICENSING RESTRICTIONS. ALL CONTRIBUTIONS ARE MADE FREELY AVAILABLE
FOR USE BY THE COMMUNITY.
=============================================================================== 