[Corpora-List] Looking for an annotated corpus

Adam Przepiorkowski adamp at ipipan.waw.pl
Sat Aug 20 17:35:44 UTC 2011


denis <lebaillydenis at gmail.com>:

> I'm Looking for a corpus providing named entities annotations
> (essentially person, company and organization tags) to perform
> evaluation on a named entity extractor.

I suppose you are interested in the Default Language, but if Polish is
also of interest to you, there is a new 1-million-word manually
annotated corpus of Polish, a subcorpus of the National Corpus of
Polish.  Actually, NEs are only one of several levels of annotation.
You can get the corpus from http://clip.ipipan.waw.pl/LRT
(see NKJP-PodkorpusMilionowy-1.0.tgz).  It's available under the GNU
GPL v3 licence.

> Could you help me to choose what's the best to use?

Well, to the best of my knowledge this is the only such publicly
available corpus for Polish at the moment, so it must be the best ;-)

Best,

Adam P.

-- 
Adam Przepiórkowski                          ˈadam ˌpʃɛpjurˈkɔfskʲi
http://clip.ipipan.waw.pl/ ____ Computational Linguistics in Poland
http://nlp.ipipan.waw.pl/ ____________ Linguistic Engineering Group
http://nkjp.pl/ _________________________ National Corpus of Polish
