[Corpora-List] Looking for an annotated corpus
Adam Przepiorkowski
adamp at ipipan.waw.pl
Sat Aug 20 17:35:44 UTC 2011
denis <lebaillydenis at gmail.com>:
> I'm looking for a corpus providing named entity annotations
> (essentially person, company and organization tags) to perform
> evaluation on a named entity extractor.
I suppose you are interested in English, but if Polish is
also of interest to you, there is a new 1-million-word manually
annotated corpus of Polish, a subcorpus of the National Corpus of
Polish. Actually, NEs are only one of several levels of annotation.
You can get the corpus from http://clip.ipipan.waw.pl/LRT
(cf. NKJP-PodkorpusMilionowy-1.0.tgz). It's available under the GNU GPL
v.3 licence.
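Once the archive is on disk, unpacking it could look roughly like the
sketch below. It assumes the .tgz file was downloaded by hand into the
current directory; the function name unpack_corpus and the target
directory nkjp-1m are made up for illustration, and nothing here
depends on how the corpus is laid out inside the archive.

```python
import tarfile
from pathlib import Path


def unpack_corpus(archive_path, target_dir):
    """Unpack a gzip-compressed tar archive and return its member names.

    unpack_corpus is a hypothetical helper, not part of any NKJP tooling.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, mode="r:gz") as archive:
        names = archive.getnames()
        archive.extractall(target)
    return names


if __name__ == "__main__":
    # Assumption: the archive has already been downloaded manually.
    archive = Path("NKJP-PodkorpusMilionowy-1.0.tgz")
    if archive.exists():
        members = unpack_corpus(archive, "nkjp-1m")
        print(f"unpacked {len(members)} entries into nkjp-1m")
    else:
        print(f"{archive} not found; download it first")
```

Only unpack archives from a source you trust, since extractall writes
whatever paths the archive contains.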
> Could you help me to choose what's the best to use?
Well, to the best of my knowledge this is the only such publicly
available corpus for Polish at the moment, so it must be the best ;-)
Best,
Adam P.
--
Adam Przepiórkowski ˈadam ˌpʃɛpjurˈkɔfskʲi
http://clip.ipipan.waw.pl/ ____ Computational Linguistics in Poland
http://nlp.ipipan.waw.pl/ ____________ Linguistic Engineering Group
http://nkjp.pl/ _________________________ National Corpus of Polish