[Corpora-List] English POS tagged corpus
Eric Atwell
eric at comp.leeds.ac.uk
Fri Nov 19 15:49:06 UTC 2004
Gaurav,
The SourceForge open-source Python Natural Language Toolkit (NLTK)
http://nltk.sourceforge.net/
is a student-oriented teaching resource with a bundle of corpus and
lexical resources, including the PoS-tagged Brown corpus of US English:
20_newsgroups genesis lexicon roget treebank
brown gutenberg names semcor1.7 treebank_swb
chunking ieer nltk-data-0.3 senseval wordnet
cmp-lg levin ppattach stopwords words
It also comes with demo software and easy-to-follow tutorials and
API documentation for tokenization, tagging, parsing, and probabilistic
modelling. As it's open-source, new contributions keep on coming;
e.g. the latest News item says "Christopher Maloof's implementation of the Brill
tagger has been added to the development version of NLTK".
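To give a feel for what a PoS-tagged corpus looks like in use: the Brown
corpus is commonly distributed as plain text, with each token written as
word/tag. Below is a minimal Python sketch that splits such a line into
(word, tag) pairs and counts the tags. It is not NLTK's own API; the helper
name parse_tagged_line and the sample sentence are made up for illustration,
and the tags are only in the style of the Brown tagset.

```python
# Minimal sketch: split Brown-style "word/tag" tokens into (word, tag) pairs.
# parse_tagged_line and the sample line are illustrative assumptions,
# not part of NLTK or of the corpus itself.
from collections import Counter


def parse_tagged_line(line, sep="/"):
    """Return (word, tag) pairs from whitespace-separated word/tag tokens."""
    pairs = []
    for token in line.split():
        # Split on the LAST separator so a word containing "/" stays intact.
        word, found, tag = token.rpartition(sep)
        if not found:
            # No separator present: keep the token as an untagged word.
            word, tag = token, None
        pairs.append((word, tag))
    return pairs


sample = "The/at jury/nn said/vbd the/at election/nn was/bedz conducted/vbn ./."
tagged = parse_tagged_line(sample)

# Count how often each tag occurs, ignoring untagged tokens.
tag_counts = Counter(tag for _, tag in tagged if tag is not None)

print(tagged[:3])
print(tag_counts.most_common(2))
```

Splitting on the last separator rather than the first is a deliberate choice
here, so that a token such as "1/2/cd" would keep "1/2" as the word.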
Of course, other tagged corpora are available from ICAME, LDC, ELRA etc.,
but you may have to pay, and they don't come with demo software/tutorials
(admittedly you didn't say you wanted any associated software/tutorials
:-)
hope this helps
Eric
-
Eric Atwell, Senior Lecturer, Computer Vision and Language research group,
School of Computing, University of Leeds, LEEDS LS2 9JT, England
TEL: +44-113-2335430 FAX: +44-113-2335468 http://www.comp.leeds.ac.uk/eric
On Fri, 19 Nov 2004, Gaurav Malhotra wrote:
> Hi,
> Is there an English Parts-of-Speech corpus available for download on the internet? I will be very grateful.
> Gaurav Malhotra