[Corpora-List] NLTK-Lite version 0.9.3 has been released

Steven Bird sb at csse.unimelb.edu.au
Sat Jun 14 00:00:58 UTC 2008


NLTK-Lite version 0.9.3 has been released -- http://nltk.org/

NLTK -- the Natural Language Toolkit -- is a suite of open source
Python modules, data and documentation for research and development in
natural language processing. NLTK contains code supporting dozens of
NLP tasks, along with 40 popular corpora and extensive documentation
including a 375-page online book.  The toolkit has been used in 60+
university courses in over 20 countries, and is in the top 0.1% of SourceForge
projects (32,000 downloads in the past 12 months).  Distributions for
Mac, Windows and Linux are available.

Contents (~100k lines of Python code and 850 MB of data):

Corpora: POS-tagged corpora including Brown Corpus; text corpora;
   PP attachment, named entity, WSD, TIMIT sample, Propbank,
   Movie reviews, Question classification, Reuters, Senseval,
   RTE, Treebanks in several languages, WordNet, VerbNet, ...
   together with corpus readers for convenient and efficient access
Tokenizers: whitespace, newline, blankline, word, wordpunct,
   treebank, regexp, Punkt sentence segmenter
Stemmers: Porter, Lancaster, regexp
Taggers: regexp, n-gram, backoff, Brill, HMM
Parsers: recursive descent, shift-reduce, chunk, chart,
   feature-based, probabilistic, ...
Semantic interpretation: untyped lambda calculus,
   first-order models, parser interface
WordNet: WordNet interface, lexical relations, similarity
Classifiers: decision tree, maximum entropy, naive Bayes
Clusterers: expectation maximization, agglomerative, k-means
Evaluation: accuracy, precision, recall, F-measure, windowdiff
Estimation: uniform, maximum likelihood, Lidstone, Laplace,
   expected likelihood, heldout, cross-validation, Good-Turing, Witten-Bell
Miscellaneous: feature detection, unification, chatbots, many utilities
Interfaces: Weka, MEGAM, Prover9/Mace4, Shoebox/Toolbox, BNC
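
To make the "Estimation" entry above a little more concrete, here is a
minimal pure-Python sketch of Lidstone smoothing, of which maximum
likelihood (gamma = 0), expected likelihood (gamma = 0.5) and Laplace
(gamma = 1) are special cases. It is only an illustration and not NLTK's
own code: the function name lidstone_prob, the toy sentence, and the
choice of one extra bin for unseen words are all assumptions made for
this example.

```python
from collections import Counter


def lidstone_prob(counts, word, gamma, bins):
    """Lidstone estimate: (count + gamma) / (N + gamma * bins).

    Hypothetical helper for illustration only; not part of NLTK.
    """
    total = sum(counts.values())  # N, the number of observed tokens
    return (counts[word] + gamma) / (total + gamma * bins)


# Toy data (an assumption for this sketch).
tokens = "the cat sat on the mat".split()
counts = Counter(tokens)

# Assumption: one bin per observed word type plus one bin for unseen words.
bins = len(counts) + 1

mle = lidstone_prob(counts, "the", 0.0, bins)      # maximum likelihood
laplace = lidstone_prob(counts, "the", 1.0, bins)  # Laplace (add-one)
ele = lidstone_prob(counts, "dog", 0.5, bins)      # expected likelihood, unseen word

print(mle, laplace, ele)
```

With gamma = 0 an unseen word such as "dog" would get probability zero;
any positive gamma reserves some probability mass for it, at the cost of
discounting the words that were actually observed.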

http://nltk.org/

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


