[Corpora-List] NLTK-Lite version 0.9.3 has been released
Steven Bird
sb at csse.unimelb.edu.au
Sat Jun 14 00:00:58 UTC 2008
NLTK-Lite version 0.9.3 has been released -- http://nltk.org/
NLTK -- the Natural Language Toolkit -- is a suite of open source
Python modules, data and documentation for research and development in
natural language processing. NLTK contains code supporting dozens of
NLP tasks, along with 40 popular corpora and extensive documentation
including a 375-page online book. The toolkit has been used in 60+
university courses in over 20 countries, and is in the top 0.1% of SourceForge
projects (32,000 downloads in the past 12 months). Distributions for
Mac, Windows and Linux are available.
Contents: ~100k lines of Python code and 850 MB of data:
Corpora: POS-tagged corpora including Brown Corpus; text corpora;
    PP attachment, named entity, WSD, TIMIT sample, Propbank,
    Movie reviews, Question classification, Reuters, Senseval,
    RTE, Treebanks in several languages, WordNet, VerbNet, ...
    together with corpus readers for convenient and efficient access
Tokenizers: whitespace, newline, blankline, word, wordpunct,
    treebank, regexp, Punkt sentence segmenter
Stemmers: Porter, Lancaster, regexp
Taggers: regexp, n-gram, backoff, Brill, HMM
Parsers: recursive descent, shift-reduce, chunk, chart,
    feature-based, probabilistic, ...
Semantic interpretation: untyped lambda calculus,
    first-order models, parser interface
WordNet: WordNet interface, lexical relations, similarity
Classifiers: decision tree, maximum entropy, naive Bayes
Clusterers: expectation maximization, agglomerative, k-means
Evaluation: accuracy, precision, recall, F-measure, windowdiff
Estimation: uniform, maximum likelihood, Lidstone, Laplace,
    expected likelihood, heldout, cross-validation, Good-Turing, Witten-Bell
Miscellaneous: feature detection, unification, chatbots, many utilities
Interfaces: Weka, MEGAM, Prover9/Mace4, Shoebox/Toolbox, BNC
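
As an illustration of the estimation methods listed above, here is a
minimal, self-contained sketch of Lidstone smoothing in plain Python. It
does not use NLTK and is not NLTK's implementation; the function name
lidstone_estimate, the toy tag sequence and the tagset size of 6 are all
assumptions made for the example. Lidstone smoothing adds a constant
gamma to every count: gamma = 0 gives the maximum likelihood estimate,
gamma = 0.5 the expected likelihood estimate, and gamma = 1 the Laplace
estimate.

```python
from collections import Counter
from fractions import Fraction


def lidstone_estimate(counts, gamma, bins):
    """Return a function mapping a sample to its smoothed probability.

    counts: Counter of observed samples
    gamma:  constant added to every count (0 = maximum likelihood,
            0.5 = expected likelihood, 1 = Laplace)
    bins:   number of possible outcomes, seen or unseen; this must be
            supplied by the caller
    """
    gamma = Fraction(gamma)
    total = sum(counts.values())
    denominator = total + gamma * bins
    if denominator == 0:
        raise ValueError("no observations and gamma is 0")

    def prob(sample):
        # Counter returns 0 for samples that were never observed.
        return (counts[sample] + gamma) / denominator

    return prob


# Toy data (hypothetical): nine tag tokens, five distinct tags.
tags = "DT NN VB DT NN IN DT JJ NN".split()
counts = Counter(tags)
tagset_size = 6  # assumption: the five seen tags plus one unseen tag, "RB"

mle = lidstone_estimate(counts, 0, tagset_size)
ele = lidstone_estimate(counts, Fraction(1, 2), tagset_size)
laplace = lidstone_estimate(counts, 1, tagset_size)

for name, estimate in [("MLE", mle), ("ELE", ele), ("Laplace", laplace)]:
    print(name, "P(DT) =", estimate("DT"), " P(RB) =", estimate("RB"))
```

With gamma > 0 the unseen tag "RB" receives a small non-zero
probability, whereas the maximum likelihood estimate assigns it zero.
Exact fractions are used here only to make the arithmetic easy to check.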
http://nltk.org/