[Corpora-List] Natural Language Toolkit: NLTK-Lite version 0.6.5 released
Steven Bird
sb at csse.unimelb.edu.au
Mon Jul 10 11:35:48 UTC 2006
NLTK, the Natural Language Toolkit, is a suite of Python libraries and
programs for natural language processing. Version 0.6.5 of NLTK-Lite
has been released and can be downloaded from http://nltk.sourceforge.net/
CONTENTS
Software Modules: corpus readers, tokenizers & stemmers, taggers
(regexp, n-gram, backoff, Brill, HMM), parsers (recursive descent,
shift-reduce, chart, probabilistic, ...), clusterers (EM, k-means,
...), probability distributions, chatbots, demonstrations, ...
Corpora and Corpus Samples: Brown Corpus, CMU Pronunciation
Dictionary, CoNLL-2000, Genesis, Gutenberg, IEER, Presidential
Addresses, Names, PP-Attachment, Senseval 2, TIMIT, Treebank, Words
Documentation: Tutorials and exercises (190pp), API documentation for
all software modules, installation instructions for Windows, Mac,
Unix.
ChangeLog for Version 0.6.5 (2006-07-09)
* Code:
- improvements to shoebox module (Stuart Robinson, Greg Aumann)
- incorporated feature-based parsing into core NLTK-Lite
- corpus reader for Sinica treebank sample
- new stemmer package
* Contrib:
- hole semantics implementation (Peter Wang)
- incorporating YAML
- new work on feature structures, unification, lambda calculus
- new work on shoebox package (Stuart Robinson, Greg Aumann)
* Corpora:
- Sinica treebank sample
* Tutorials:
- expanded discussion throughout, including: left-recursion, trees, grammars,
feature-based grammar, agreement, unification, PCFGs,
baseline performance, exercises, improved display of trees
-Steven Bird