[Corpora-List] Natural Language Toolkit version 0.7 released

Steven Bird sb at csse.unimelb.edu.au
Sat Dec 30 23:09:53 UTC 2006


NLTK, the Natural Language Toolkit, is a suite of program modules,
data sets and tutorials supporting research and teaching in
computational linguistics and natural language processing.

NLTK-Lite 0.7 is now available.  It consists of 48,000 lines of Python
code, 280 pages of textbook documentation, and 20 sample corpora,
distributed under an open source license.

Version 0.7 requires Python 2.4 or later, and Python's newly released
numerical library Numpy 1.0.  Installation instructions for all
platforms, and an ISO image with distributions of NLTK-Lite, Python,
Numpy and WordNet are available at:

   http://nltk.sourceforge.net/

Changes since version 0.6 (Dec 2005):

Code:
- new semantic interpretation package (Ewan Klein)
- new support for SIL Toolbox format (Greg Aumann)
- new chunking package including cascaded
  chunking and improved evaluation (Steven Bird)
- interface to version 2.1 of WordNet (adapting
  Oliver Steele's pywordnet)
- WordNet similarity measures (David Ormiston Smith):
  path distance, Wu & Palmer, Leacock & Chodorow, Resnik
- clustering package
- support for full Penn Treebank format (Yoav Goldberg)
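
For readers unfamiliar with the WordNet similarity measures listed above,
here is a minimal sketch of two of them, path distance and Wu & Palmer,
computed over a small made-up hypernym tree. It does not use the NLTK-Lite
API: the PARENT table and all function names are hypothetical, and the
depth convention (root has depth 1) is an assumption that varies between
implementations.

```python
# Toy hypernym taxonomy (child -> parent). Illustrative only; this is
# NOT real WordNet data and not the NLTK-Lite interface.
PARENT = {
    "animal": "entity",
    "mammal": "animal",
    "dog": "mammal",
    "cat": "mammal",
    "bird": "animal",
}


def ancestors(node):
    """Return the path from node up to the root, starting with node."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(node):
    """Depth counted in nodes, so the root has depth 1 (an assumption)."""
    return len(ancestors(node))


def lowest_common_subsumer(a, b):
    """Return the deepest node that is an ancestor of both a and b."""
    seen = set(ancestors(a))
    for node in ancestors(b):
        if node in seen:
            return node
    return None


def path_distance(a, b):
    """Number of edges on the path from a to b via their common ancestor."""
    lcs = lowest_common_subsumer(a, b)
    return (depth(a) - depth(lcs)) + (depth(b) - depth(lcs))


def wu_palmer(a, b):
    """Wu & Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    lcs = lowest_common_subsumer(a, b)
    return 2.0 * depth(lcs) / (depth(a) + depth(b))


print(path_distance("dog", "cat"))   # 2
print(wu_palmer("dog", "cat"))       # 0.75
```

Leacock & Chodorow and Resnik build on the same ingredients, the former
by log-scaling path length against taxonomy depth and the latter by using
corpus-derived information content of the common subsumer; they are left
out here because they need data the toy tree does not have.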

Corpora:
- added stopwords corpus, names corpus
- added sample from TIMIT corpus
- added Senseval 2

Documentation:
- substantial updates throughout (especially
 intro to programming, structured programming,
 chunk parsing, advanced parsing, feature-based grammar,
 and semantics)

Distributions:
- binary and source distributions for Windows, Mac, and Unix
- 20 corpora and corpus samples, plus corpus readers
- textbook, plus extensive API documentation
- ISO CD-ROM image including all the above, plus
  Python 2.5, Numpy 1.0, and WordNet 2.1 distributions
  for all platforms, together with installation instructions

-Steven Bird


