[Corpora-List] Natural Language Toolkit: NLTK-Lite version 0.6.5 released
Markus Heller
markus at relix.de
Mon Jul 10 23:53:08 UTC 2006
Dear Corpora Community,
I recently noticed that the tokenizer in the NLTK package depends on the
user supplying a good regex. Does anybody have a reasonable regex for this
package that produces decent tokens from modern texts, preferably German
texts? I have tried the ones on the tutorial pages, but it seems that an
ordinary user of the package is expected to develop their own regex for
tokenizing. Are there good (free) tokenizer regexes available for this
package?
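To make the question concrete, here is a minimal sketch of the kind of regex I have in mind. It uses only Python's standard `re` module, not any NLTK API, and the pattern and the `tokenize` name are my own illustrative assumptions rather than anything taken from the package or its tutorials. It keeps decimal numbers and hyphenated compounds together and splits off punctuation, which is only a starting point for German text (abbreviations, dates, and URLs are not handled).

```python
import re

# Hypothetical pattern, written for illustration only.
# Alternatives are tried left to right at each position.
TOKEN_PATTERN = re.compile(
    r"""
    \d+(?:[.,]\d+)+      # numbers with separators, e.g. 2,50 or 1.000.000
  | \w+(?:[-']\w+)*      # words, including hyphenated compounds like U-Bahn
  | [^\w\s]              # any single non-word, non-space character
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Return the list of tokens found in text, in order of appearance."""
    return TOKEN_PATTERN.findall(text)


if __name__ == "__main__":
    print(tokenize("Die U-Bahn kostet 2,50 Euro, oder?"))
```

In Python 3, `\w` matches Unicode letters in string patterns by default, so umlauts and ß stay inside their words. Whether this granularity is appropriate depends on the downstream task.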
Thanks in advance,
Markus