[Corpora-List] Natural Language Toolkit: NLTK-Lite version 0.6.5 released

Markus Heller markus at relix.de
Mon Jul 10 23:53:08 UTC 2006


Dear Corpora Community,

I recently noticed that the tokenizer in the nltk package only works well 
when it is given a good regular expression. Does anybody have a reasonable 
regex for this package that produces decent tokens from modern texts, 
preferably German ones? I have tried the ones on the tutorial pages, but it 
seems that an ordinary user of the package is expected to develop their own 
regex for tokenizing. Are there good (free) tokenizer regexes available for 
this package? 
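
To make the request concrete, here is a minimal sketch of the kind of pattern 
in question. It uses only Python's standard re module and assumes Python 3 
string semantics, where \w matches umlauts and ß. It does not use the 
NLTK-Lite tokenizer interface. The names TOKEN_RE and tokenize are 
illustrative and do not come from the package.

```python
import re

# Hypothetical pattern for illustration; not taken from NLTK or its tutorials.
# Alternatives are tried left to right, so numbers are matched before words.
TOKEN_RE = re.compile(r"""
    \d+(?:[.,]\d+)*      # numbers with separators, e.g. 3,50 or 1.000.000
  | \w+(?:[-']\w+)*      # words, including hyphenated compounds like U-Bahn
  | [^\w\s]              # any other single non-space character (punctuation)
""", re.VERBOSE)

def tokenize(text):
    """Return the list of tokens found in text, skipping whitespace."""
    return TOKEN_RE.findall(text)

if __name__ == "__main__":
    print(tokenize("Die U-Bahn kostet 3,50 Euro, oder?"))
```

A pattern this simple still splits abbreviations such as "z.B." into four 
tokens and knows nothing about URLs or e-mail addresses, so it is a starting 
point rather than a finished German tokenizer.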

Thanks in advance,
Markus


