[Corpora-List] A Lemmatizer is Required

Ciarán Ó Duibhín ciaran at oduibhin.freeserve.co.uk
Tue Mar 4 03:24:10 UTC 2008


> I have used TreeTagger but it required tokenized words, i.e. each word on a new line

This is true of the basic TreeTagger program, but the Windows distribution contains tools to tokenize the input.

If you want to run TreeTagger from the Windows command line, use the supplied batch file, which calls a Perl script to tokenize the input. If you want to run it from the Windows graphical interface, just tick the checkbox labelled "built-in tokenization".
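
For anyone who wants to see what "one word per line" input looks like, here is a rough sketch in Python. It is not the Perl tokenizer shipped with TreeTagger, only a simple regular-expression approximation; the names tokenize and to_one_per_line are made up for this illustration, and the real script handles abbreviations, clitics and so on that this does not.

```python
import re

# Assumption: a deliberately simple token pattern, NOT TreeTagger's own rules.
# It matches a word (optionally with internal apostrophes or hyphens),
# or else a single punctuation character.
TOKEN_RE = re.compile(r"\w+(?:['’-]\w+)*|[^\w\s]")


def tokenize(text):
    """Split text into a list of word and punctuation tokens."""
    return TOKEN_RE.findall(text)


def to_one_per_line(text):
    """Return the tokens of text joined with newlines, one token per line."""
    return "\n".join(tokenize(text))


if __name__ == "__main__":
    # Prints each token of the sentence on its own line.
    print(to_one_per_line("The cats weren't sleeping."))
```

Output of this kind could be saved to a file and given to the basic TreeTagger program, though the supplied tokenizer is the safer choice, since the tagger's parameter files were trained on text tokenized that way.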

Ciarán Ó Duibhín.
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora

