[Corpora-List] A Lemmatizer is Required
Ciarán Ó Duibhín
ciaran at oduibhin.freeserve.co.uk
Tue Mar 4 03:24:10 UTC 2008
> I have used TreeTagger but it required tokenized words, i.e. each word on a new line
This is true of the basic TreeTagger program, but the Windows distribution contains tools to tokenize the input.
If you want to run TreeTagger from the Windows command line, use the supplied batch file, which calls a Perl script to tokenize the input. If you want to run it from the Windows graphical interface, just tick the checkbox labelled "built-in tokenization".
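For anyone who wants to see what the one-token-per-line input looks like, here is a rough sketch in Python. It is not the Perl script shipped with TreeTagger and will not reproduce its handling of abbreviations or clitics; the function name and the regular expression are my own assumptions, chosen only to illustrate the format.

```python
import re

# Hypothetical helper, NOT part of TreeTagger: a crude regex tokenizer.
# A token is either a run of word characters (optionally joined by an
# apostrophe or hyphen, as in "tagger's" or "built-in") or a single
# punctuation character.
TOKEN_RE = re.compile(r"\w+(?:['’-]\w+)*|[^\w\s]")


def to_one_token_per_line(text: str) -> str:
    """Return the text with each word or punctuation mark on its own line."""
    return "\n".join(TOKEN_RE.findall(text))


if __name__ == "__main__":
    # Print a short sentence in the one-token-per-line layout.
    print(to_one_token_per_line("The tagger's input, one token per line."))
```

The output of such a script could be saved to a file and given to the basic TreeTagger program, but for real work the supplied tokenizer is the safer choice, since the tagger's parameter files were trained on text tokenized that way.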
Ciarán Ó Duibhín.