[Corpora-List] Free Java Tokenizer for English

Alexandre Rafalovitch arafalov at gmail.com
Thu Nov 20 17:00:26 UTC 2008


I don't believe there is a single agreed-upon set of tokenization
rules for English (e.g. whether "don't" is one token or two), but have a look at:
http://www.andy-roberts.net/software/jTokeniser/
and
http://www.gate.ac.uk/
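
To illustrate why the rules are not settled, here is a minimal sketch
using only java.util.regex. It is not the API of jTokeniser or GATE;
the class name, pattern names and the two conventions shown are my own
assumptions, chosen to contrast keeping "don't" whole with splitting
off "n't" in a Penn Treebank-like way.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: two simplified tokenization conventions for English.
public class TokenizerSketch {

    // Convention A: word-internal apostrophes stay inside the token ("don't").
    static final Pattern KEEP_CLITICS =
        Pattern.compile("[A-Za-z]+(?:'[A-Za-z]+)*|\\d+|\\S");

    // Convention B: split "n't" and other apostrophe clitics off ("do" + "n't").
    // The lookahead makes the word stop just before a following "n't".
    static final Pattern SPLIT_CLITICS =
        Pattern.compile("[A-Za-z]+(?=n't)|n't|'[A-Za-z]+|[A-Za-z]+|\\d+|\\S");

    // Collect every successive match of the pattern; whitespace is skipped
    // because no alternative matches it.
    static List<String> tokenize(Pattern pattern, String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String s = "I don't know.";
        System.out.println(tokenize(KEEP_CLITICS, s));   // [I, don't, know, .]
        System.out.println(tokenize(SPLIT_CLITICS, s));  // [I, do, n't, know, .]
    }
}
```

Both outputs are defensible; which one you want depends on the tagger
or parser that consumes the tokens, so check what convention a given
tool follows before adopting it.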

Regards,
   Alex.
Personal blog: http://blog.outerthoughts.com/
Research group: http://www.clt.mq.edu.au/Research/

On Thu, Nov 20, 2008 at 11:41 AM, ben dbabis samira
<bendbabis_samira at yahoo.fr> wrote:
> Hi,
> Does anyone know of any free tokenizers implemented in Java for
> English text?
> Thanks for your help.
