[Corpora-List] Announcement: jTokeniser v2.0 released

Andy Roberts andyr at comp.leeds.ac.uk
Sun Jul 16 20:51:19 UTC 2006


Dear Corpora List readers,

I'm happy to announce that I've just released a new version of the
jTokeniser library.

As some may recall, jTokeniser comprises six tokenisers, ranging from
basic to powerful, all accessible through a very simple Java API.
The tokenisers are:

* WhiteSpaceTokeniser
* StringTokeniser (based on specified delimiters)
* RegexTokeniser (regular expression defines a token)
* RegexSeparatorTokeniser (define what is *not* a token)
* BreakIteratorTokeniser (sophisticated locale-specific tokeniser)
* SentenceTokeniser (sentence segmentation)
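
To give a flavour of what each style of tokenisation does, here is a
rough sketch using only standard JDK classes (java.util.StringTokenizer,
java.util.regex and java.text.BreakIterator). It does not use the
jTokeniser API: the class name TokeniserSketch and its method names are
made up for illustration, and jTokeniser's own classes may well behave
differently.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.StringTokenizer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: hypothetical names, standard JDK calls, not jTokeniser.
public class TokeniserSketch {

    // Whitespace tokenisation: split on runs of whitespace.
    static List<String> whitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Delimiter-based tokenisation: every character in delims separates tokens.
    static List<String> delimiters(String text, String delims) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, delims);
        while (st.hasMoreTokens()) tokens.add(st.nextToken());
        return tokens;
    }

    // Regex tokenisation: the pattern describes what a token looks like.
    static List<String> regexTokens(String text, String tokenPattern) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile(tokenPattern).matcher(text);
        while (m.find()) tokens.add(m.group());
        return tokens;
    }

    // Regex separator tokenisation: the pattern describes what is NOT a token.
    static List<String> regexSeparators(String text, String separatorPattern) {
        List<String> tokens = new ArrayList<>();
        for (String t : Pattern.compile(separatorPattern).split(text)) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Collect the trimmed, non-empty segments between successive boundaries.
    private static List<String> segments(BreakIterator it, String text, boolean wordsOnly) {
        List<String> tokens = new ArrayList<>();
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end).trim();
            if (segment.isEmpty()) continue;
            // For words, skip segments that are pure punctuation.
            if (wordsOnly && segment.codePoints().noneMatch(Character::isLetterOrDigit)) continue;
            tokens.add(segment);
        }
        return tokens;
    }

    // Locale-aware word tokenisation via BreakIterator.
    static List<String> words(String text, Locale locale) {
        return segments(BreakIterator.getWordInstance(locale), text, true);
    }

    // Locale-aware sentence segmentation via BreakIterator.
    static List<String> sentences(String text, Locale locale) {
        return segments(BreakIterator.getSentenceInstance(locale), text, false);
    }

    public static void main(String[] args) {
        String text = "Hello, world! This is a test.";
        System.out.println(whitespace(text));
        System.out.println(delimiters(text, " ,.!"));
        System.out.println(regexTokens(text, "\\w+"));
        System.out.println(regexSeparators(text, "\\W+"));
        System.out.println(words(text, Locale.ENGLISH));
        System.out.println(sentences(text, Locale.ENGLISH));
    }
}
```

Running the same input through each method shows how the choice of
tokeniser changes what counts as a token, for example whether
punctuation stays attached to a word.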

jTokeniser v2.0 makes no changes to the core tokenisers themselves, but
adds a nice GUI front-end to the library to allow users to experiment
with the tokenisers interactively.

This should appeal to those who perhaps don't have the programming
experience in Java to utilise the library in its intended form. It also 
makes it ideal for use within a learning context, such as an NLP course.

For all information about downloading, installing, running and using
jTokeniser v2.0, please visit the project website (screenshots included):

http://www.andy-roberts.net/software/jTokeniser

Any comments or feature suggestions are welcome.

Regards, 
Andy Roberts


