[Corpora-List] Announcement: jTokeniser v2.0 released
Andy Roberts
andyr at comp.leeds.ac.uk
Sun Jul 16 20:51:19 UTC 2006
Dear Corpora List readers,
I'm happy to announce that I've just released a new version of the
jTokeniser library.
As some may recall, jTokeniser comprises six tokenisers, ranging from
basic to powerful, all accessible through a very simple Java API.
Tokenisers include:
* WhiteSpaceTokeniser
* StringTokeniser (based on specified delimiters)
* RegexTokeniser (regular expression defines a token)
* RegexSeparatorTokeniser (define what is *not* a token)
* BreakIteratorTokeniser (sophisticated locale-specific tokeniser)
* SentenceTokeniser (sentence segmentation)
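To give a rough flavour of how these strategies differ, here is a minimal
sketch that uses only the standard Java library (java.util.regex and
java.text.BreakIterator). It does not use the jTokeniser API: the class
TokeniserSketch and all of its method names are hypothetical, chosen purely
for illustration, and the library's own classes may well behave differently.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: NOT the jTokeniser API.
class TokeniserSketch {

    // Whitespace strategy: tokens are whatever lies between runs of whitespace.
    static List<String> whitespaceTokens(String text) {
        return regexSeparatorTokens(text, "\\s+");
    }

    // "Regex defines a token": every match of the pattern is a token.
    static List<String> regexTokens(String text, String tokenPattern) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile(tokenPattern).matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // "Regex defines what is *not* a token": split on the separator pattern
    // and drop any empty strings left at the edges.
    static List<String> regexSeparatorTokens(String text, String separatorPattern) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split(separatorPattern)) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Locale-aware sentence segmentation via java.text.BreakIterator.
    static List<String> sentences(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        List<String> result = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                result.add(sentence);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Hello, world! This is a test.";
        System.out.println(whitespaceTokens(text));
        System.out.println(regexTokens(text, "\\w+"));
        System.out.println(regexSeparatorTokens(text, "[\\s\\p{Punct}]+"));
        System.out.println(sentences(text, Locale.ENGLISH));
    }
}
```

Note that the whitespace strategy keeps punctuation attached to words
("Hello,"), whereas the two regex strategies here discard it; which is
preferable depends on the task at hand.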
jTokeniser v2.0 makes no changes to the core tokenisers themselves, but
adds a nice GUI front-end to the library to allow users to experiment
with the tokenisers interactively.
This should appeal to those who perhaps don't have the programming
experience in Java to utilise the library in its intended form. It also
makes it ideal for use within a learning context, such as an NLP course.
For all information about downloading, installing, running and using
jTokeniser v2.0, please visit the project website (screenshots included):
http://www.andy-roberts.net/software/jTokeniser
Any comments or feature suggestions are welcome.
Regards,
Andy Roberts