[Corpora-List] Sentence boundary detection

Andy Roberts andyr at comp.leeds.ac.uk
Tue Jul 24 14:34:05 UTC 2007


Hi,

I haven't put it through any major evaluation myself, but my jTokeniser
Java library has a sentence segmentation module. It uses Java's
built-in text processing libraries (which were donated by IBM's ICU4J
project) to do all the hard work.

See http://www.andy-roberts.net/software/jTokeniser/
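
As a rough illustration of the built-in Java facility mentioned above, here
is a minimal sketch that assumes the relevant class is java.text.BreakIterator
(the message does not name it). The SentenceSplitter class and its split
method are hypothetical names used only for this example; they are not part
of jTokeniser, and this is not necessarily how jTokeniser calls the library.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of jTokeniser.
class SentenceSplitter {

    // Splits text into sentences with the JDK's locale-aware sentence iterator.
    static List<String> split(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);

        List<String> sentences = new ArrayList<>();
        int start = it.first();
        // Walk consecutive boundaries; each [start, end) span is one sentence.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        // Spanish sample text, since the original question concerns a Spanish corpus.
        String text = "El corpus es grande. Contiene muchos textos. Gracias por la ayuda.";
        for (String sentence : split(text, Locale.forLanguageTag("es"))) {
            System.out.println(sentence);
        }
    }
}
```

Rule-based iterators of this kind can split after abbreviations such as
"Sr." when a capitalised word follows, so the output is worth checking
against real corpus text before relying on it.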

There's also a GUI available for you to test the various tokenisers
interactively.

Regards,
Andy

On Fri, 20 Jul 2007, Kelly Vincent wrote:

> I am interested in what the current state-of-the-art is in sentence boundary
> detection and (to a lesser degree) tokenization. I have been able to locate
> several articles, but very few that are quite recent. I would appreciate any
> pointers to particularly important papers or to available tools, as well as
> the community's thoughts on the topic.
>
> We are building a Spanish corpus so I am particularly interested in these
> topics from the Spanish perspective, though not confined to that.
>
> Regards,
> Kelly Vincent
> Software Engineer
> MetaMetrics, Inc.
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
