[Corpora-List] Sentence boundary detection
Katrin Tomanek
tomanek at coling-uni-jena.de
Tue Jul 24 15:22:16 UTC 2007
Hi,
we have an ML-based sentence splitter and tokenizer. Both are somewhat
optimized for the biomedical domain (English), but can of course be
applied to other domains, provided you have the training material.
Both tools are available in a command-line mode and as UIMA components.
They can be downloaded from our website: http://julielab.de. You will
find a reference to our paper on these tools (MEDINFO 2007) on the
website as well.
Regards,
Katrin
--
Katrin Tomanek
Jena University Language and Information Engineering (JULIE) Lab
Phone: +49-3641-944307
Fax: +49-3641-944321
email: tomanek at coling-uni-jena.de
URL: http://www.julielab.de
Kelly Vincent wrote:
> I am interested in what the current state-of-the-art is in sentence boundary
> detection and (to a lesser degree) tokenization. I have been able to locate
> several articles, but very few that are quite recent. I would appreciate any
> pointers to particularly important papers or to available tools, as well as
> the community's thoughts on the topic.
>
> We are building a Spanish corpus so I am particularly interested in these
> topics from the Spanish perspective, though not confined to that.
>
> Regards,
> Kelly Vincent
> Software Engineer
> MetaMetrics, Inc.
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora