[Corpora-List] Sentence boundary detection

Katrin Tomanek tomanek at coling-uni-jena.de
Tue Jul 24 15:22:16 UTC 2007


Hi,

we have an ML-based sentence splitter and tokenizer. Both are somewhat
optimized for the biomedical domain (English), but can of course be
applied to other domains, given suitable training material.

Both tools are available in a command-line mode and as UIMA components. 
They can be downloaded from our website: http://julielab.de. You will 
find a reference to our paper on these tools (MEDINFO 2007) on the 
website as well.

Regards,
Katrin

-- 
Katrin Tomanek
Jena University Language and Information Engineering (JULIE) Lab
Phone: +49-3641-944307
Fax:   +49-3641-944321
email: tomanek at coling-uni-jena.de
URL:   http://www.julielab.de


Kelly Vincent wrote:
> I am interested in what the current state-of-the-art is in sentence boundary 
> detection and (to a lesser degree) tokenization. I have been able to locate 
> several articles, but very few that are quite recent. I would appreciate any 
> pointers to particularly important papers or to available tools, as well as 
> the community's thoughts on the topic.
> 
> We are building a Spanish corpus so I am particularly interested in these 
> topics from the Spanish perspective, though not confined to that.
> 
> Regards,
> Kelly Vincent
> Software Engineer
> MetaMetrics, Inc.
> 
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora




