[Corpora-List] Sentence boundary detection

Mehmet Kayaalp Mehmet.Kayaalp at nih.gov
Tue Jul 24 16:15:40 UTC 2007


Last year we examined 13 open-source, freeware software packages that
perform natural-language tokenization (many of them also perform sentence
boundary detection and more) and summarized our experience in a technical
report, available at
http://lhncbc.nlm.nih.gov/lhc/docs/reports/2006/tr2006003.pdf.

Best,

--mehmet 

Mehmet Kayaalp
Lister Hill National Center for Biomedical Communications
Building 38A
National Institutes of Health
8600 Rockville Pike MSC-3828
Bethesda, MD 20894

(301) 451-4633, Fax: (301) 402-0118
Mehmet.Kayaalp at nih.gov


-----Original Message-----
From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of
Kelly Vincent
Sent: Friday, July 20, 2007 10:11 AM
To: corpora at uib.no
Subject: [Corpora-List] Sentence boundary detection

I am interested in what the current state-of-the-art is in sentence boundary
detection and (to a lesser degree) tokenization. I have been able to locate
several articles, but very few that are quite recent. I would appreciate any
pointers to particularly important papers or to available tools, as well as
the community's thoughts on the topic.

We are building a Spanish corpus so I am particularly interested in these 
topics from the Spanish perspective, though not confined to that.

Regards,
Kelly Vincent
Software Engineer
MetaMetrics, Inc.



_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora




