[Corpora-List] Sentence segmenting

Jeff Elmore jelmore at lexile.com
Mon Aug 13 13:35:57 UTC 2012


I'm curious what folks are using these days for sentence segmenting for
English.

My application involves narrative and informational texts across a variety
of reading levels and genres. Most text is hand-edited to eliminate
non-prose content, but any system that could respond robustly to unedited
text would be awesome, of course.

Mostly we've been using hand-crafted tools written in Python. I have
checked out what NLTK offers, but from what I've seen nothing in it is
terribly accurate (it fails on obvious common cases such as some
honorifics). We did develop a decision-tree-based model using Weka for
Spanish text. I'd be happy to do this again for English but wanted to see
if there's something good already out there.
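
To make the honorific case concrete, here is a minimal sketch of the kind
of hand-crafted rule involved. It is not our actual tool: the names
HONORIFICS, BOUNDARY and split_sentences are hypothetical, and the
abbreviation list is an illustrative assumption that a real system would
need to extend considerably.

```python
import re

# Hypothetical, deliberately tiny abbreviation list (an assumption for
# illustration only; real text needs many more entries).
HONORIFICS = {"mr", "mrs", "ms", "dr", "prof", "sr", "jr", "st"}

# Candidate boundary: '.', '!' or '?' followed by whitespace and then an
# uppercase letter or an opening quote.
BOUNDARY = re.compile(r"([.!?])\s+(?=[\"'A-Z])")


def split_sentences(text):
    """Split text at candidate boundaries, skipping periods after honorifics."""
    sentences = []
    start = 0
    for match in BOUNDARY.finditer(text):
        # Token immediately before the punctuation mark.
        words = text[start:match.start(1)].split()
        prev_word = words[-1].strip("\"'(").lower() if words else ""
        if match.group(1) == "." and prev_word in HONORIFICS:
            continue  # "Dr." or "Mrs." does not end a sentence here
        sentences.append(text[start:match.end(1)].strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


print(split_sentences("Dr. Smith met Mrs. Jones. They talked."))
```

A rule like this still breaks on cases it does not list (initials, "etc.",
sentences that genuinely end in an abbreviation), which is the motivation
for looking at a trained model instead.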

Thanks in advance!