[Corpora-List] Sentence segmenting

Aleksandar Savkov cytehuop at gmail.com
Mon Aug 13 13:52:03 UTC 2012


Hi Jeff,

This is a good paper on sentence segmentation -- the best I've come across.
You might want to check out the related work there too.

http://aclweb.org/anthology-new/N/N09/N09-2061.pdf

Best,
Sasho

On 13 August 2012 14:35, Jeff Elmore <jelmore at lexile.com> wrote:

> I'm curious what folks are using these days for sentence segmenting for
> English.
>
> My application involves narrative and informational texts at a variety of
> reading levels and genres. Most text is hand-edited to eliminate non-prose
> content but any system that could respond robustly to unedited text would
> be awesome, of course.
>
> Mostly we've been using hand-crafted tools written in Python. I have
> checked out what NLTK offers but from what I've seen there's not anything
> terribly accurate in it (fails on obvious common cases like some
> honorifics). We did develop a decision tree based model using Weka for
> Spanish text. I'd be happy to do this again for English but wanted to see
> if there's something good already out there.
>
> Thanks in advance!
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
>