[Corpora-List] Sentence segmenting

Mon Aug 13 20:20:01 UTC 2012

Hi Jeff,

Two years ago an exhaustive summary was posted in response to a similar request:
http://mailman.uib.no/public/corpora/2010-August/011367.html

It is also worth searching the list archives (or Google) for
"sentence (splitt(er|ing)|boundar(y|ies)|detector)" or similar.
There have been several threads on this over the last few years.
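
If you keep a local copy of the archive, the pattern above works as a regular expression more or less as written. Below is a minimal Python sketch; the subject lines are invented for illustration, and the function name matching_subjects is just a placeholder:

```python
import re

# The search pattern from the message, compiled case-insensitively.
PATTERN = re.compile(
    r"sentence (splitt(er|ing)|boundar(y|ies)|detector)",
    re.IGNORECASE,
)

def matching_subjects(subjects):
    """Return the subjects that contain any variant of the pattern."""
    return [s for s in subjects if PATTERN.search(s)]

# Hypothetical subject lines, made up for illustration only.
subjects = [
    "[Corpora-List] Sentence splitter for German",
    "[Corpora-List] sentence boundary detection tools",
    "[Corpora-List] Sentence segmenting",
    "[Corpora-List] POS tagger evaluation",
]

hits = matching_subjects(subjects)
for subject in hits:
    print(subject)
```

As written, the pattern does not cover "segmenting" or "segmentation", so an extra alternative such as segment(ing|ation|er) may be worth adding.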

Regards,
Sebastian

On 08/13/2012 03:35 PM, Jeff Elmore wrote:
> I'm curious what folks are using these days for sentence segmenting for
> English.
> 
> My application involves narrative and informational texts at a variety of
> reading levels and genres. Most text is hand-edited to eliminate non-prose
> content but any system that could respond robustly to unedited text would
> be awesome, of course.
> 
> Mostly we've been using hand-crafted tools written in Python. I have
> checked out what NLTK offers but from what I've seen there's not anything
> terribly accurate in it (fails on obvious common cases like some
> honorifics). We did develop a decision-tree-based model using Weka for
> Spanish text. I'd be happy to do this again for English but wanted to see
> if there's something good already out there.
> 
> Thanks in advance!
> 
> 
> 
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
> 
