[Corpora-List] sentence boundary detectors

Joel Tetreault tetreaul at cs.rochester.edu
Mon Feb 19 15:00:58 UTC 2007


Hi Armin, if you scroll way down to the "Tools" section of my website, and 
then scroll down to the "Sentence Splitters" subsection, you should find 
links to several splitters.

http://www.cs.rochester.edu/u/tetreaul/academic.html

(Please excuse the fact that I threw all these links up on one page :) )

Your question was posed to corpora-list 3 or 4 years ago, so all the links 
above (including an updated link to Scott Piao's Java one) are from other 
researchers emailing in with their suggestions.  I just ran through the 
links, and since it has been several years, a bunch are dead.  But if you 
google the names of the splitters or their authors, you can probably find 
their new locations.

I'd also check out the corpora-list archives:

http://listserv.linguistlist.org/cgi-bin/wa?S1=corpora

there might be some emails/links that I missed...

Joel


On Mon, 19 Feb 2007, Scott Songlin Piao wrote:

> Hi Armin,
>
> I put my English sentence splitter on the website:
> http://text0.mib.man.ac.uk:8080/sentencebreaker/heuristic_tool
>
> It is a rule-based Java program and is downloadable.
>
> Cheers
>
> Scott Piao
> ----------------------------
> Text Mining
> School of Computer Science
> The University of Manchester
> UK
>
>
>
>
> -----Original Message-----
> From: owner-corpora at lists.uib.no [mailto:owner-corpora at lists.uib.no] On Behalf Of Armin Schmidt
> Sent: 17 February 2007 19:48
> To: corpora at uib.no
> Subject: [Corpora-List] sentence boundary detectors
>
> Dear list,
>
> I was wondering if you could point me to good sentence splitters for the
> following languages: German, Russian, Spanish, English. It would be
> great if they were stand-alone programs or modules for Python (Perl
> would be ok, too ... although I'm already aware of the respective
> CPAN-modules for English and German).
>
> Since I do have corpora in all the above-mentioned languages I would
> also be very interested in available implementations (not papers) of any
> unsupervised learning methods for detecting sentence boundaries (or
> rather abbreviations).
>
> Thanks,
> Armin
>
>
>
>
>
>


