[Corpora-List] sentence boundary detectors
Joel Tetreault
tetreaul at cs.rochester.edu
Mon Feb 19 15:00:58 UTC 2007
Hi Armin, if you scroll way down to the "Tools" section of my website, and
then scroll down to the "Sentence Splitters" subsection, you should find
links to several splitters.
http://www.cs.rochester.edu/u/tetreaul/academic.html
(Please excuse the fact that I threw all these links up on one page :) )
Your question was posed to corpora-list 3 or 4 years ago, so all the links
above (including an updated link to Scott Piao's Java one) are from other
researchers emailing in with their suggestions. I just ran through the
links, and since it has been several years, a bunch are dead. But if you
google the names of the splitters or their authors, you can probably find
their new locations.
I'd also check out the corpora-list archives:
http://listserv.linguistlist.org/cgi-bin/wa?S1=corpora
there might be some emails/links that I missed...
Joel
On Mon, 19 Feb 2007, Scott Songlin Piao wrote:
> Hi Armin,
>
> I put my English sentence splitter on the website:
> http://text0.mib.man.ac.uk:8080/sentencebreaker/heuristic_tool
>
> It is a rule-based Java program and is downloadable.
>
> Cheers
>
> Scott Piao
> ----------------------------
> Text Mining
> School of Computer Science
> The University of Manchester
> UK
>
>
>
>
> -----Original Message-----
> From: owner-corpora at lists.uib.no [mailto:owner-corpora at lists.uib.no] On Behalf Of Armin Schmidt
> Sent: 17 February 2007 19:48
> To: corpora at uib.no
> Subject: [Corpora-List] sentence boundary detectors
>
> Dear list,
>
> I was wondering if you could point me to good sentence splitters for the
> following languages: German, Russian, Spanish, English. It would be
> great if they were stand-alone programs or modules for Python (Perl
> would be ok, too ... although I'm already aware of the respective
> CPAN-modules for English and German).
>
> Since I do have corpora in all the above mentioned languages I would
> also be very interested in available implementations (not papers) of any
> unsupervised learning methods for detecting sentence boundaries (or
> rather abbreviations).
>
> Thanks,
> Armin