[Corpora-List] sentence boundary detectors

Armin Schmidt armin.sch at gmail.com
Tue Feb 20 17:20:44 UTC 2007


Joel,

thanks. Unfortunately, many of the links on your page are indeed dead.
But I'll post a summary of all the responses I got so far to the list,
so you can update your link list, too.

Of course, I searched the archives (and the web) before posting to
the corpora list, but the responses to those earlier posts were of only
limited use for my particular task. Also, I wanted to find out whether,
in the meantime, sentence splitters had been developed that can be
trained on particular corpora in a language-independent manner (more on
this in my summary).

Cheers,
Armin

Joel Tetreault schrieb:
> 
> hi Armin, if you scroll way down to the "Tools" section of my website,
> and then scroll down to the "Sentence Splitters" subsection, you should
> find links to several splitters.
> 
> http://www.cs.rochester.edu/u/tetreaul/academic.html
>
> (Please excuse the fact I threw all these links up on one page :) )
> 
> Your question was posed to corpora-list 3 or 4 years ago, so all the
> links above (including an updated link to Scott Piao's Java one) are
> from other researchers emailing in with their suggestions.  I just ran
> through the links, and since it has been several years, a bunch are
> dead.  But if you google the names of the splitters or their authors, you
> can probably find their new locations.
> 
> I'd also check out the corpora-list archives:
> 
> http://listserv.linguistlist.org/cgi-bin/wa?S1=corpora
> 
> there might be some emails/links that I missed...
> 
> Joel
> 
> 
> On Mon, 19 Feb 2007, Scott Songlin Piao wrote:
> 
>> Hi Armin,
>>
>> I put my English sentence splitter on the website:
>> http://text0.mib.man.ac.uk:8080/sentencebreaker/heuristic_tool
>>
>> It is a rule-based Java program and is downloadable.
>>
>> Cheers
>>
>> Scott Piao
>> ----------------------------
>> Text Mining
>> School of Computer Science
>> The University of Manchester
>> UK
>>
>>
>>
>>
>> -----Original Message-----
>> From: owner-corpora at lists.uib.no [mailto:owner-corpora at lists.uib.no]
>> On Behalf Of Armin Schmidt
>> Sent: 17 February 2007 19:48
>> To: corpora at uib.no
>> Subject: [Corpora-List] sentence boundary detectors
>>
>> Dear list,
>>
>> I was wondering if you could point me to good sentence splitters for the
>> following languages: German, Russian, Spanish, English. It would be
>> great if they were stand-alone programs or modules for Python (Perl
>> would be ok, too ... although I'm already aware of the respective
>> CPAN-modules for English and German).
>>
>> Since I do have corpora in all the above-mentioned languages, I would
>> also be very interested in available implementations (not papers) of any
>> unsupervised learning methods for detecting sentence boundaries (or
>> rather abbreviations).
>>
>> Thanks,
>> Armin
>>
>>
>>
>>
>>
>>
> 

-- 
http://diotavelli.net/people/armin/


