[Corpora-List] sentence boundary detectors
Victor Kapustin
victor.kapustin at gmail.com
Sun Feb 18 13:59:56 UTC 2007
Armin,
> I was wondering if you could point me to good sentence
> splitters for the
> following languages: German, Russian
For Russian:
http://aot.ru/download/graphan.tar.gz (source in C++, dll is included in
http://aot.ru/download/shortrml.zip).
For most purposes I use a regexp (in JavaScript; conversion to Perl/Python is
straightforward):
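// Opening delimiters that may precede the first letter of a sentence.
var _DELIMS_OPEN_RAW_ = '(["</' ;
// Backslash-escape each delimiter for use inside a character class.
var _DELIMS_OPEN_ = '\\'+_DELIMS_OPEN_RAW_.split('').join('\\') ;
// Match ., ! or ? plus whitespace when an uppercase letter follows.
var sentenceSplitter = new RegExp(
'(?:\\.|\\!|\\?)+\\s+(?=['+_DELIMS_OPEN_+']?[А-ЯЁA-Z])' ) ;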
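
A rough sketch of how the regexp above might be applied. Note that a plain text.split(sentenceSplitter) consumes the match, so the final punctuation of each sentence is lost; the loop below keeps it. The names PATTERN and splitSentences are made up for this illustration and are not part of aot.ru or any library.

```javascript
// Same pattern as above, rebuilt here so the example is self-contained.
var DELIMS_OPEN_RAW = '(["</';
var DELIMS_OPEN = '\\' + DELIMS_OPEN_RAW.split('').join('\\');
var PATTERN = '(?:\\.|\\!|\\?)+\\s+(?=[' + DELIMS_OPEN + ']?[А-ЯЁA-Z])';

// Hypothetical helper: returns the sentences with their final punctuation.
function splitSentences(text) {
  var re = new RegExp(PATTERN, 'g'); // 'g' so that exec() advances lastIndex
  var sentences = [];
  var start = 0;
  var m;
  while ((m = re.exec(text)) !== null) {
    // m[0] is punctuation followed by whitespace; keep only the punctuation.
    var end = m.index + m[0].replace(/\s+$/, '').length;
    sentences.push(text.slice(start, end));
    start = re.lastIndex; // the next sentence begins after the whitespace
  }
  if (start < text.length) {
    sentences.push(text.slice(start)); // the tail has no boundary after it
  }
  return sentences;
}

console.log(splitSentences('Привет! Как дела? Всё хорошо.'));
```

The pattern knows nothing about abbreviations, so input like 'See Dr. Smith.' is split after 'Dr.'; an exception list would have to be handled separately.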
--
Victor Kapustin