[Corpora-List] Chinese sentence detector or splitter

Sun Apr 21 08:34:45 UTC 2013

Hello,

I am processing Chinese reports that contain phrases used as titles and
subtitles, as well as sentences ending with the period sign.  I want to
extract the sentences ending with the period sign, but it is difficult to
identify where such sentences begin, because the documents may also contain
stand-alone phrases and numbers; they do not consist only of sentences
ending with period signs.  Are there any tools available to detect, split,
or extract Chinese sentences from a document?

I've tried the Stanford NLP document preprocessing tool,
edu.stanford.nlp.process.DocumentPreprocessor, but it does not seem to
work for my document.
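
A minimal rule-based sketch of the extraction described above, in Python.
It assumes (this is not stated in the original message) that each title,
subtitle, or stand-alone number sits on its own line and that sentences end
with the full-width period "。".  The names SENTENCE and extract_sentences
are hypothetical, chosen only for illustration.

```python
import re

# A candidate sentence: a run of characters containing no line break and no
# sentence-final punctuation, terminated by the full-width period.
SENTENCE = re.compile(r"[^。！？\n]*。")

def extract_sentences(text):
    """Return the substrings of `text` that end with the Chinese period."""
    sentences = []
    for line in text.splitlines():
        # Lines without a period (titles, numbers) yield no matches.
        for match in SENTENCE.finditer(line):
            sentence = match.group().strip()
            if len(sentence) > 1:  # skip a bare period
                sentences.append(sentence)
    return sentences

if __name__ == "__main__":
    sample = "一、概述\n今天天气很好。我们去公园。\n2013\n你好吗？我很好。"
    for s in extract_sentences(sample):
        print(s)
```

This heuristic cannot separate a title from a sentence that follows it on
the same line, and it splits a sentence that is wrapped across several
lines, so it is only a starting point for documents laid out as assumed.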

Thank you in advance for any advice and suggestions!

Sincerely,

Xin Ying