[Corpora-List] Chinese sentence detector or splitter

Xin Ying Qiu xinying.qiu at gmail.com
Mon Apr 22 12:22:09 UTC 2013


Thank you all for your most instructive comments and suggestions!

From a research point of view, Huang/Cheng (2011) is closest to the problem
I'm dealing with.  Kiss/Strunk (2006) is great in that the system works for
eleven languages and for different text genres.  Although the system was not
tested on Chinese, one could learn a lot from their methodology.

For my current task, I may ask the Stanford NLP users list for more
advice.  It could be that I have not searched the right archives or found
the right tools.
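
In the meantime, a simple rule-based baseline for the task described in my
original message below might look like the following.  This is only a minimal
sketch, not the Stanford tool and not the method of either paper above.  It
assumes that titles, subtitles, and stand-alone numbers sit on their own
lines without terminal punctuation, and that a sentence does not wrap across
lines.  The function name extract_sentences and the period_only flag are made
up for illustration.

```python
import re

# Text up to a sentence-final mark (ideographic full stop, fullwidth ! or ?),
# plus any closing quotes or brackets that directly follow the mark.
SENTENCE_RE = re.compile(r"[^。！？\n]+([。！？]+)[”’」』）】》]*")


def extract_sentences(text, period_only=True):
    """Return the sentences in `text` that end with terminal punctuation.

    Lines with no terminal mark (assumed to be titles, subtitles, or
    stand-alone numbers) produce no match and are skipped.  With
    period_only=True, only sentences ending in the full stop are kept.
    """
    sentences = []
    for line in text.splitlines():
        for match in SENTENCE_RE.finditer(line):
            if period_only and "。" not in match.group(1):
                continue  # ends with ! or ? only, so not a period sentence
            sentences.append(match.group().strip())
    return sentences


if __name__ == "__main__":
    sample = (
        "一、项目概况\n"
        "2013\n"
        "本项目位于广州市。总投资为五亿元。\n"
        "二、市场分析\n"
        "市场需求是否稳定？目前尚无定论。\n"
    )
    # Headings and the bare number are skipped; the question is skipped too
    # because period_only defaults to True.
    for sentence in extract_sentences(sample):
        print(sentence)
```

A heading that shares a line with a sentence would be absorbed into that
sentence, so this would only serve as a starting point for documents with
cleaner layout than mine may have.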

Thanks again!

Xin Ying


On Sun, Apr 21, 2013 at 11:42 PM, Craig Pfeifer <craig.pfeifer at gmail.com> wrote:

> The latest version of Stanford CoreNLP will process Chinese text and
> contains a sentence splitter.
>
> If you have issues with the Stanford tools you can send mail to the users
> list: java-nlp-user at lists.stanford.edu
>
> ______________
> craig.pfeifer at gmail.com
>
>
> On Sun, Apr 21, 2013 at 4:34 AM, Xin Ying Qiu <xinying.qiu at gmail.com> wrote:
>
>> Hello,
>>
>> I am processing Chinese reports which include phrases used as titles and
>> subtitles as well as sentences ending with the period sign.  I want to
>> extract the sentences ending with the period sign, but it is difficult to
>> identify where such sentences begin, because the documents may contain
>> stand-alone phrases and numbers.  They do not consist only of
>> sentences ending with period signs.  Are there any tools available to
>> detect, split, or extract Chinese sentences from a document?
>>
>> I've tried the Stanford NLP document preprocessing tool,
>> edu.stanford.nlp.process.DocumentPreprocessor, but it does not seem to
>> work for my documents.
>>
>> Thank you in advance for any advice and suggestions!
>>
>> Sincerely,
>>
>> Xin Ying
>>
>>
>> _______________________________________________
>> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>> Corpora mailing list
>> Corpora at uib.no
>> http://mailman.uib.no/listinfo/corpora
>>
>>
>