[Corpora-List] Chinese sentence detector or splitter
Xin Ying Qiu
xinying.qiu at gmail.com
Mon Apr 22 12:22:09 UTC 2013
Thank you all for your most instructive comments and suggestions!
From a research point of view, Huang/Cheng (2011) is closest to the problem
I'm dealing with. Kiss/Strunk (2006) is great in that the system works for
eleven languages and for different text genres. Although Chinese was not
among the languages tested, one could learn a lot from their methodology.
For my current task, I may ask the Stanford NLP users list for more
advice. It could be that I have not searched the right archives or found
the right tools.
Thanks again!
Xin Ying
On Sun, Apr 21, 2013 at 11:42 PM, Craig Pfeifer <craig.pfeifer at gmail.com> wrote:
> The latest version of Stanford CoreNLP will process Chinese text and
> contains a sentence splitter.
>
> If you have issues with the Stanford tools, you can send mail to the users
> list: java-nlp-user at lists.stanford.edu
>
> ______________
> craig.pfeifer at gmail.com
>
>
> On Sun, Apr 21, 2013 at 4:34 AM, Xin Ying Qiu <xinying.qiu at gmail.com> wrote:
>
>> Hello,
>>
>> I am processing Chinese reports that contain phrases serving as titles and
>> subtitles, as well as sentences ending with the period sign. I want to
>> extract the sentences ending with the period sign, but it is difficult to
>> identify where such sentences begin, because the document may also contain
>> stand-alone phrases and numbers; it does not consist only of sentences
>> ending with period signs. Are there any tools available to detect, split,
>> or extract Chinese sentences from a document?
>>
>> I've tried the Stanford NLP document preprocessing tool,
>> edu.stanford.nlp.process.DocumentPreprocessor, but it does not seem to
>> work for my document.
>>
>> Thank you in advance for any advice and suggestions!
>>
>> Sincerely,
>>
>> Xin Ying
>>
>>
>> _______________________________________________
>> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>> Corpora mailing list
>> Corpora at uib.no
>> http://mailman.uib.no/listinfo/corpora
>>
>>
>
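For the extraction task described in the original question, a simple rule-based filter may be enough before reaching for a full toolkit. The sketch below is a hypothetical helper (`extract_period_sentences` is an illustrative name, not part of Stanford CoreNLP or any other library). It assumes the report uses full-width Chinese punctuation, that titles, subtitles and stand-alone numbers sit on their own lines, and that sentences are not hard-wrapped across lines. It treats each line break as a hard boundary, splits each line after 。, ！ or ？ (plus any closing quote or bracket), and keeps only the pieces that end in 。.

```python
import re

# One "sentence" = a run of characters that are not sentence-final punctuation,
# followed by one or more full-width terminators (。！？) and any closing
# quotes/brackets attached to them, e.g. 好。” or 完成。」
# Assumption: the report uses full-width Chinese punctuation.
_SENTENCE = re.compile(r"[^。！？\n]+[。！？]+[」』”’）)]*")
_CLOSERS = "」』”’）)"


def extract_period_sentences(text: str) -> list[str]:
    """Return the sentences in `text` that end with the Chinese full stop 。.

    Hypothetical helper. Each line is processed separately, so a title,
    subtitle or number on its own line is never merged into the following
    sentence. Text with no terminator (headings, numbers) never matches.
    """
    sentences = []
    for line in text.splitlines():
        for match in _SENTENCE.finditer(line):
            sentence = match.group().strip()
            # Ignore closing quotes/brackets when checking the terminator,
            # so that 他说：“好。” still counts as ending with 。
            if sentence.rstrip(_CLOSERS).endswith("。"):
                sentences.append(sentence)
    return sentences


if __name__ == "__main__":
    report = (
        "二〇一二年度报告\n"
        "一、概况\n"
        "本年度销售收入增长百分之十。利润有所下降！\n"
        "3.5\n"
        "他说：“明年会更好。”\n"
    )
    print(extract_period_sentences(report))
    # ['本年度销售收入增长百分之十。', '他说：“明年会更好。”']
```

Because line breaks act as boundaries, a heading such as 一、概况 is never glued to the start of the next sentence, and the sentence ending in ！ is dropped. A report whose sentences wrap across lines, or whose headings share a line with body text, would need extra preprocessing that this sketch does not attempt.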