<div dir="ltr"><div><div><div>Dear Xin Ying Qiu<br><br></div>This doesn't sound like it would be too hard to write a script for, or you could just do it in Word. Why don't you post an extract from one of your reports, with a few sentences you do want and some numbers/headings that you don't want?<br>
<br></div>Seems like you could just do it in Word by substituting a ^p (paragraph mark) for each of the 。 ? ! symbols, so that every sentence ends up on its own line? <br><br>Sometimes the Chinese period is displayed at mid-height in the line, sometimes at the bottom (like English punctuation). I'm not sure how to control this or whether they are different Unicode characters. But that could be why the program you were using didn't find the periods.<br>
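<br>A rough sketch of the script route, in Python, in case it helps. It is only an illustration under some assumptions: the function names (split_sentences, extract_period_sentences) and the sample text are made up, headings and numbers are assumed to sit on their own lines, and the list of period look-alikes is my guess at what might turn up. As far as I know, the vertical position of 。 is usually decided by the font rather than by a different code point, but there are separate look-alike characters (U+3002, U+FF61, U+FF0E) that a search for just one of them would miss, so the sketch checks for all three.<br>
<br>
```python
import re
import unicodedata

# Assumption: these are the period look-alikes worth checking for.
# U+3002 ideographic, U+FF61 halfwidth ideographic, U+FF0E fullwidth full stop.
# The ASCII "." is left out on purpose so that decimals such as 3.5 are not split.
PERIODS = "\u3002\uff61\uff0e"
# Assumption: question and exclamation marks (fullwidth and ASCII) also end a sentence.
TERMINATORS = PERIODS + "\uff1f\uff01?!"

_CLASS = re.escape(TERMINATORS)
# A run of non-terminators followed by any terminators that close it.
_CHUNK = re.compile("[^%s]+[%s]*" % (_CLASS, _CLASS))


def split_sentences(text):
    """Split text at line breaks and after sentence-final punctuation."""
    chunks = []
    for line in text.splitlines():
        for piece in _CHUNK.findall(line):
            piece = piece.strip()
            if piece:
                chunks.append(piece)
    return chunks


def extract_period_sentences(text):
    """Keep only the chunks that end with one of the period characters."""
    return [s for s in split_sentences(text)
            if s[-1] in PERIODS and s.rstrip(PERIODS)]


# Made-up sample: a heading, a bare number, then three sentences on one line.
SAMPLE = "一、概况\n2013\n公司收入增长。利润下降了吗？成本上升\uff61\n"

if __name__ == "__main__":
    # Show which code points are being treated as periods.
    for ch in PERIODS:
        print("U+%04X" % ord(ch), unicodedata.name(ch))
    print(extract_period_sentences(SAMPLE))
```
<br>On the made-up sample, the heading, the bare number and the question are dropped, and the two sentences ending in a period character are kept. Running the same name lookup over the punctuation in a real report would show which period characters it actually contains.<br>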
<br></div>Simon<br><div><div><div><div><div><div class="gmail_extra"><br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
Date: Sun, 21 Apr 2013 16:34:45 +0800<br>
From: Xin Ying Qiu &lt;<a href="mailto:xinying.qiu@gmail.com">xinying.qiu@gmail.com</a>&gt;<br>
Subject: [Corpora-List] Chinese sentence detector or splitter<br>
To: <a href="mailto:Corpora@uib.no">Corpora@uib.no</a><br>
<br>
Hello,<br>
<br>
I am processing Chinese reports which include phrases used as titles and<br>
subtitles as well as sentences ending with the period sign. I want to<br>
extract the sentences ending with the period sign. But it is difficult to<br>
identify the beginning of such sentences as the document may contain<br>
stand-alone phrases and numbers. It is not a document consisting of only<br>
sentences ending with period signs. Are there any tools available to<br>
detect or split or extract Chinese sentences from a document?<br></blockquote></div><br></div></div></div></div></div></div></div>