<div>All the sentence segmentation</div><div>tools that I am aware of (for example David Palmer's SATZ) tag sentence boundaries by looking </div><div>at a pretty wide range of features of the text, some of which are really matters of </div>
<div>how newspapers happen to be laid out, </div><div>and wouldn't immediately transfer to use with a spoken corpus. So I think you </div><div>probably are not going to find an off-the-shelf tool.</div><div><br class="webkit-block-placeholder">
</div><div>In practice, the best next step is to find a friend who is good with Python, Perl, Ruby or another</div><div>good text processing tool that handles regular expressions. Force your friend to sit down with you</div>
<div>and take a very detailed look at precisely what the corpus transcription you are working with is</div><div>like, then devise a regular expression that catches most of the boundaries you want. The result</div><div>will probably be highly tied to the specifics of your corpus, and will probably not be perfect, but</div>
<div>it will be a start.</div><div><div><br><div><span class="gmail_quote">On 21/02/2008, <b class="gmail_sendername">Su Qi Apple</b> &lt;<a href="mailto:applesuqi@yahoo.co.uk" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">applesuqi@yahoo.co.uk</a>&gt; wrote:</span><blockquote class="gmail_quote" style="margin:0;margin-left:0.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div><div style="font-family:times new roman, new york, times, serif;font-size:12pt"><div>Dear All</div>
<div> </div>
<div>I am just beginning my study in corpus linguistics and in a corpus of spoken English in particular. I want to ask if someone can tell me if you know of any tagging programs that can indicate C-units as opposed to sentences.</div>
<div> </div>
<div>I look forward to your replies.</div>
<div> </div><span>
<div>Apple Su Qi</div></span></div><br><span>
</span></div>
<br>_______________________________________________<br>
Corpora mailing list<br>
<a href="mailto:Corpora@uib.no" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">Corpora@uib.no</a><br>
<a href="http://mailman.uib.no/listinfo/corpora" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">http://mailman.uib.no/listinfo/corpora</a><br>
<br></blockquote></div><br> </div></div>
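<div>To make the regular-expression suggestion above a little more concrete, here is a minimal Python sketch. The transcription conventions are hypothetical and assumed purely for illustration: "(.)" and "(..)" mark pauses, and each speaker turn sits on its own line as "A: ...". The names BOUNDARY, TURN and split_units are likewise made up for this example. A real corpus will almost certainly use different markup, so the pattern would need to be rewritten after a close look at the actual transcripts, and pause-based splitting is only a rough first approximation to C-units.</div>

```python
import re

# Hypothetical conventions, assumed for illustration only:
#   "(.)" is a short pause, "(..)" a longer pause,
#   and every speaker turn is one line of the form "A: some words".
# Candidate boundaries: a pause marker, or a run of . ? ! followed by
# whitespace or the end of the turn. The punctuation itself is dropped.
BOUNDARY = re.compile(r"\s*\(\.{1,2}\)\s*|\s*[.?!]+(?:\s+|$)")
TURN = re.compile(r"^(\w+):\s*(.*)$")


def split_units(transcript):
    """Return (speaker, unit) pairs, splitting each turn at candidate boundaries."""
    units = []
    for line in transcript.splitlines():
        match = TURN.match(line.strip())
        if match is None:
            continue  # skip lines that do not look like a speaker turn
        speaker, text = match.groups()
        for chunk in BOUNDARY.split(text):
            chunk = chunk.strip()
            if chunk:  # ignore empty pieces left by adjacent boundaries
                units.append((speaker, chunk))
    return units


sample = "A: well I went there (.) it was closed\nB: oh no (..) really? what a pity"
for speaker, unit in split_units(sample):
    print(speaker, unit)
```

<div>The output will need checking by hand against a sample of the corpus; the misses and false splits you find are the best guide to refining the pattern.</div>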