Hi <span style>Thapelo, all:</span><div><span style><br></span></div><div><span style>Greetings from NUS. </span><span style>My name is Tao Chen, and I am a second-year Ph.D. student</span></div><span style>working on SMS corpus collection. </span><div>
<span style><br></span></div><div><font color="#222222" face="arial, sans-serif">So far we have collected 41,317 English SMS messages and 29,533 Chinese SMS messages,</font></div><div><font color="#222222" face="arial, sans-serif">and have released the corpus and its summary statistics on our corpus website.</font></div>
<div><font color="#222222" face="arial, sans-serif"><br></font></div><div><a href="http://wing.comp.nus.edu.sg/SMSCorpus/" target="_blank" style>http://wing.comp.nus.edu.sg/SMSCorpus/</a>
</div><div><br></div><div>We have also written a technical report on our data collection efforts, as well</div><div>as a comprehensive literature review of existing SMS corpora. You can find</div><div>the paper at <a href="http://arxiv.org/abs/1112.2468">http://arxiv.org/abs/1112.2468</a>. (Thanks to Nancy for the pointer!)</div>
<div><br></div><div>Our corpus is still a live project. As such, we encourage you and any interested</div><div>community members to contribute to it. Please visit our corpus website for more information </div><div>about contributing.</div>
<div><br></div><div>Sincerely,</div><div><br></div><div>Tao Chen</div><div><span style>on behalf of the Web IR / NLP Group (WING) at NUS</span><br style><a href="http://www.comp.nus.edu.sg/~taochen/" target="_blank" style>http://www.comp.nus.edu.sg/~taochen/</a>
</div><div><br></div><br>