<span class="Apple-style-span" style="font-family: arial, sans-serif; font-size: 13px; background-color: rgb(255, 255, 255); ">Hi Yorick, all:<br><br>Greetings from NUS. My name is Tao Chen, a second-year Ph.D. student<br>
working on SMS corpus collection. Our old 2004 NUS SMS Corpus (found<br>at <a href="http://www.comp.nus.edu.sg/~rpnlpir/downloads/corpora/smsCorpus/" target="_blank" style="color: rgb(51, 51, 204); ">http://www.comp.nus.edu.sg/~rpnlpir/downloads/corpora/smsCorpus/</a>)<br>
has already been mentioned by Yunqing in his reply; thanks to Yunqing<br>for the pointer!<br><br>However, I'd like to point out that our group (specifically, I) has<br>resurrected the SMS collection project in October 2010, reviving it<br>
as a live corpus project for gathering multilingual SMS, currently<br>focusing on English and Mandarin Chinese. To date, we have<br>collected 28,724 English SMS and 28,869 Chinese SMS, and we have been<br>releasing a new version of the corpus, along with its summary statistics,<br>
on a regular monthly schedule.<br><div class="im" style="color: rgb(80, 0, 80); "><br><a href="http://wing.comp.nus.edu.sg/SMSCorpus/" target="_blank" style="color: rgb(51, 51, 204); ">http://wing.comp.nus.edu.sg/SMSCorpus/</a><br>
<br></div>Importantly, this corpus is in the public domain and is freely available<br>for any use, including commercial and research purposes. The<br>latest version of the corpus can be downloaded from our school's Research<br>
to Market portal (which requires registration purely for record-keeping<br>purposes). Past versions (more than one month old) are freely available as<br>simple download links on the corpus webpage.<br><br>The corpus was collected under an NUS IRB exemption (#10-481), and<br>
key identifiers in the corpus have been replaced by placeholder<br>tokens for de-identification.<br><br>You may also be interested in our draft article, currently in preparation, on<br>the creation of the corpus, which also contains a comprehensive literature<br>
review of existing SMS corpora. If you (or others) are<br>interested in the details, we would be happy to share the draft with<br>you.<br><br>We encourage you and anyone else interested to contribute to<br>
the corpus. We have experimented with a number of collection methods<br>in our study and are documenting them in the draft article. Finally, if<br>you have any suggestions for improving the SMS collection process,<br>or for how it might be changed to better serve your research,<br>
please do get in touch with us. We would really like to know how to make<br>this corpus more useful for SMS studies of all kinds.<br><br>Sincerely,<br><br>Tao CHEN<br>on behalf of the Web IR / NLP Group (WING) at NUS<br>
<a href="http://www.comp.nus.edu.sg/~taochen/" target="_blank" style="color: rgb(51, 51, 204); ">http://www.comp.nus.edu.sg/~taochen/</a></span><br clear="all"><div><br></div>