[Corpora-List] A corpus of text messages

Tao Chen taochen at comp.nus.edu.sg
Wed Apr 18 08:36:59 UTC 2012


Hi Thapelo, all:

Greetings from NUS. My name is Tao Chen, a second-year Ph.D. student
working on SMS corpus collection.

So far we have collected 41,317 English SMS messages and 29,533 Chinese
SMS messages, and have released the corpus and its summary statistics on
our corpus website.

http://wing.comp.nus.edu.sg/SMSCorpus/

We have also written a technical report on our data collection efforts,
along with a comprehensive literature review of existing SMS corpora.
You can find the paper at http://arxiv.org/abs/1112.2468. (Thanks to
Nancy for the pointer!)

Our corpus is still a live project. As such, we encourage you and any
interested community members to contribute to it. Please visit our
corpus website for more information on how to contribute.

Sincerely,

Tao Chen
on behalf of the Web IR / NLP Group (WING) at NUS
http://www.comp.nus.edu.sg/~taochen/