[Corpora-List] SMS corpus

Min-Yen Kan knmnyn at gmail.com
Fri Sep 1 15:06:11 UTC 2006


Hi all:

I think Emmanuel Prochasson already mentioned the corpus that we have
collected at NUS.  It is a medium-sized corpus with about 10K messages
sent by students in Singapore.  We are still in the process of
enlarging the corpus, but also would like to hear what corpus
researchers are looking to find with such corpora.  For example, would
a collection of many messages from a few individuals be of more use
than a collection of a few messages each from a wider variety of
contributors?

Most of the messages that we have collected were self-selected by
university students to be made public in the corpus, so we believe
there is likely a bias towards messages that are less personal than
what actually occurs in real life.  So you may have less luck finding
emotional messages in our corpus.

Have you thought of supplementing your corpus studies with chat
language?  My past student was looking at some chat logs from
commercial sites to supplement his studies and corpus collection.

The SMS corpus is here (as stated by Emmanuel):

http://www.comp.nus.edu.sg/~rpnlpir/downloads/corpora/smsCorpus/

Min-Yen Kan
Assistant Professor
Web / IR / NLP Group (WING), School of Computing
National University of Singapore

On 9/1/06, Alexander Osherenko <osherenko at gmx.de> wrote:
> Hello,
>
> has anybody heard of a text corpus with SMS messages? Actually it should
> be emotional, but at first it doesn't matter much.
>
> Best
>
> Alexander
>
>


