[Corpora-List] SMS corpus
Cédrick Fairon
cedrick.fairon at uclouvain.be
Fri Sep 1 14:06:29 UTC 2006
Dear Alexander,
The Centre for natural language processing at the University of
Louvain (http://cental.fltr.ucl.ac.be) has collected a corpus of
75,000 French SMS messages (more than 2,400 authors, aged 12 to 65). Details
about the project are available online: http://www.smspourlascience.be
A subset of this corpus (30,000 SMS) has been released on a CD-ROM
published by the Louvain University Press and is available from
http://www.i6doc.com/doc/sms (licence for non-profit organisations
only; others may contact us).
Two notable features of the corpus:
- it contains information about the authors' profiles (sex, age,
occupation, mother tongue, second language, place of residence, etc.).
These profiles are linked to the messages, so that you can select a
subset of the corpus corresponding to given sociolinguistic details;
- each message is linked to a "transcribed" version in "standard"
French, so that you can search for a word and get all the variants
present in the corpus.
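
To make these two features concrete, here is a minimal Python sketch of how
such queries could look. It does not reflect the actual format of the
CD-ROM: the record layout, the field names, the toy messages and the
one-to-one token alignment between the raw and transcribed text are all
assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Message:
    author_id: int
    raw: str       # the SMS as typed by the author
    standard: str  # its transcription into standard French


# Hypothetical toy records, invented for illustration only.
AUTHORS = {
    1: {"sex": "F", "age": 17},
    2: {"sex": "M", "age": 42},
}
MESSAGES = [
    Message(1, "slt sa va", "salut ça va"),
    Message(2, "salut ca va", "salut ça va"),
    Message(1, "a 2m1", "à demain"),
]


def select(messages, authors, **criteria):
    """Keep messages whose author profile matches every given criterion."""
    return [
        m for m in messages
        if all(authors[m.author_id].get(k) == v for k, v in criteria.items())
    ]


def variants(messages, word):
    """Count the raw spellings aligned with a standard-French word.

    Assumption: raw and transcribed text have the same number of
    whitespace-separated tokens, aligned one to one. Messages where the
    token counts differ are skipped.
    """
    found = defaultdict(int)
    for m in messages:
        raw_tokens, std_tokens = m.raw.split(), m.standard.split()
        if len(raw_tokens) != len(std_tokens):
            continue
        for raw_token, std_token in zip(raw_tokens, std_tokens):
            if std_token == word:
                found[raw_token] += 1
    return dict(found)
```

With these toy records, `variants(MESSAGES, "salut")` collects both "slt"
and "salut", and `select(MESSAGES, AUTHORS, sex="F")` restricts the search
to messages from one profile group. Real SMS transcriptions rarely align
token by token, so a proper alignment step would be needed in practice.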
Full details are in C. Fairon, S. Paumier (2006). "A translated corpus of
30,000 French SMS". In Proceedings of LREC 2006. Genoa.
Best Regards,
Cédrick
On 1 Sept. 2006, at 15:00, Alexander Osherenko wrote:
> Hello,
>
> has anybody heard of a text corpus with SMS messages? Actually it
> should be emotional, but at first it doesn't matter much.
>
> Best
>
> Alexander
>
Cédrick Fairon
cedrick.fairon at uclouvain.be
Director of CENTAL
Centre de traitement automatique du langage
Université catholique de Louvain
Place Blaise Pascal, 1
1348 Louvain-la-Neuve
Belgique
tel: +32 10 47 37 88
fax: +32 10 47 26 06
http://cental.fltr.ucl.ac.be
http://glossa.fltr.ucl.ac.be