[Corpora-List] SMS Corpus in French
Mathieu Roche
mathieu.roche at lirmm.fr
Fri Jun 27 17:34:51 UTC 2014
Dear colleagues,
We are proud to announce the release of our large corpus of French SMS messages, "88milSMS".
The terms of use and downloads are available here:
http://88milsms.huma-num.fr/
© Panckhurst R., Détrie C., Lopez C., Moïse C., Roche M., Verine B. (2014) "88milSMS. A corpus of authentic text messages in French", produced by Université Paul-Valéry Montpellier 3 and the CNRS, in collaboration with Université catholique de Louvain, funded with the support of the MSH-M and the Ministère de la Culture (Délégation générale à la langue française et aux langues de France), and with the participation of Praxiling, Lirmm, Lidilem, Tetis, Viseo.
contact: 88milsms at univ-montp3.fr
------------------------------------------
A multidisciplinary team of linguists and computer scientists (Rachel Panckhurst, Catherine Détrie, Cédric Lopez, Claudine Moïse, Mathieu Roche, Bertrand Verine; Praxiling, Lirmm, Lidilem, Tetis, Viseo) collected more than 88,000 authentic French text messages in Montpellier in 2011, as part of the sud4science LR project (Sud4science Languedoc Roussillon. Mutation des pratiques scripturales en communication électronique médiée; main financial support: MSH-M). This project is part of a vast international project entitled sms4science, coordinated by the CENTAL at Université catholique de Louvain (UCL) in Belgium. Participants from the general public, who donated their SMS messages to science, were also able to fill in a sociolinguistic questionnaire. The text messages from the sud4science LR project were then semi-automatically anonymised (in collaboration with student interns and a legal adviser-CIL, Nicolas Hvoinsky, SAJI, Université Paul-Valéry), before being partially transcoded (into standardised French) and annotated (cf. Panckhurst et al. 2013).
------------------------------------------
Kind regards,
Rachel Panckhurst, Catherine Détrie, Cédric Lopez, Claudine Moïse, Mathieu Roche, Bertrand Verine