[Corpora-List] Corpus for microtext normalization released
Iñaki San Vicente Roncal
i.sanvicente at elhuyar.com
Tue Nov 12 16:56:06 UTC 2013
Dear Colleagues,
We are happy to announce the release of the Tweet-Norm_es corpus,
built for the Tweet-Norm tweet normalization workshop
<http://komunitatea.elhuyar.org/tweet-norm>.
Tweet-Norm_es is a corpus of tweets annotated for microtext normalization.
It contains a set of tweets in Spanish in which out-of-vocabulary (OOV)
words are classified and their normalized forms are provided. The corpus
is released under the Creative Commons Attribution license (CC BY).
The corpus is available for download at the following link:
http://komunitatea.elhuyar.org/tweet-norm/files/2013/11/tweet-norm_es.zip
If you use this corpus, please cite the following paper:
- Iñaki Alegria, Nora Aranberri, Víctor Fresno, Pablo Gamallo, Lluis
Padró, Iñaki San Vicente, Jordi Turmo, Arkaitz Zubiaga. 2013. "Introducción
a la tarea compartida Tweet-Norm 2013: Normalización léxica de tuits en
español". Workshop on Tweet Normalization at SEPLN (Tweet-Norm). Madrid.
pp. 36-45. ISBN: 978-84-695-8349-4
For any further questions or suggestions, don't hesitate to contact us
at tweet-norm at elhuyar.com
Regards,
Tweet-Norm organizers.
--
*Iñaki San Vicente Roncal*
I+G IKERTZAILEA / R&D RESEARCHER
i.sanvicente at elhuyar.com | inaki.sanvicente at ehu.es
<http://scholar.google.es/citations?user=eb_xVO4AAAAJ&hl=en>
<https://www.researchgate.net/profile/Inaki_San_Vicente/>
tel. Elhuyar: 943363040 | ext.: 225
tel. Ixa: 943015110 | office 318
Zelai Haundi, 3. Osinalde industrialdea
20170 Usurbil
www.elhuyar.org <http://www.elhuyar.org> | ixa.si.ehu.es <http://ixa.si.ehu.es>
This message, including any attached documents, may contain confidential
information. Only the designated recipient is authorized to receive that
information. If you are not the intended recipient, current legislation
prohibits you from using, distributing and/or copying this information
without authorization.
If you have received this message in error, please notify the sender and
delete it. Thank you.
Do not print this message unless it is necessary.
_______________________________________________
UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora