[Corpora-List] Tweet Normalization Workshop at SEPLN 2013
L Alfonso
laurena at ujaen.es
Mon May 20 11:31:19 UTC 2013
[apologies for multiple postings]
============================================================================
TWEET-NORM 2013
Tweet Normalization Workshop at SEPLN 2013
Madrid, Spain
15-20 September, 2013
http://komunitatea.elhuyar.org/tweet-norm/
============================================================================
Call for papers
============================================================================
TWEET-NORM 2013, which will be held at the 29th Annual Conference of the
Spanish Society for Natural Language Processing (SEPLN 2013) in Madrid
(Spain), invites researchers to submit articles or recent unpublished
studies on systems, methods and algorithms for the lexical normalization
of tweets in Spanish and, especially, to participate in the proposed
shared task.
Introduction
------------
One of the most important challenges facing us today is how to process and
analyze the large amount of information on the Internet, especially on
social networking sites like Twitter, where millions of people express
ideas and opinions every day on any topic of interest. These texts, called
tweets, are limited to 140 characters, far shorter than texts in
traditional genres. Consequently, users of these networks have developed a
new form of expression that includes SMS-style abbreviations, lexical
variants, letter repetitions, emoticons, etc. As a result, current NLP
tools can have trouble processing and understanding these short, noisy
texts unless they are normalized first.
The TWEET-NORM lexical normalization task proposes the automatic
"cleansing" of a given set of tweets by identifying and normalizing
abbreviations, words with repeated letters and, in general, any
out-of-vocabulary (OOV) words, regardless of syntactic or stylistic
variants.
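As a rough illustration of the kind of processing the task involves, the
following is a minimal Python sketch, not the organizers' method or any
participant's system. The tiny LEXICON and ABBREVIATIONS tables and the
function names are hypothetical; a real system would need a full Spanish
lexicon and a much richer treatment of lexical variants.

```python
import re

# Hypothetical toy resources, for illustration only.
LEXICON = {"hola", "que", "tal", "por", "favor", "también", "mucho"}
ABBREVIATIONS = {"q": "que", "xq": "porque", "tb": "también"}

# Matches any character followed by one or more copies of itself.
REPEAT = re.compile(r"(.)\1+")


def candidates(token):
    """Yield the token with each run of repeated characters cut to 2, then to 1."""
    yield REPEAT.sub(r"\1\1", token)
    yield REPEAT.sub(r"\1", token)


def normalize_token(token):
    """Return a normalized form for an OOV token, or the token unchanged."""
    lower = token.lower()
    if lower in LEXICON:
        return token  # in-vocabulary words are left alone
    if lower in ABBREVIATIONS:
        return ABBREVIATIONS[lower]  # expand known SMS-style abbreviations
    for cand in candidates(lower):
        if cand in LEXICON:
            return cand  # repeated letters collapsed to a known word
    return token  # unknown OOV word: no proposal in this sketch


def normalize_tweet(text):
    """Normalize a whitespace-tokenized tweet token by token."""
    return " ".join(normalize_token(t) for t in text.split())


print(normalize_tweet("holaaaa q tal"))
```

In this sketch, tokens that cannot be resolved (mentions, hashtags, unknown
words) are passed through unchanged rather than guessed at.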
While there has been some progress in this field for English tweets, there
are very few studies and resources available to date for Spanish. The aim
of the workshop is therefore to provide a forum for discussion and
communication where researchers can test approaches, algorithms and
resources, in order to promote the application of techniques and
algorithms in this area. To this end, a shared task is proposed in which
participants will have to normalize a set of tweets. An annotated corpus
will be provided to the participants to develop and test the proposed
solutions.
Corpus
------
The corpus is composed of tweets gathered between the 1st and 2nd of April
2013, covering the geographic area of the Iberian Peninsula but ignoring
those regions that have co-official languages. A large portion of these
messages contain serious normalization problems. From this initial corpus
two subsets are generated: a development set consisting of 500 tweets and
a test set consisting of 2000 tweets. The corpus will be available on the
workshop web page at
http://komunitatea.elhuyar.org/tweet-norm/resources/
Registration
------------
Participants are required to register for the task in order to obtain the
corpus, by sending an email before May 31 to tweet-norm at elhuyar.com
Submitting articles
-------------------
Submitted papers must not exceed 4 pages, should follow the format
established by the SEPLN (http://nil.fdi.ucm.es/sepln2013/callen.html),
and are to be submitted via the web.
Important Dates
---------------
May 30: Registration deadline for participants and publication of the
development set.
July 5: Publication of the test set.
July 15: Result submission deadline.
July 25: Publication of results.
July 31: Article submission deadline.
September 15: Workshop at SEPLN 2013 in Madrid.
-----------------------------------------------------------------
L. Alfonso Ureña López
Departamento de Informática
Escuela Politécnica Superior (A3-129)
Universidad de Jaén
Campus Las Lagunillas. Phone: +34 953 21 28 95
23071 - Jaén- Spain Fax: +34 953 21 24 72
http://wwwdi.ujaen.es/~laurena
SEPLN (http://www.sepln.org)
-----------------------------------------------------------------