[Corpora-List] Tweet Normalization Workshop at SEPLN 2013

L Alfonso laurena at ujaen.es
Mon May 20 11:31:19 UTC 2013


[apologies for multiple postings]

 

============================================================================

TWEET-NORM 2013

Tweet Normalization Workshop at SEPLN 2013

Madrid, Spain

 

15-20 September, 2013

 

http://komunitatea.elhuyar.org/tweet-norm/

 

 

 

============================================================================

Call for papers

============================================================================

 

TWEET-NORM 2013, which will be held as part of the 29th Annual Conference
of the Spanish Society for Natural Language Processing (SEPLN 2013) in
Madrid (Spain), invites researchers to submit articles or recent
unpublished studies on systems, methods and algorithms for the lexical
normalization of tweets in Spanish and, especially, to participate in the
proposed shared task.

 

 

Introduction

------------

 

One of the most important challenges facing us today is how to process and
analyze the large amount of information on the Internet, especially on
social networking sites like Twitter, where millions of people express
ideas and opinions daily on any topic of interest. These texts, called
tweets, are characterized by their short length (at most 140 characters),
which is far shorter than texts in traditional genres. Consequently, users
of these networks have developed a new form of expression that includes
SMS-style abbreviations, lexical variants, repeated letters, emoticons,
etc. As a result, current NLP tools may have trouble processing and
understanding these short and noisy texts unless they are normalized first.

 

The TWEET-NORM lexical normalization task proposes the automatic "cleansing"
of a set of tweets by identifying and normalizing abbreviations, words with
repeated letters and, in general, any out-of-vocabulary (OOV) words,
regardless of syntactic or stylistic variants.
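
To make the kind of processing involved more concrete, the following is a
minimal Python sketch of a lexical normalizer, not the method of the task
organizers. The lexicon, the abbreviation table and all function names are
hypothetical toy examples invented for illustration; a real system would
need a full Spanish lexicon and a better candidate-selection strategy.

```python
import re

# Hypothetical toy resources, assumed only for this illustration.
LEXICON = {"hola", "que", "tal", "por", "favor", "te", "quiero", "mucho"}
ABBREVIATIONS = {"q": "que", "xq": "porque", "pf": "por favor", "tq": "te quiero"}


def collapse_repeats(word, max_run):
    """Shorten every run of one repeated character to at most max_run copies."""
    return re.sub(
        r"(.)\1+",
        lambda m: m.group(1) * min(len(m.group(0)), max_run),
        word,
    )


def normalize_token(token):
    """Return a normalized form of token, or token unchanged if none is found."""
    word = token.lower()
    if word in LEXICON:
        return word
    if word in ABBREVIATIONS:
        return ABBREVIATIONS[word]
    # Try keeping double letters first (Spanish has "ll", "rr"), then singles.
    for max_run in (2, 1):
        candidate = collapse_repeats(word, max_run)
        if candidate in LEXICON:
            return candidate
        if candidate in ABBREVIATIONS:
            return ABBREVIATIONS[candidate]
    # Unresolved OOV word: leave it untouched rather than guess.
    return token


def normalize_tweet(text):
    """Normalize a whitespace-tokenized tweet token by token."""
    return " ".join(normalize_token(t) for t in text.split())


print(normalize_tweet("holaaaa q tal"))
```

The sketch applies the same collapse limit to every run in a word, so a word
that mixes a legitimate double letter with an elongated one would not be
resolved; handling that would require trying the limit per run.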

 

While there has been some progress in this field for English tweets, there
are very few studies and resources available to date for Spanish. Thus, the
aim of the workshop is to provide a forum for discussion and communication
where researchers can test approaches, algorithms and resources, in order to
promote the application of techniques and algorithms in this area. To this
end, a shared task is proposed in which participants will have to normalize
a set of tweets. An annotated corpus will be provided to the participants in
order to develop and test the proposed solutions.

 

 

Corpus

------

 

The corpus is composed of tweets gathered between the 1st and 2nd of April
2013 covering the geographic area of the Iberian Peninsula, excluding those
regions that have co-official languages. A large portion of these messages
contain serious normalization problems.

 

From this initial corpus two subsets are generated: a development set
consisting of 500 tweets, and a test set consisting of 2000 tweets.
The corpus will be available on the workshop web page at
http://komunitatea.elhuyar.org/tweet-norm/resources/

 

 

Registration

------------

 

Participants are required to register for the task in order to obtain the
corpus by sending an email before May 31 to tweet-norm at elhuyar.com

 

 

Submitting articles

-------------------

 

Submitted papers will have a maximum length of 4 pages, should follow the
format established by the SEPLN
(http://nil.fdi.ucm.es/sepln2013/callen.html) and will be submitted via the
web.

 

 

 

Important Dates

---------------

 

May 30: Registration deadline for participants and publication of the
development set.

 

July 5: Publication of the test set.

 

July 15: Result submission deadline.

 

July 25: Publication of results.

 

July 31: Article submission deadline.

 

September 15: Workshop at SEPLN 2013 in Madrid.

 

 

-----------------------------------------------------------------
L. Alfonso Ureña López

Departamento de Informática
Escuela Politécnica Superior (A3-129)
Universidad de Jaén

Campus Las Lagunillas. Phone: +34 953 21 28 95
23071 - Jaén- Spain        Fax:     +34 953 21 24 72

http://wwwdi.ujaen.es/~laurena
SEPLN (http://www.sepln.org)
-----------------------------------------------------------------

 

 
