[Corpora-List] Call For Participation - DSL Shared Task at VarDial Workshop - COLING 2014 in Dublin, Ireland.

Marcos Zampieri marcos.zampieri at uni-koeln.de
Sat Mar 1 15:17:39 UTC 2014

Call For Participation

DSL Shared Task at VarDial Workshop - COLING 2014 in Dublin, Ireland.

DSL Shared Task: http://corporavm.uni-koeln.de/vardial/sharedtask.html
VarDial Workshop: http://corporavm.uni-koeln.de/vardial/

Discriminating between similar languages and language varieties is one
of the bottlenecks of language identification. This problem has been
the topic of a number of papers published in recent years. The DSL
shared task aims to provide a dataset for evaluating systems'
performance on discriminating 13 different languages in 6 language
groups.

We invite researchers and developers to participate. To receive the  
training data, please register before March 20th at:  

The best systems will be invited to submit a short paper to appear in  
the VarDial workshop proceedings.


We will first provide a set of 20,000 instances per language (18,000
training + 2,000 development) in CSV format. Each instance is a full
sentence extracted from journalistic corpora, written in one of the
languages listed below and tagged with its language group and country
of origin. One month later we will release a test set containing 1,000
unlabeled instances of each language, to be classified according to
the country of origin.

Group A (Bosnian, Croatian, Serbian)
Group B (Brazilian Portuguese, European Portuguese)
Group C (Indonesian, Malaysian)
Group D (Czech, Slovakian)
Group E (Peninsular Spanish, Argentine Spanish)
Group F (American English, British English)
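
For participants who want to prepare their tooling before the data is
released, the sketch below shows one way the CSV instances could be
read and how the overall dataset sizes follow from the per-language
figures above. It is only an illustration: the column order (sentence,
group letter, language name), the helper names and the two sample
sentences are assumptions made for this example, not the actual layout
of the DSL files.

```python
import csv
import io
from collections import Counter

# Groups and varieties listed in the call.
GROUPS = {
    "A": ["Bosnian", "Croatian", "Serbian"],
    "B": ["Brazilian Portuguese", "European Portuguese"],
    "C": ["Indonesian", "Malaysian"],
    "D": ["Czech", "Slovakian"],
    "E": ["Peninsular Spanish", "Argentine Spanish"],
    "F": ["American English", "British English"],
}

# Per-language instance counts stated in the call.
PER_LANGUAGE = {"train": 18_000, "dev": 2_000, "test": 1_000}


def dataset_sizes():
    """Total number of instances per split across all languages."""
    n_languages = sum(len(varieties) for varieties in GROUPS.values())
    return {split: n * n_languages for split, n in PER_LANGUAGE.items()}


def read_instances(handle):
    """Yield (sentence, group, variety) tuples from a CSV file object.

    ASSUMPTION: three columns in this order; the real files may differ.
    """
    for row in csv.reader(handle):
        if len(row) != 3:
            continue  # skip blank or malformed rows
        sentence, group, variety = (field.strip() for field in row)
        yield sentence, group, variety


def count_by_variety(instances):
    """Count how many instances carry each variety label."""
    return Counter(variety for _, _, variety in instances)


if __name__ == "__main__":
    # Two made-up rows standing in for a real training file.
    sample = io.StringIO(
        '"The colour of the car is red.",F,British English\n'
        '"The color of the car is red.",F,American English\n'
    )
    print(dataset_sizes())
    print(count_by_variety(read_instances(sample)))
```

With the real files, the in-memory sample would be replaced by an
opened file, for example open(path, encoding="utf-8", newline=""), once
the actual column layout is known.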

We allow two kinds of submissions (please indicate which one you
intend to make when you fill in your registration form):

Closed submission: Using only the training corpus provided by the DSL  
shared task.
Open submission: Using any corpus for training including the DSL one.

Important Dates

Training set release: March 20th, 2014
Test set release: April 21st, 2014
Submissions due: April 23rd, 2014 (23:59 EST)
Results announced: April 30th, 2014
Short papers deadline: May 30th, 2014
Feedback: June 20th, 2014
Camera-ready versions: June 27th, 2014


Organizers

Marcos Zampieri (Saarland University, Germany)
Liling Tan (Saarland University, Germany)
Nikola Ljubešić (University of Zagreb, Croatia)
Jörg Tiedemann (Uppsala University, Sweden)


Contact

Shared Task: dsl.sharedtask at gmail.com
Workshop: vardialworkshop at gmail.com
