[Corpora-List] Call For Participation - DSL Shared Task at VarDial Workshop - COLING 2014 in Dublin, Ireland.
Marcos Zampieri
marcos.zampieri at uni-koeln.de
Sat Mar 1 15:17:39 UTC 2014
Call For Participation
DSL Shared Task at VarDial Workshop - COLING 2014 in Dublin, Ireland.
DSL Shared Task: http://corporavm.uni-koeln.de/vardial/sharedtask.html
VarDial Workshop: http://corporavm.uni-koeln.de/vardial/
Discriminating between similar languages and language varieties is one
of the bottlenecks of language identification. This problem has been the
topic of a number of papers published in recent years. The DSL
shared task aims to provide a dataset to evaluate systems' performance
on discriminating between 13 different languages in 6 language groups.
We invite researchers and developers to participate. To receive the
training data, please register before March 20th at:
http://goo.gl/A3Dd49
The best systems will be invited to submit a short paper to appear in
the VarDial workshop proceedings.
Data
We will first provide a set of 20,000 instances per language (18,000
training + 2,000 development) in CSV format. Each instance is a complete
sentence extracted from journalistic corpora, written in one of the
languages and tagged with its language group and country of origin.
After one month, we will release a test set containing 1,000
unlabeled instances of each language, to be classified by
country of origin. The languages are divided into six groups:
Group A (Bosnian, Croatian, Serbian)
Group B (Brazilian Portuguese, European Portuguese)
Group C (Indonesian, Malaysian)
Group D (Czech, Slovakian)
Group E (Peninsular Spanish, Argentine Spanish)
Group F (American English, British English)
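To make the data layout concrete, here is a minimal Python sketch under stated assumptions. The group table mirrors the list above and the per-language counts come from the Data section; the column order `sentence,group,label` and the full-name label strings are hypothetical, since the released CSV files may use different columns and label codes. The sketch computes the split sizes implied by the call and tallies labelled rows from CSV text, rejecting labels that do not belong to the stated group.

```python
import csv
import io
from collections import Counter

# Language groups as listed in the call. The label strings are
# illustrative only, not the official codes used in the released files.
GROUPS = {
    "A": ["Bosnian", "Croatian", "Serbian"],
    "B": ["Brazilian Portuguese", "European Portuguese"],
    "C": ["Indonesian", "Malaysian"],
    "D": ["Czech", "Slovakian"],
    "E": ["Peninsular Spanish", "Argentine Spanish"],
    "F": ["American English", "British English"],
}

# Per-language instance counts stated in the call.
TRAIN_PER_LANG = 18_000
DEV_PER_LANG = 2_000
TEST_PER_LANG = 1_000


def expected_sizes(groups):
    """Return the total number of instances per split implied by the
    per-language counts and the number of languages in `groups`."""
    n_langs = sum(len(langs) for langs in groups.values())
    return {
        "languages": n_langs,
        "train": n_langs * TRAIN_PER_LANG,
        "dev": n_langs * DEV_PER_LANG,
        "test": n_langs * TEST_PER_LANG,
    }


def count_labels(csv_text):
    """Count (group, label) pairs in CSV text.

    Assumes a hypothetical three-column layout (sentence, group, label)
    with no header row; quoted sentences may contain commas.
    """
    counts = Counter()
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:  # skip blank lines
            continue
        _sentence, group, label = row
        if label not in GROUPS.get(group, []):
            raise ValueError(f"label {label!r} is not in group {group!r}")
        counts[(group, label)] += 1
    return counts


if __name__ == "__main__":
    print(expected_sizes(GROUPS))
    # Toy rows for illustration, not taken from the DSL data.
    toy = '"Olá, tudo bem?",B,Brazilian Portuguese\n"Good morning",F,British English\n'
    print(count_labels(toy))
```

With 13 languages, the per-language figures imply 234,000 training, 26,000 development and 13,000 test instances in total. Checking each label against its group catches rows whose group and country tags disagree before any training starts.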
We allow two kinds of submissions (please indicate which one when you fill
in your registration form):
Closed submission: Using only the training corpus provided by the DSL
shared task.
Open submission: Using any corpus for training, including the DSL one.
Important Dates
Training set release: March 20th, 2014
Test set release: April 21st, 2014
Submissions due: April 23rd, 2014 (23:59 EST)
Results announced: April 30th, 2014
Short papers deadline: May 30th, 2014
Feedback: June 20th, 2014
Camera-ready versions: June 27th, 2014
Organizers
Marcos Zampieri (Saarland University, Germany)
Liling Tan (Saarland University, Germany)
Nikola Ljubešić (University of Zagreb, Croatia)
Jörg Tiedemann (Uppsala University, Sweden)
Contact
Shared Task: dsl.sharedtask at gmail.com
Workshop: vardialworkshop at gmail.com