Appel: Call for System Participation, Shared Task on Automatic Arabic Error Correction (Registration Deadline: July 1st)

Thierry Hamon hamon at LIMSI.FR
Thu Jun 19 19:07:14 UTC 2014

Date: Mon, 16 Jun 2014 06:34:07 -0700
From: Wajdi Zaghouani <wajdiuqam at>
Message-ID: <1402925647.95132.YahooMailNeo at>


Last Call for System Participation
Shared Task on Automatic Arabic Error Correction
In conjunction with EMNLP Workshop on Arabic Natural Language Processing

Apologies for multiple postings
Please distribute to colleagues


collocated with EMNLP 2014, Doha, Qatar

Registration deadline: July 1, 2014
System test period: July 7-18, 2014
Workshop date: Saturday, October 25, 2014

Shared Task Website:

Shared Task Description

As part of the Arabic Natural Language Processing Workshop at EMNLP
2014, we will conduct a shared task on Automatic Arabic Error
Correction. We designed this task in the tradition of high-profile
shared tasks in natural language processing, such as CoNLL's
grammar/error detection and correction shared tasks in 2011-2013 and
numerous machine translation campaigns by NIST/WMT/MEDAR, among
others. The task relies on resources created under the Qatar Arabic
Language Bank (QALB) project (currently over 1M words of manually
corrected Arabic text).

A participating system in this shared task will be given Modern Standard
Arabic texts, which are to be automatically corrected. The input will be
provided in Arabic script, and will be annotated for part-of-speech (at
different granularities), inflectional features, clitics (which appear
in 20% of Arabic words), lemmas, and English glosses. All of the input
text will be preprocessed in a common way to make sure all participants
have access to all of these features at no additional overhead
cost. We follow the file format and evaluation framework used by the
CoNLL shared tasks on error correction. The task is focused on
correction as opposed to identification. There will not be an error
identification task per se.
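
This call does not spell out how corrections are scored. As a rough
illustration of the kind of edit-level evaluation used in the CoNLL
error correction tasks, the sketch below computes precision, recall and
F-score over sets of edits. The edit representation (token span plus
replacement string), the function name edit_prf and the exact-match
criterion are assumptions made for this example only; the evaluation
script provided to registered teams is the authoritative reference.

```python
from typing import Set, Tuple

# Hypothetical edit representation: (start_token, end_token, replacement).
# This is an illustration only, not the official file format.
Edit = Tuple[int, int, str]


def edit_prf(system: Set[Edit], gold: Set[Edit],
             beta: float = 1.0) -> Tuple[float, float, float]:
    """Return (precision, recall, F_beta) over exactly matching edits."""
    matched = len(system & gold)
    # Assumed convention: an empty edit set counts as perfect on that side.
    precision = matched / len(system) if system else 1.0
    recall = matched / len(gold) if gold else 1.0
    if precision + recall == 0:
        return precision, recall, 0.0
    b2 = beta * beta
    f_score = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_score


# Toy data: three gold edits; the system proposes two, one of them correct.
gold_edits = {(0, 1, "A"), (3, 4, "B"), (5, 6, "C")}
system_edits = {(0, 1, "A"), (3, 4, "X")}
p, r, f1 = edit_prf(system_edits, gold_edits)
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f}")
```

With beta = 1 the score weights precision and recall equally; a smaller
beta would favour precision, which may matter for a task that rewards
correction rather than detection.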

Participants need to register. Once registered, all participating teams
will be provided with a common training data set (about 1 million
words), which includes common preprocessed input and corrected
output. The registration link is on the Shared Task Website (see above). A
common development set will also be provided. A blind test data set will
be used to evaluate the output of the participating teams. An evaluation
script will be provided to all the teams. Each participating team can
submit up to three systems. Participants are welcome to use additional
resources and tools that are not part of the released data set. However,
all such additions must be fully disclosed.

All those who registered to participate in the Shared Task will receive
an email message on July 7, 2014 with specific instructions on how to
download the test set and how to send the automatic correction of
it. The information will also be available at the shared task group

Participants are expected to author a short paper (4 pages + 2 for
references) describing their approach, resources and experiments. The
paper needs to follow the standard format of the EMNLP conference.

The following discussion group has been created and is used for all
announcements and discussions related to the shared task:
!forum/qalb-shared-task

Participants are encouraged to subscribe and follow the discussion.


Important Dates

Shared task registration period: April 8, 2014 through July 1, 2014
Shared task test release: July 7, 2014
Shared task system output collection: July 18, 2014
Submission deadline for system description papers: July 26, 2014
Author notification: August 26, 2014
Camera Ready: September 15, 2014
Arabic NLP Workshop: October 25, 2014

Organizers

Behrang Mohit (co-chair), Carnegie Mellon University Qatar
Alla Rozovskaya (co-chair), Columbia University
Wajdi Zaghouani, Carnegie Mellon University Qatar
Ossama Obeid, Carnegie Mellon University Qatar
Nizar Habash (advisor), Columbia University
