[Corpora-List] First Call for EMNLP Workshop on Arabic Natural Language Processing & Shared Task on Automatic Arabic Error Correction
Nizar Habash
habash at ccls.columbia.edu
Tue Mar 25 21:04:40 UTC 2014
=======================================================
First Call for Papers and Participation
EMNLP Workshop on Arabic Natural Language Processing
Including Shared Task on Automatic Arabic Error Correction
Apologies for multiple postings
Please distribute to colleagues
=======================================================
Arabic Natural Language Processing Workshop
co-located with EMNLP 2014, Doha, Qatar
Workshop date: Saturday, October 25, 2014
Paper submission deadline: July 26, 2014
Shared task registration deadline: July 1, 2014
=======================================================
====================
WORKSHOP DESCRIPTION
====================
Arabic Natural Language Processing (NLP) has made considerable progress
over the last 15 years. Many Arabic NLP (or Arabic
NLP-related) workshops and conferences have taken place, both in the
Arab World and in association with international conferences, e.g.,
the conference on Arabic Language Resources and Tools (MEDAR-2009,
NEMLAR-2004), the workshop on Computational Approaches to Semitic
Languages (LREC 2010, EACL 2009, ACL 2007, ACL 2005, ACL 2002, ACL
1998), the workshop on Computational Approaches to Arabic Script-based
Languages (MTSummit XII 2009, LSA 2007, COLING 2004), the
International Symposium on Computer and Arabic Language (ISCAL 2009,
ISCAL 2007), the Colloque International sur le Traitement Automatique
de la Langue Arabe (CITALA 2007), the International Symposium on
Processing of Arabic (Tunisia 2002), the workshop on Arabic Language
Resources and Evaluation (LREC 2002), and the workshop on Arabic
Language Processing (ACL 2001), among others. This workshop follows in
the footsteps of these efforts, providing a forum for researchers to
share and discuss their ongoing work. The workshop is timely, given the
continued rise in research projects focusing on Arabic NLP in the Arab
World and in the West.
We invite submissions on topics that include, but are not limited to,
the following:
* Core technologies: morphological analysis, disambiguation,
tokenization, POS tagging, named entity detection, chunking,
parsing, semantic role labeling, sentiment analysis, Arabic dialect
modeling, etc.
* Applications: machine translation, speech recognition, speech
synthesis, optical character recognition, pedagogy, assistive
technologies, social media, etc.
* Resources: dictionaries, annotated data, specialized databases, etc.
Submissions may include work in progress as well as finished work.
Submissions must have a clear focus on specific issues pertaining to
the Arabic language, whether standard Arabic, dialectal Arabic, or a
mix of the two. Descriptions of commercial systems are welcome, but
authors should be willing to discuss the details of their work.
Submissions are expected to be 8 pages long, plus 2 pages for
references. The workshop will also host a shared task on Arabic text
error correction (details below).
===========
SHARED TASK
===========
As part of the Arabic Natural Language Processing Workshop at EMNLP
2014 (to be held in Doha, Qatar), we will conduct a shared task on
Automatic Arabic Error Correction. We designed this task in the
tradition of high-profile shared tasks in natural language processing,
such as CoNLL's grammar/error detection and correction shared tasks in
2011-2013 and the numerous machine translation campaigns run by
NIST/WMT/MEDAR, among others. The task relies on resources created
under the Qatar Arabic Language Bank (QALB) project (currently over 1M
words of manually corrected Arabic text). A participating system in
this shared task will be given Modern Standard Arabic texts, which are
to be automatically corrected. The input will be provided in Arabic
script and in a standard Romanization scheme, and will be annotated
for part-of-speech (at three different granularities), clitics (which
appear in 20% of Arabic words), lemmas, English glosses, and
dependency tree relations. All input text will be preprocessed in a
common way so that all participants have access to all of these
features without additional overhead. An XML format will be used to
encode all of this information. A participating system then returns a
corrected version of the Arabic text, one sentence per line, in an XML
format. The task focuses on correction as opposed to identification;
there will be no separate error identification task. Participants must
register.
Once registered, all participating teams will be provided with a
common training data set, which includes common preprocessed input and
corrected output. A common development set will also be provided. A
blind test data set will be used to evaluate the output of the
participating teams. An evaluation script will be provided to all the
teams. Participants are expected to author a short paper (4 pages plus
2 pages for references) describing their approach, resources, and
experiments. Papers must follow the standard EMNLP conference format.
===============
IMPORTANT DATES
===============
Shared task registration period: April 8, 2014 through July 1, 2014
Shared task test release: July 7, 2014
Shared task system output collection: July 18, 2014
Submission deadline (Workshop and shared task papers): July 26, 2014
Author notification: August 26, 2014
Camera-ready deadline: September 15, 2014
Workshop: October 25, 2014
==========
ORGANIZERS
==========
Program Co-chairs
Nizar Habash, Columbia University
Stephan Vogel, Qatar Computing Research Institute
Publication Co-chairs
Nadi Tomeh, Paris 13 University
Houda Bouamor, Carnegie Mellon University Qatar
Website Committee
Kareem Darwish, Qatar Computing Research Institute
Noura Farra, Columbia University
Shared Task Committee
Behrang Mohit, Carnegie Mellon University Qatar
Alla Rozovskaya, Columbia University
Wajdi Zaghouani, Carnegie Mellon University Qatar
Ossama Obeid, Carnegie Mellon University Qatar
Nizar Habash, Columbia University (advisory)
Program Committee Members
(TBA in Second Call)