30.2982, FYI: Arabic Dialect Identification Dataset public release

The LINGUIST List linguist at listserv.linguistlist.org
Thu Aug 1 02:30:20 UTC 2019


LINGUIST List: Vol-30-2982. Wed Jul 31 2019. ISSN: 1069-4875.

Subject: 30.2982, FYI: Arabic Dialect Identification Dataset public release

Moderator: Malgorzata E. Cavar (linguist at linguistlist.org)
Student Moderator: Jeremy Coburn
Managing Editor: Becca Morris
Team: Helen Aristar-Dry, Everett Green, Sarah Robinson, Peace Han, Nils Hjortnaes, Yiwen Zhang, Julian Dietrich
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Please support the LL editors and operation with a donation at:
           https://funddrive.linguistlist.org/donate/

Editor for this issue: Everett Green <everett at linguistlist.org>
================================================================


Date: Wed, 31 Jul 2019 22:29:46
From: Houda Bouamor [hbouamor at qatar.cmu.edu]
Subject: Arabic Dialect Identification Dataset public release

 
Carnegie Mellon University in Qatar and the CAMEL Lab at New York University
Abu Dhabi are happy to announce the release of all of the datasets and
code provided for the MADAR Shared Task on Arabic Fine-Grained Dialect
Identification, which was part of the Fourth Arabic Natural Language
Processing Workshop (ACL 2019). The dataset consists of two parts pertaining
to Subtask 1 and Subtask 2.

Subtask 1: MADAR Travel Domain Dialect Identification. The data for this
subtask is the same as that reported in Bouamor et al. (2018)
(https://www.aclweb.org/anthology/L18-1535) and Salameh et al. (2018)
(https://www.aclweb.org/anthology/C18-1113).

Subtask 2: MADAR Twitter User Dialect Identification. This is a new dataset
created for this shared task (Bouamor et al. 2019,
https://camel.abudhabi.nyu.edu/madar-shared-task-2019/MADAR_SharedTask_Summary_Paper_WANLP_ACL_2019.pdf).

These resources were developed as part of the Multi-Arabic Dialect
Applications and Resources (MADAR) Project (http://madar.camel-lab.com), a
collaboration between Carnegie Mellon University in Qatar and New York
University Abu Dhabi.

The datasets are available for download at:
https://camel.abudhabi.nyu.edu/madar-shared-task-2019/

Regards,
Houda Bouamor on behalf of the MADAR project team.
 



Linguistic Field(s): Computational Linguistics
                     Lexicography
                     Morphology
                     Text/Corpus Linguistics
                     Translation

Subject Language(s): Arabic, Algerian (arq)
                     Arabic, Eastern Egyptian Bedawi (avl)
                     Arabic, Egyptian (arz)
                     Arabic, Gulf (afb)
                     Arabic, Hijazi (acw)
                     Arabic, Libyan (ayl)
                     Arabic, Moroccan (ary)
                     Arabic, Najdi (ars)
                     Arabic, North Levantine (apc)
                     Arabic, Omani (acx)
                     Arabic, Saidi (aec)
                     Arabic, Sanaani (ayn)
                     Arabic, South Levantine (ajp)
                     Arabic, Standard (arb)
                     Arabic, Sudanese (apd)
                     Arabic, Ta'izzi-Adeni (acq)
                     Arabic, Tunisian (aeb)





 








