28.1630, FYI: 2017 Shared Task on Native Language Identification

Tue Apr 4 14:17:00 UTC 2017

LINGUIST List: Vol-28-1630. Tue Apr 04 2017. ISSN: 1069 - 4875.

Subject: 28.1630, FYI: 2017 Shared Task on Native Language Identification

Moderators: linguist at linguistlist.org (Damir Cavar, Malgorzata E. Cavar)
Reviews: reviews at linguistlist.org (Helen Aristar-Dry, Robert Coté,
                                   Michael Czerniakowski)
Homepage: http://linguistlist.org

*****************    LINGUIST List Support    *****************
                       Fund Drive 2017
                   25 years of LINGUIST List!
Please support the LL editors and operation with a donation at:
           http://funddrive.linguistlist.org/donate/

Editor for this issue: Yue Chen <yue at linguistlist.org>
================================================================

Date: Tue, 04 Apr 2017 10:16:32
From: NLI Shared Task Organizers [nli.sharedtask at gmail.com]
Subject: 2017 Shared Task on Native Language Identification

Call for Participation

Website: https://sites.google.com/site/nlisharedtask/home

Description:

We are excited to organize a new shared task on Native Language Identification
(NLI), which will take place at the BEA12 Workshop, co-located with EMNLP in
Copenhagen, on September 8, 2017.

NLI is the task of identifying the native language (L1) of a writer based
solely on a sample of their writing or speech. The task is typically framed as
a classification problem where the set of L1s is known a priori. Most work has
focused on identifying the native language of writers learning English as a
second language. Two previous shared tasks on NLI have been organized, in which
the task was to identify the native language of non-native speakers of
English based on essays and spoken responses they provided during a
standardized assessment of academic English proficiency. The first shared task
was based on the essays only and was also held with the BEA workshop in 2013.
It was very successful with 29 teams competing, making it one of the largest
shared tasks that year. Three years later, the Computational Paralinguistics
Challenge at Interspeech 2016 hosted a sub-challenge on identifying the native
language based solely on the spoken responses.

This year's shared task combines the inputs from the two previous tasks. There
will be three tracks: NLI on the essay only, NLI on the spoken response only
(based on a transcription of the response, not the audio), and NLI using both
responses from a test taker. This distinction will make for a more challenging
shared task while building on the methods and results from the previous two
shared tasks. We promise this shared task will be fun for you and your
colleagues, as well as your whole family.

Data:

Educational Testing Service (ETS) is releasing 13,200 English essays and
orthographic transcriptions of 13,200 spoken responses from the TOEFL iBT®
assessment for the 2017 NLI Shared Task with the goal of helping researchers
advance the state of the art in the field of NLI. In addition to the orthographic
transcriptions of the spoken responses, i-vectors generated from the audio
files will be released as a baseline comparison for the speech-based NLI task
(although the audio files themselves are not included in this data set). The
data set contains test responses from 13,200 test takers (one essay and one
spoken response transcription per test taker) and includes 11 native languages
(L1s) with 1,200 test takers per L1. The 11 native languages covered by the
corpus are: Arabic, Chinese, French, German, Hindi, Italian, Japanese, Korean,
Spanish, Telugu, and Turkish. The essays typically range in length from
approximately 300 to 400 words and the transcribed spoken responses typically
contain approximately 100 words. Responses from 11,000 test takers in this set
will be used as training data for the NLI Shared Task, 1,100 for development,
and the remaining 1,100 will be released later as test data. 
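
As a quick sanity check on the figures above, the following Python sketch
verifies that the per-L1 and per-split counts add up. The even per-L1
distribution within each split is an assumption made for illustration; the
announcement only states the overall count of 1,200 test takers per L1.

```python
# Counts quoted in the Data section of this announcement.
NUM_L1S = 11
TAKERS_PER_L1 = 1_200
SPLITS = {"train": 11_000, "dev": 1_100, "test": 1_100}

total = NUM_L1S * TAKERS_PER_L1
assert total == 13_200
assert sum(SPLITS.values()) == total

# ASSUMPTION (not stated in the announcement): each split is balanced by L1.
per_l1 = {name: size // NUM_L1S for name, size in SPLITS.items()}

# Expected accuracy of uniform random guessing in an 11-way task.
chance_accuracy = 1 / NUM_L1S

if __name__ == "__main__":
    for name, size in SPLITS.items():
        print(f"{name}: {size} test takers, {per_l1[name]} per L1 if balanced")
    print(f"chance accuracy: {chance_accuracy:.1%}")
```

With 11 candidate L1s, uniform random guessing is expected to be correct
1 time in 11 (about 9.1%), which may serve as a floor when reading results.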

Evaluation:

The shared task will be composed of three sub-tasks:

Main Task: the first and main task, 11-way classification using all
  available data sources
Text Task: 11-way classification using solely the essays
Speech Task: 11-way classification using solely the transcripts and/or
  i-vectors
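
For participants planning runs across several sub-tasks, the input
restrictions above could be encoded roughly as follows. This is only a sketch
in Python: the track keys ("main", "text", "speech"), the Source enum, and the
check_system helper are hypothetical names, not part of any official tooling,
and reading "all available data sources" as "any of the three sources is
permitted" is our assumption.

```python
from enum import Enum


class Source(Enum):
    ESSAY = "essay"
    TRANSCRIPT = "transcript"
    IVECTOR = "i-vector"


# Permitted inputs per sub-task, as read from the list above.
# ASSUMPTION: the Main Task permits any combination of the three sources.
TRACKS = {
    "main": frozenset(Source),
    "text": frozenset({Source.ESSAY}),
    "speech": frozenset({Source.TRANSCRIPT, Source.IVECTOR}),
}

# The 11 L1s listed in the Data section.
L1S = ("Arabic", "Chinese", "French", "German", "Hindi", "Italian",
       "Japanese", "Korean", "Spanish", "Telugu", "Turkish")


def check_system(track, sources_used, predicted_labels):
    """Return a list of problems; an empty list means the run fits the track."""
    problems = []
    used = set(sources_used)
    if not used:
        problems.append("no input source declared")
    extra = used - TRACKS[track]
    if extra:
        names = ", ".join(sorted(s.value for s in extra))
        problems.append("sources not permitted in this track: " + names)
    unknown = sorted(set(predicted_labels) - set(L1S))
    if unknown:
        problems.append("labels outside the 11 L1s: " + ", ".join(unknown))
    return problems
```

For example, check_system("text", [Source.ESSAY, Source.IVECTOR], ["Hindi"])
reports the i-vector as not permitted, since the Text Task uses the essays
only.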

Registration:

Please register for the shared task via the following link:

https://docs.google.com/forms/d/e/1FAIpQLSdPjJLJxDJ8h1pUKI7yCDEUkW7saBnebHyeimqlgdqH5BwYbQ/viewform

Next, in order to obtain the training and test data for the task, all
participants must sign and return the data usage agreement form found here:

https://sites.google.com/site/nlisharedtask/data 

Important Dates:

Mar 27 - Training Data Release (Phase 1: Text)
Mid-April - Training Data Release (Phase 2: Speech Transcripts and i-vectors)
Jun 19 - Test Data Release
Jun 26 - Results Notification
Jul 05 - Draft System Description Papers Due
Jul 14 - Camera Ready Papers Due
Sep 08 - BEA12 Workshop

Organizers:

Aoife Cahill (Educational Testing Service)
Keelan Evanini (Educational Testing Service)
Shervin Malmasi (Harvard Medical School)
Joel Tetreault (Grammarly)

Contact email: nli.sharedtask at gmail.com

Linguistic Field(s): Computational Linguistics
                     Forensic Linguistics

------------------------------------------------------------------------------
