30.529, FYI: Building Educational Applications 2019 Shared Task

The LINGUIST List linguist at listserv.linguistlist.org
Fri Feb 1 21:26:40 UTC 2019


LINGUIST List: Vol-30-529. Fri Feb 01 2019. ISSN: 1069 - 4875.

Moderator: Malgorzata E. Cavar (linguist at linguistlist.org)
Student Moderator: Jeremy Coburn
Managing Editor: Becca Morris
Team: Helen Aristar-Dry, Everett Green, Sarah Robinson, Peace Han, Nils Hjortnaes, Yiwen Zhang, Julian Dietrich
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Please support the LL editors and operation with a donation at:
           https://funddrive.linguistlist.org/donate/

Editor for this issue: Everett Green <everett at linguistlist.org>
================================================================


Date: Fri, 01 Feb 2019 16:24:47
From: Ildikó Pilán [ildiko.pilan at gmail.com]
Subject: Building Educational Applications 2019 Shared Task

 
Building Educational Applications 2019 Shared Task: Grammatical Error
Correction
Florence, Italy
August 2, 2019

NEW! 25/01/2019: Training data released!

Call for Participation

https://www.cl.cam.ac.uk/research/nl/bea2019st/

Grammatical error correction (GEC) is the task of automatically correcting
grammatical errors in text; e.g. [I follows his advices -> I followed his
advice]. It can be used not only to help language learners improve their
writing skills, but also to alert native speakers to accidental mistakes or
typos.
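
As a rough illustration of what a correction looks like as data, the sketch
below aligns the example sentence pair token by token and lists the edits
between them. It uses Python's standard difflib, not the shared task's
tooling; the function name extract_edits and the (start, end, original,
correction) tuple layout are assumptions made for this example only.

```python
from difflib import SequenceMatcher


def extract_edits(source, target):
    """Return token-level edits as (start, end, original, correction).

    start/end are token offsets into the whitespace-tokenised source.
    This is a plain diff, not a linguistically informed alignment.
    """
    src = source.split()
    tgt = target.split()
    matcher = SequenceMatcher(a=src, b=tgt, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            edits.append((i1, i2, " ".join(src[i1:i2]), " ".join(tgt[j1:j2])))
    return edits


for edit in extract_edits("I follows his advices", "I followed his advice"):
    print(edit)
```

For the example pair this yields two edits, one replacing "follows" with
"followed" and one replacing "advices" with "advice".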

GEC gained significant attention in the Helping Our Own (HOO) and CoNLL shared
tasks between 2011 and 2014, but has since become more difficult to evaluate
given a lack of standardised experimental settings. In particular, recent
systems have been trained, tuned and tested on different combinations of
corpora using different metrics. One of the aims of this shared task is
therefore to once again provide a platform where different approaches can be
trained and tested under the same conditions.

Another significant problem facing the field is that system performance is
still primarily benchmarked against the CoNLL-2014 test set, even though this
5-year-old dataset only contains 50 essays on 2 different topics written by 25
South-East Asian undergraduates in Singapore. This means that systems have
increasingly overfit to a very specific genre of English and so do not
generalise well to other domains. As a result, this shared task introduces the
Cambridge English Write & Improve (W&I) corpus, a new error-annotated dataset
that represents a much more diverse cross-section of English language levels
and domains. Write & Improve is an online web platform that assists non-native
English students with their writing (https://writeandimprove.com/).

Participating teams will be provided with training and development data from
the W&I corpus to build their systems. Depending on the chosen track,
supplementary data may also be used. System output will be evaluated on a
blind test set using ERRANT (https://github.com/chrisjbryant/errant).
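
To give a feel for edit-based scoring, the sketch below compares a set of
hypothesis edits with a set of reference edits and computes precision, recall
and an F-score. It is not ERRANT's code or API. The (start, end, correction)
tuple layout and the default beta of 0.5 (which weights precision more heavily
than recall) are assumptions for illustration; this announcement does not
state the exact metric settings.

```python
def score_edits(hypothesis, reference, beta=0.5):
    """Score hypothesis edits against reference edits by exact match.

    Both arguments are iterables of hashable edits, here assumed to be
    (start, end, correction) tuples. Returns (precision, recall, f_beta).
    """
    hyp, ref = set(hypothesis), set(reference)
    tp = len(hyp & ref)   # edits proposed and in the reference
    fp = len(hyp - ref)   # edits proposed but not in the reference
    fn = len(ref - hyp)   # reference edits the system missed
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    denom = beta ** 2 * precision + recall
    f_beta = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
    return precision, recall, f_beta


reference = [(1, 2, "followed"), (3, 4, "advice")]
hypothesis = [(1, 2, "followed"), (3, 4, "advises")]
print(score_edits(hypothesis, reference))
```

Here the system gets one of two edits right and proposes one wrong edit, so
precision, recall and the F-score all come out at 0.5.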

In addition to learner data, we will provide an annotated development and test
set extracted from the LOCNESS corpus, a collection of essays written by
native English students compiled by the Centre for English Corpus Linguistics
at the University of Louvain.

Tracks:

There are 3 tracks in the BEA 2019 shared task. Each track controls the amount
of annotated data that can be used in a system. We place no restrictions on
the amount of unannotated data that can be used (e.g. for language modelling).

- Restricted
In the restricted setting, participants may only use the following annotated
datasets: FCE, Lang-8 Corpus of Learner English, NUCLE, W&I and LOCNESS. Note
that we restrict participants to the preprocessed Lang-8 Corpus of Learner
English rather than the raw, multilingual Lang-8 Learner Corpus because
participants would otherwise need to filter the raw corpus themselves.

- Unrestricted
In the unrestricted setting, participants may use any and all datasets,
including those in the restricted setting.

- Unsupervised (or minimally supervised)
In the unsupervised setting, participants may not use any annotated training
data. Since current state-of-the-art systems rely on as much training data as
possible to reach the best performance, the goal of the unsupervised track is
to encourage research into systems that do not rely on annotated training
data. This track should be of particular interest to researchers working with
low-resource languages. However, since we expect this track to be
challenging, participants may use the W&I+LOCNESS development set to develop
their systems.
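
The track rules above can be sketched as a simple check on a team's annotated
training data. This is an informal reading of the rules, not an official
validator: the function name, the track keys and the dataset strings are
assumptions for this example, and the development-set allowance in the
unsupervised track is deliberately left out because it concerns development
rather than training data.

```python
# Annotated datasets permitted in the restricted track, as listed above.
RESTRICTED_DATASETS = frozenset({
    "FCE",
    "Lang-8 Corpus of Learner English",
    "NUCLE",
    "W&I",
    "LOCNESS",
})


def disallowed_training_data(track, datasets):
    """Return the annotated training datasets a track does not permit."""
    if track == "unrestricted":
        return []  # any and all datasets may be used
    if track == "restricted":
        return sorted(d for d in datasets if d not in RESTRICTED_DATASETS)
    if track == "unsupervised":
        return sorted(datasets)  # no annotated training data at all
    raise ValueError(f"unknown track: {track}")


print(disallowed_training_data("restricted", ["FCE", "Lang-8 Learner Corpus"]))
```

Unannotated data is not modelled here, since no track restricts it.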

See further details at
https://www.cl.cam.ac.uk/research/nl/bea2019st/

Participation:

To participate in the BEA 2019 Shared Task, teams must submit their system
output between March 25 and March 29, 2019; submissions close at 23:59 GMT on
March 29.
There is no explicit registration procedure. Further details about the
submission process will be provided soon.

Important Dates:

Friday, Jan 25, 2019: New training data released
Monday, March 25, 2019: New test data released
Friday, March 29, 2019: System output submission deadline
Friday, April 12, 2019: System results announced
Friday, May 3, 2019: System paper submission deadline
Friday, May 17, 2019: Review deadline
Friday, May 24, 2019: Notification of acceptance
Friday, June 7, 2019: Camera-ready submission deadline
Friday, August 2, 2019: BEA-2019 Workshop (Florence, Italy)

Organisers:

Christopher Bryant, University of Cambridge
Mariano Felice, University of Cambridge
Øistein Andersen, University of Cambridge
Ted Briscoe, University of Cambridge

Contact:

Questions about the shared task can be sent to
bea2019st at gmail.com.

Further details can be found at
https://www.cl.cam.ac.uk/research/nl/bea2019st/
 



Linguistic Field(s): Computational Linguistics