30.180, FYI: Call for Participation: BEA 2019 Shared Task

Sat Jan 12 10:18:10 UTC 2019

LINGUIST List: Vol-30-180. Sat Jan 12 2019. ISSN: 1069-4875.

Subject: 30.180, FYI: Call for Participation: BEA 2019 Shared Task

Moderator: linguist at linguistlist.org (Malgorzata E. Cavar)
Reviews: reviews at linguistlist.org (Helen Aristar-Dry, Robert Coté)
Homepage: https://linguistlist.org

Please support the LL editors and operation with a donation at:
           https://funddrive.linguistlist.org/donate/

Editor for this issue: Everett Green <everett at linguistlist.org>
================================================================

Date: Sat, 12 Jan 2019 05:16:03
From: Ekaterina Kochmar [bea.nlp.workshop at gmail.com]
Subject: Call for Participation: BEA 2019 Shared Task

Call for Participation:

BEA 2019 Shared Task:

Grammatical Error Correction

Florence, Italy

August 2, 2019

https://www.cl.cam.ac.uk/research/nl/bea2019st/

Grammatical error correction (GEC) is the task of automatically correcting
grammatical errors in text; e.g. [I follows his advices -> I followed his
advice]. One of the aims of this shared task is to once again provide a
platform where different approaches can be trained and tested under the same
conditions.
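To make the task concrete, the example above can be read as two token-level edits: "follows" -> "followed" and "advices" -> "advice". The sketch below assumes a hypothetical (start, end, replacement) edit representation over whitespace-separated tokens. It is for illustration only and is not the data format used by the shared task.

```python
from typing import List, Tuple

# Hypothetical edit representation: (start, end, replacement_tokens) over a
# tokenised source sentence. `end` is exclusive, so start == end is an
# insertion and an empty replacement list is a deletion.
Edit = Tuple[int, int, List[str]]


def apply_edits(tokens: List[str], edits: List[Edit]) -> List[str]:
    """Apply non-overlapping edits to a token list and return the corrected tokens."""
    corrected = list(tokens)
    # Apply right-to-left so the offsets of earlier edits stay valid
    # after each later span has been replaced.
    for start, end, replacement in sorted(edits, key=lambda e: e[0], reverse=True):
        corrected[start:end] = replacement
    return corrected


source = "I follows his advices".split()
edits = [(1, 2, ["followed"]), (3, 4, ["advice"])]
print(" ".join(apply_edits(source, edits)))  # I followed his advice
```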

This shared task introduces data from the Write&Improve corpus, a new
error-annotated dataset that represents a much more diverse cross-section of
English proficiency levels and domains than earlier GEC datasets. Write&Improve
is an online platform that assists non-native English students with their
writing (https://writeandimprove.com/).

System output will be evaluated on a blind test set using ERRANT
(https://github.com/chrisjbryant/errant).
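ERRANT scores a system by comparing its edits against the gold-standard edits and, by default, reports precision, recall and F0.5, which weights precision more heavily than recall. The sketch below shows only the standard F-beta arithmetic on edit-level counts. The function name and the convention for empty denominators are our own assumptions, not ERRANT's code, and the counts are made up for illustration.

```python
from typing import Tuple


def precision_recall_fbeta(tp: int, fp: int, fn: int,
                           beta: float = 0.5) -> Tuple[float, float, float]:
    """Compute precision, recall and F-beta from edit-level counts.

    tp: system edits that match a gold edit
    fp: system edits with no matching gold edit
    fn: gold edits that the system missed
    """
    # Assumed convention: an empty denominator counts as perfect precision/recall.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 1.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 1.0
    if precision + recall == 0:
        return precision, recall, 0.0
    b2 = beta ** 2
    fbeta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, fbeta


# Hypothetical counts for illustration only, not real shared-task results.
p, r, f = precision_recall_fbeta(tp=30, fp=10, fn=30)
print(f"P={p:.4f} R={r:.4f} F0.5={f:.4f}")  # P=0.7500 R=0.5000 F0.5=0.6818
```

With beta = 0.5, a system that proposes fewer but more reliable corrections tends to score higher than one that proposes many uncertain ones.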

In addition to learner data, we will provide annotated development and test
sets extracted from the LOCNESS corpus, a collection of essays written by
native English-speaking students, compiled by the Centre for English Corpus
Linguistics at the University of Louvain.

Tracks:

There are three tracks in the BEA 2019 shared task. Each track restricts the
amount of annotated data that a system may use. We place no restrictions on
the amount of unannotated data that can be used (e.g. for language modelling).

Restricted:

In the restricted setting, participants may only use the following annotated
datasets: FCE-train, Lang-8 Corpus of Learner English, NUCLE and
Write&Improve.

Note that we restrict participants to the preprocessed Lang-8 Corpus of
Learner English rather than the raw, multilingual Lang-8 Learner Corpus
because participants would otherwise need to filter the raw corpus themselves.

Unrestricted:

In the unrestricted setting, participants may use any and all datasets,
including those in the restricted setting.

Unsupervised (or minimally supervised):

In the unsupervised setting, participants may not use any annotated training
data. Since current state-of-the-art systems rely on as much training data as
possible to reach the best performance, the goal of the unsupervised track is
to encourage research into systems that do not rely on annotated training
data. This track should be of particular interest to researchers working with
low-resource languages. However, since we expect this to be a challenging
track, participants may use the Write&Improve development set to develop
their systems.

Participation:

To participate in the BEA 2019 Shared Task, teams must submit their system
output by Friday, March 29, 2019 at 23:59 GMT. There is no explicit
registration procedure. Further details about the submission process will be
provided soon.

Important Dates:

January 25, 2019: New training data released
March 25, 2019: New test data released
March 29, 2019: System output submission deadline
April 12, 2019: System results announced
May 3, 2019: System paper submission deadline
May 17, 2019: Review deadline
May 24, 2019: Notification of acceptance
June 7, 2019: Camera-ready submission deadline
August 2, 2019: BEA 2019 Workshop (Florence, Italy)

Organisers:

Christopher Bryant, University of Cambridge
Mariano Felice, University of Cambridge
Øistein Andersen, University of Cambridge
Ted Briscoe, University of Cambridge

Contact:

Questions about the shared task can be sent to bea2019st at gmail.com.

Linguistic Field(s): Computational Linguistics
