31.1259, FYI: Call for Participation: SIGTYP 2020 Shared Task on the Prediction of Typological Features

The LINGUIST List linguist at listserv.linguistlist.org
Fri Apr 3 19:54:09 UTC 2020


LINGUIST List: Vol-31-1259. Fri Apr 03 2020. ISSN: 1069 - 4875.

Moderator: Malgorzata E. Cavar (linguist at linguistlist.org)
Student Moderator: Jeremy Coburn
Managing Editor: Becca Morris
Team: Helen Aristar-Dry, Everett Green, Sarah Robinson, Lauren Perkins, Nils Hjortnaes, Yiwen Zhang, Joshua Sims
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Please support the LL editors and operation with a donation at:
           https://funddrive.linguistlist.org/donate/

Editor for this issue: Sarah Robinson <srobinson at linguistlist.org>
================================================================


Date: Fri, 03 Apr 2020 15:50:48
From: Edoardo Maria Ponti [ep490 at cam.ac.uk]
Subject: Call for Participation: SIGTYP 2020 Shared Task on the Prediction of Typological Features

 
https://sigtyp.github.io/st2020.html

In 2020, SIGTYP is offering a shared task on the prediction of typological
features. The shared task encompasses nearly 2,000 languages, with typological
features taken from the World Atlas of Language Structures (WALS; Dryer and
Haspelmath 2013).

To participate in the shared task, you will build a system that can predict
typological properties of languages, given a handful of observed features.
Training examples and development examples have already been provided (see
link below). All submitted systems will be compared on a held-out test set.

Moreover, you will be invited to describe your system in a system paper for
the SIGTYP workshop proceedings. The task organisers will write an overview
paper that describes the task and summarises the different approaches taken
and their results.

[ Important Links ]

- Download Train and Dev data:
https://github.com/sigtyp/ST2020/tree/master/data
- Register for the task:
https://sigtyp.github.io/st2020-reg.html

[ Important Dates ]

- Training data Release: 26 March 2020
- Test data Release: 20 June 2020
- Submissions Due: 1 July 2020
- Writeup Due: 1 August 2020

[ Description ]

The typological features in WALS represent one approach to the categorization
of the languages of the world according to their linguistic properties, e.g.
in terms of their syntax, morphology, and phonology, inter alia. One example of
such a typological feature is the basic word order feature. For instance,
English is best described as a subject-verb-object (SVO) language whereas
Japanese is best described as a subject-object-verb (SOV) language.

One major issue with WALS, however, is that it is both sparse and skewed in
terms of language-feature annotations. It is sparse in the sense that most
languages only have annotations for a handful of features, and skewed in the
sense that a few features have much wider coverage than others. Luckily, such
features often correlate with one another, which allows for prediction of
those features from others. For instance, languages where the verb precedes
the object tend to have prepositions, e.g. Norwegian, whereas languages where
the object precedes the verb tend to have postpositions, e.g. Japanese.
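
As a rough illustration of how such correlations can be exploited, here is a
minimal sketch of a co-occurrence baseline. It assumes a toy, hand-written
table of languages; the feature labels (`verb_object_order`, `adposition`) and
the function names are hypothetical and do not reflect the format of the
shared task data or any organiser-provided baseline.

```python
from collections import Counter, defaultdict

# Toy annotations written by hand for illustration only.
# They are NOT read from the shared task files, and the feature
# labels are informal names, not official WALS identifiers.
TOY_LANGUAGES = {
    "English":   {"verb_object_order": "VO", "adposition": "prepositions"},
    "Norwegian": {"verb_object_order": "VO", "adposition": "prepositions"},
    "French":    {"verb_object_order": "VO", "adposition": "prepositions"},
    "Japanese":  {"verb_object_order": "OV", "adposition": "postpositions"},
    "Turkish":   {"verb_object_order": "OV", "adposition": "postpositions"},
    "Hindi":     {"verb_object_order": "OV", "adposition": "postpositions"},
}


def fit_cooccurrence(languages, source, target):
    """Count how often each target value co-occurs with each source value."""
    counts = defaultdict(Counter)
    for features in languages.values():
        # Skip languages where either feature is unannotated (sparsity).
        if source in features and target in features:
            counts[features[source]][features[target]] += 1
    return counts


def predict(counts, source_value):
    """Return the target value most often seen with source_value, or None."""
    if source_value not in counts:
        return None
    return counts[source_value].most_common(1)[0][0]


counts = fit_cooccurrence(TOY_LANGUAGES, "verb_object_order", "adposition")
print(predict(counts, "VO"))
print(predict(counts, "OV"))
```

A real system would need to combine evidence from many observed features and
cope with values never seen in training; this sketch conditions on a single
feature only.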

Although there is a significant amount of previous work dealing with versions
of this task (Daumé III and Campbell 2007; Bjerva et al. 2019; Ponti et al.
2019), important design choices have been frequently ignored. Some papers
controlled for genetic relationships between training and evaluation
languages, but little to no work has considered controlling for geographical
proximity.
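
One possible way to control for genetic relationships is to hold out whole
language families, so that no family contributes languages to both the
training and the evaluation set. The sketch below assumes a hypothetical
mapping from language to family and a hypothetical helper `split_by_family`;
it is not the procedure used by the organisers. The same grouping idea could
in principle be applied to geographical areas instead of families.

```python
import random

# Hypothetical language-to-family mapping, for illustration only.
TOY_FAMILIES = {
    "English": "Indo-European",
    "Norwegian": "Indo-European",
    "Hindi": "Indo-European",
    "Japanese": "Japonic",
    "Turkish": "Turkic",
    "Finnish": "Uralic",
    "Hungarian": "Uralic",
}


def split_by_family(language_to_family, eval_fraction=0.25, seed=0):
    """Split languages so that each family is entirely in one partition."""
    families = sorted(set(language_to_family.values()))
    rng = random.Random(seed)  # fixed seed for a reproducible split
    rng.shuffle(families)
    # Hold out at least one family, rounded from the requested fraction.
    n_eval = max(1, round(len(families) * eval_fraction))
    eval_families = set(families[:n_eval])
    train = sorted(
        lang for lang, fam in language_to_family.items()
        if fam not in eval_families
    )
    held_out = sorted(
        lang for lang, fam in language_to_family.items()
        if fam in eval_families
    )
    return train, held_out


train, held_out = split_by_family(TOY_FAMILIES)
print("train:", train)
print("held out:", held_out)
```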

The shared task will consist of two settings (subtasks):
1) Constrained: only provided training data can be employed.
2) Unconstrained: training data can be extended with any external source of
information (e.g. pre-trained embeddings or raw texts).

[ Organizers ]
Johannes Bjerva
Isabelle Augenstein
Aditi Chaudhary
Edoardo M. Ponti
Giuseppe Celano
Liz Salesky
Ryan Cotterell
Michael Regan
Sabrina J. Mielke

[ Contact ]
- email: sigtyp AT gmail DOT com
- website: https://sigtyp.github.io/st2020.html
 



Linguistic Field(s): Cognitive Science
                     Computational Linguistics
                     Linguistic Theories
                     Text/Corpus Linguistics
                     Typology





 


