30.1580, Calls: Computational Linguistics, Typology/Bulgaria
The LINGUIST List
linguist at listserv.linguistlist.org
Wed Apr 10 22:16:44 UTC 2019
LINGUIST List: Vol-30-1580. Wed Apr 10 2019. ISSN: 1069 - 4875.
Subject: 30.1580, Calls: Computational Linguistics, Typology/Bulgaria
Moderator: Malgorzata E. Cavar (linguist at linguistlist.org)
Student Moderator: Jeremy Coburn
Managing Editor: Becca Morris
Team: Helen Aristar-Dry, Everett Green, Sarah Robinson, Peace Han, Nils Hjortnaes, Yiwen Zhang, Julian Dietrich
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org
Homepage: http://linguistlist.org
Editor for this issue: Everett Green <everett at linguistlist.org>
================================================================
Date: Wed, 10 Apr 2019 18:14:27
From: Harald Hammarström [harald.hammarstrom at lingfil.uu.se]
Subject: Grammar Data Mining: Extracting Linguistic Features From Grammatical Descriptions
Full Title: Grammar Data Mining: Extracting Linguistic Features From Grammatical Descriptions
Short Title: GDM
Date: 05-Sep-2019 - 06-Sep-2019
Location: Varna, Bulgaria
Contact Person: Harald Hammarström
Meeting Email: harald.hammarstrom at lingfil.uu.se
Web Site: https://spraakbanken.gu.se/lsi/sharedtask/
Linguistic Field(s): Computational Linguistics; Typology
Call Deadline: 30-Jun-2019
Meeting Description:
The present Workshop/Shared Task seeks to transform a large set of digitized
publications describing the grammars of the languages of the world into
structured databases that will enable comparison of different languages at an
unprecedented breadth and depth.
There are some 6,500 languages in the world, and information about their
grammatical characteristics is available in book form for over 4,000 of them.
Until recently, extraction of information from grammars has been done
exclusively through manual collection. This procedure is naturally bounded by
the limits of human capacities, and as such can only target a relatively small
number of languages/characteristics, and only at a substantial investment of
time.
We are now entering a phase where it is practical to use NLP tools for a
number of similar tasks. A computer may minimally infer some characteristics
of the language described simply by counting words used in a grammatical
description, e.g., a high frequency of the term 'suffix' likely indicates that
the language being described uses a lot of suffixes. Further, there are less
straightforward or more detailed characteristics traditionally of interest to
linguists, such as where the verb is placed in the sentence (beginning,
middle, end), the existence and use of participles, possessive constructions,
evidentiality and so on. Any techniques from the NLP toolbox, such as
tf-idf weighting, tagging, parsing and vector spaces, may be used in
combination and as input to more sophisticated Machine Learning approaches.
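As a rough illustration of the word-counting idea described above, the
following Python sketch computes relative term frequencies and tf-idf weights
over a set of grammar texts. It is only a minimal sketch, not the organizers'
method: the function names, the 'margin' threshold, the smoothed idf variant
and the toy heuristic are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(text):
    """Relative frequency of each token in one grammatical description."""
    counts = Counter(tokenize(text))
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()} if total else {}


def tf_idf(docs):
    """Compute tf-idf weights for every term in every document.

    docs maps a (hypothetical) language name to the raw text of its grammar.
    Uses the smoothed idf log((1 + N) / (1 + df)) + 1, one common variant.
    """
    tfs = {name: term_frequencies(text) for name, text in docs.items()}
    # Document frequency: in how many descriptions each term occurs.
    df = Counter(term for tf in tfs.values() for term in tf)
    n_docs = len(docs)
    return {
        name: {
            term: freq * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, freq in tf.items()
        }
        for name, tf in tfs.items()
    }


def guess_affixation(text, margin=2.0):
    """Toy heuristic (an assumption, not a validated classifier): compare
    how often 'suffix(es)' and 'prefix(es)' are mentioned in the text."""
    tf = term_frequencies(text)
    suffix = tf.get("suffix", 0.0) + tf.get("suffixes", 0.0)
    prefix = tf.get("prefix", 0.0) + tf.get("prefixes", 0.0)
    if suffix > margin * prefix:
        return "mostly suffixing"
    if prefix > margin * suffix:
        return "mostly prefixing"
    return "undetermined"
```

Raw counts of this kind are sensitive to the length and style of a
description, which is one reason to weight them (e.g., with tf-idf) or to feed
them as features into a learned model rather than thresholding them directly.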
In this shared task we provide a subset of the World Atlas of Language
Structures (WALS, http://wals.info) along with the digitized sources from
which the features were drawn. Sources are provided in raw text form. The task
is to infer WALS datapoints from the raw text data of the digitized
grammatical descriptions.
Authors should submit a paper of up to 8 pages conforming to the RANLP style
guidelines (see http://lml.bas.bg/ranlp2019/submissions.php) describing their
technical solution to the specific task.
Workshop paper submission deadline: 30 June 2019
Workshop paper acceptance notification: 28 July 2019
Workshop paper camera-ready version: 20 August 2019
Workshop: 5-6 September 2019
Each submission will be evaluated against a test set of 1000 random datapoints
drawn from the same origin as the training data set.
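The evaluation setup described above can be sketched as follows. This is a
hedged illustration only: the announcement does not state the scoring metric,
so exact-match accuracy is an assumption here, as are the function names and
the representation of a datapoint as a (language, feature ID) pair mapped to a
value.

```python
import random


def sample_test_set(datapoints, k=1000, seed=0):
    """Draw up to k datapoints at random, without replacement.

    datapoints maps (language, feature_id) -> value; this layout is an
    assumption made for the example.
    """
    rng = random.Random(seed)
    keys = rng.sample(sorted(datapoints), min(k, len(datapoints)))
    return {key: datapoints[key] for key in keys}


def accuracy(gold, predicted):
    """Fraction of gold datapoints whose predicted value matches exactly.

    Datapoints with no prediction count as wrong.
    """
    if not gold:
        return 0.0
    correct = sum(
        1 for key, value in gold.items() if predicted.get(key) == value
    )
    return correct / len(gold)
```

A submission would then be scored by calling accuracy(test_set, predictions),
where predictions holds the values a system inferred from the raw grammar
texts.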
The workshop will be co-located with RANLP (http://lml.bas.bg/ranlp2019) in
Bulgaria and take place in Hotel "Cherno More", Varna, the main RANLP-2019
conference venue.
Call for Papers:
For training data, task, submission instructions, important dates, evaluation
and venue, see:
https://spraakbanken.gu.se/lsi/sharedtask/
Programme Committee:
Guillaume Segerer (CNRS, LLACAN, France)
Harald Hammarström (Department of Linguistics and Philology, Uppsala
University, Sweden)
Markus Forsberg (Språkbanken, University of Gothenburg, Sweden)
Søren Wichmann (Leiden University Centre for Linguistics, Netherlands)
Shafqat Mumtaz Virk (Språkbanken, University of Gothenburg, Sweden)
Zeljko Agic (IT University of Copenhagen, Denmark)
Erich Round (University of Queensland, Australia)
Sebastian Nordhoff (LangSci Press, Germany)
------------------------------------------------------------------------------