[Ura-list] Call for Posters: Computational Methods for Endangered Language Documentation and Description

Fri Dec 8 14:17:15 UTC 2017

*Workshop: Computational Methods for Endangered Language Documentation and
Description*
February 1st-2nd, 2018
Ecole Normale Supérieure, 45 rue d’Ulm, 75005 Paris, France
http://lattice.cnrs.fr/cmld

Posters can present either finished project results or project
descriptions, including descriptions of relevant tools or workflows. The
deadline for poster abstract submissions (at least 500 words long) is
*21.12.2017*.

Please send the abstracts to: michael.riessler at uni-bielefeld.de

*Organizers*

Thierry Poibeau (thierry.poibeau at ens.fr)
Michael Rießler (michael.riessler at uni-bielefeld.de)
Niko Partanen (nikotapiopartanen at gmail.com)

*Workshop description*

There is a significant gap between the digital methods applied in corpus
building and corpus exploration for the numerous small, often endangered,
low-resource languages and those applied for high-resource majority
languages. Corpora for endangered minority languages are typically built
from spoken data, which first have to be recorded and transcribed, and are
therefore relatively small. Majority language corpora, on the other hand,
are considerably bigger and predominantly include language data from
diverse digital (or digitized) written sources.

Whereas majority language corpus linguists develop and apply Natural
Language Processing tools and attempt to automate the annotation process,
usually with the help of a manually checked gold corpus, field linguists
typically rely on manual (or occasionally semi-manual) methods during the
entire process. In many fieldwork-based endangered language documentation
projects, manual methods are in fact a more convenient choice than
developing computational linguistic resources from scratch. This is
especially true if the linguistic structures of the languages in question
are as yet unknown, there is no established writing system, and the
available corpus data are finite and small in quantity.

However, there are also many small or medium-size endangered languages for
which the basic grammatical structures have already been described and
which have established writing systems. This situation is common in
Northern Eurasia, where virtually all minority languages are also written
today. Still, most of these languages have not yet been the focus of
computational and corpus linguistic research. This is true despite
the fact that there are written corpus data of significant size available
for several of these languages.

The workshop aims to examine the application of specific methods from
Natural Language Processing to the analysis of data from endangered and
low-resource languages from Northern Eurasia and other parts of the world.
The workshop defines language technologies in a very broad sense and
therefore also includes computational methods for signal processing in
general, as such technologies can be applied effectively to work with
text corpora linked to multimedia data.

The event will feature a few invited presentations and tutorials. In
addition, there will be slots for interested participants to present
posters on their own thematic projects.