28.5186, Calls: Computational Linguistics/Czech Republic

The LINGUIST List linguist at listserv.linguistlist.org
Fri Dec 8 16:54:45 UTC 2017


LINGUIST List: Vol-28-5186. Fri Dec 08 2017. ISSN: 1069 - 4875.

Moderators: linguist at linguistlist.org (Damir Cavar, Malgorzata E. Cavar)
Reviews: reviews at linguistlist.org (Helen Aristar-Dry, Robert Coté,
                                   Michael Czerniakowski)
Homepage: http://linguistlist.org

Please support the LL editors and operation with a donation at:
           http://funddrive.linguistlist.org/donate/

Editor for this issue: Kenneth Steimel <ken at linguistlist.org>
================================================================


Date: Fri, 08 Dec 2017 11:54:06
From: Christin Schätzle [christin.schaetzle at uni-konstanz.de]
Subject: Workshop on Data Provenance and Annotation in Computational Linguistics

 
Full Title: Workshop on Data Provenance and Annotation in Computational Linguistics 

Date: 22-Jan-2018 - 22-Jan-2018
Location: Prague, Czech Republic 
Contact Person: Miriam Butt
Meeting Email: miriam.butt at uni-konstanz.de
Web Site: https://typo.uni-konstanz.de/dataprovenance/ 

Linguistic Field(s): Computational Linguistics 

Call Deadline: 22-Dec-2017 

Meeting Description:

Workshop on Data Provenance and Annotation in Computational Linguistics

Co-located with the Treebanks and Linguistic Theory (TLT) conference 2018 in
Prague is a special Workshop on Data Provenance and Annotation in
Computational Linguistics.

Invited Speakers:

Adriane Boyd, Universität Tübingen
Peter Buneman, University of Edinburgh
Nicoletta Calzolari, Italian National Research Council
Sarah Cohen Boulakia, Université Paris Sud


Call for Posters:

This is a call for posters to be presented at the workshop.  The deadline for
submissions is December 22, 2017.  Notification of acceptance will be by
December 31, 2017.

The workshop seeks to bring together researchers from the fields of
provenance, data annotation, and data curation with researchers working within
computational linguistics and dealing with the annotation of language data.
Provenance is concerned with understanding how to model, record, and share
metadata about the origin of data and the further sharing or processing that
data has undergone. While provenance has been studied in various domains
(e.g., for business applications or in the life sciences), many of the central
issues are also of vital interest for computational linguistics.

For example, issues of "data cleaning" and data curation both have serious
repercussions for the reproducibility of analyses or experiments. In general,
computational linguistic work with data tends to involve several
pre-processing steps (stop-lists, data normalization, error correction, and
filtering out of information that is not considered at-issue).
However, these steps are seldom documented or described in detail. Data sets
may also undergo several rounds of pre-processing, with information about the
successive changes again not well documented. Data may also be automatically
or semi-automatically generated. In computational linguistics this often takes
the form of automatic or semi-automatic data annotation. This, as well as
manual annotation, is prone to errors and inter-annotator disagreement,
leading to rounds of adjudication or correction. This work with data is also
generally not documented (in detail), so that annotation decisions may be hard
to "undo". Finally, once a data set
is released, newer versions will inevitably also have to be released to deal
with data expansion or correction. In this case, proper versioning and data
curation are vital to ensure experimental and analytical reproducibility.

While computational linguists deal with these issues on a daily basis, there
is little awareness of established methodology and best practices coming from
the field of data provenance. The aim of this workshop is to begin a dialog.
On the one hand, we aim to create awareness of the needs and challenges posed
by linguistic data in the data provenance community. On the other hand, we aim
to import an understanding of the experiences and best practices established
with respect to data provenance into the computational linguistics community.

Submissions:

Authors are invited to submit an abstract of no more than two A4 pages,
including references and data. Abstracts must have 2.5 cm (1 inch)
margins on all sides and be set in Times New Roman with a font size no smaller
than 11pt. The submissions must not reveal the identity of the author(s) in
any way. 

Abstracts must be submitted in PDF format through EasyChair by December 22,
2017, 11:59 EST. To submit your abstract, please click on the following link:
https://easychair.org/conferences/?conf=pacl2018 

Organising committee:

Miriam Butt, University of Konstanz 
Melanie Herschel, University of Stuttgart
Christin Schätzle, University of Konstanz
