30.2156, Calls: Comp Ling, Pragmatics, Semantics, Syntax, Text/Corpus Ling/Germany

Thu May 23 03:54:28 UTC 2019

LINGUIST List: Vol-30-2156. Wed May 22 2019. ISSN: 1069 - 4875.

Moderator: Malgorzata E. Cavar (linguist at linguistlist.org)
Student Moderator: Jeremy Coburn
Managing Editor: Becca Morris
Team: Helen Aristar-Dry, Everett Green, Sarah Robinson, Peace Han, Nils Hjortnaes, Yiwen Zhang, Julian Dietrich
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Please support the LL editors and operation with a donation at:
           https://funddrive.linguistlist.org/donate/

Editor for this issue: Everett Green <everett at linguistlist.org>
================================================================

Date: Wed, 22 May 2019 23:52:45
From: Eric Engel [eric.engel at uni-koeln.de]
Subject: International Workshop on Annotation of Non-standard Corpora

Full Title: International Workshop on Annotation of Non-standard Corpora 
Short Title: ANSC 

Date: 16-Sep-2019 - 18-Sep-2019
Location: Bamberg, Germany 
Contact Person: Stefan Hartmann
Meeting Email: stefan1.hartmann at uni-bamberg.de
Web Site: https://www.uni-bamberg.de/germ-ling/veranstaltungen/annotation-of-non-standard-corpora/ 

Linguistic Field(s): Computational Linguistics; Pragmatics; Semantics; Syntax; Text/Corpus Linguistics 

Call Deadline: 15-Jun-2019 

Meeting Description:

Corpus linguistics is enjoying growing popularity in virtually all branches of
linguistics. However, most corpora still represent just a tiny fraction of
linguistic reality as they usually consist of samples of present-day written
standard language. This workshop aims to bring together researchers working
on the (manual or automatic) annotation of non-standard corpora. These
include, for example, historical corpora, corpora of spoken language and
co-speech gesture, chat corpora, learner corpora, or corpora of signed
languages. In particular, we focus on the peculiarities of syntactic and
semantic annotation, with the goal of discussing best-practice models in
dealing with issues of normalization and uncertainty in non-standard data.

One obvious challenge is the absence of clear graphical cues for
syntactic structures: while sentence boundaries are fairly easy to identify in
corpora of present-day standard language on the basis of punctuation,
different cues have to be taken into account in modalities other than writing.
Even in written non-standard data, explicit cues for sentence boundaries
can be absent or used differently. Moreover, some phenomena, such as ellipses,
repetitions, and disfluencies, are highly frequent in non-standard data and
can be linguistically meaningful. Yet their analysis poses additional
challenges, since pre-processing tools that were trained on standard written
texts do not fare well with this kind of data, and there is no consensus
on how to represent these phenomena properly. In this discussion
strand, we will assess the benefits and challenges that lie in
(semi-)automatic pre-processing of non-standard data, and we will discuss how
much normalization is needed.

As non-standard corpora tend to be fairly small, they are also well-suited for
extensive (often manual) semantic annotation. However, their non-standard
nature can entail challenges that go beyond the usual difficulties of semantic
analyses. For example, coding historical language data for semantic aspects
requires advanced knowledge not only of the language of the time period but
also of the cultural environment in which the texts were created. Also,
non-standard language can be expected to give rise to more ambiguities than
standard language. This is especially true for spoken language, which can rely
on cues from the situational context to a larger extent. With transparent
documentation and re-usability in mind, the question arises whether such
uncertainties should actually be resolved in all cases, and which
annotation and data representation strategies should be adopted.

The issue of transparency and re-usability also pertains to other aspects that
call for a trade-off between different conflicting interests. For syntactic
annotation, community standards like the tagset developed in the Universal
Dependencies project enjoy growing popularity and have successfully been
applied to non-standard, especially spoken data. However, for semantics and
discourse annotation, the issue of comparability and generalizability of
annotation categories is still pervasive, as annotations tend to be made by
individual researchers for very specific projects. Hence, another goal of the
workshop will be to discuss best-practice strategies for the development of
annotation guidelines which satisfy the needs of a specific research question
while keeping potential future users in mind.

Our workshop aims to provide a discussion forum for these and other open
questions relating to the annotation of non-standard corpora. In addition, its
goal is to establish a network between researchers from different linguistic
disciplines facing similar challenges on very different datasets.

Call for Papers:

The topics of the workshop encompass (i) challenges of (semi-)automatic
pre-processing, (ii) annotation and data representation strategies for
non-standard data, and the treatment of ambiguities, and (iii) issues of
transparency and re-usability of annotations and tag sets, especially in
semantic annotation.

Poster Contributions:

In addition to the oral presentations (see programme at tinyurl.com/ansc2019),
we invite abstracts for poster presentations by early-career researchers
(advanced MA students as well as PhD students). If you are working on a
project that is relevant to the subject area of the workshop and that you
would like to present as a poster, please send a short abstract (~500 words)
to stefan1.hartmann at uni-bamberg.de by June 15, 2019. Notifications of
acceptance/rejection can be expected by the end of June.
