28.918, Calls: Computational Linguistics, Text/Corpus Linguistics/UK

Fri Feb 17 15:14:08 UTC 2017

LINGUIST List: Vol-28-918. Fri Feb 17 2017. ISSN: 1069 - 4875.

Moderators: linguist at linguistlist.org (Damir Cavar, Malgorzata E. Cavar)
Reviews: reviews at linguistlist.org (Helen Aristar-Dry, Robert Coté,
                                   Michael Czerniakowski)
Homepage: http://linguistlist.org

Editor for this issue: Kenneth Steimel <ken at linguistlist.org>
================================================================

Date: Fri, 17 Feb 2017 10:14:01
From: Roland Schäfer [roland.schaefer at fu-berlin.de]
Subject: 11th Web as Corpus Workshop

Full Title: 11th Web as Corpus Workshop 
Short Title: WAC-XI 

Date: 24-Jul-2017 - 24-Jul-2017
Location: Birmingham, United Kingdom 
Contact Person: Roland Schäfer
Meeting Email: wacxi2017 at gmail.com
Web Site: https://www.sigwac.org.uk/wiki/WAC-XI 

Linguistic Field(s): Computational Linguistics; Text/Corpus Linguistics 

Call Deadline: 16-Apr-2017 

Meeting Description:

WAC-XI: The 11th Web as Corpus Workshop

Co-located with Corpus Linguistics 2017, Birmingham, 24 July 2017
Featuring the First CleanerEval Shared Task panel discussion
Endorsed by the Special Interest Group of the ACL on Web as Corpus

For almost a decade, the ACL SIGWAC, and most notably the Web as Corpus (WAC)
workshops, have served as a platform for researchers interested in the
compilation, processing and use of web-derived corpora as well as
computer-mediated communication. Past workshops were co-located with major
conferences on corpus linguistics and/or computational linguistics (such as
ACL, EACL, Corpus Linguistics, LREC, NAACL, WWW). The eleventh Web as Corpus
workshop (WAC-XI) emphasises the linguistic aspects of web corpus research
more than the technological aspects while keeping in mind that the two are
inseparable.

The World Wide Web has become increasingly popular as a source of linguistic
evidence, not only within the computational linguistics community, but also
with theoretical linguists facing problems such as data sparseness or the lack
of variation in traditional corpora of written language. Accordingly, web
corpora continue to gain relevance, given their size and diversity in terms of
genres and text types. In lexicography, web data have become a major and
well-established resource with dedicated research data and specialised tools
such as the SketchEngine. In other areas of linguistics, the adoption rate of
web corpora has been slower but steady. Furthermore, some completely new areas
of research dealing exclusively with web (or similar) data have emerged, such
as the construction and utilisation of corpora based on short messages.
Another example is the (manual or automatic) classification of web texts by
genre, register, or – more generally speaking – text type, as well as topic
area. Similarly, the areas of corpus evaluation and corpus comparison have
been advanced greatly through the rise of web corpora, mostly because web
corpora (especially larger ones in the region of several billion tokens)
are often created by downloading texts from the web unselectively with
respect to their text type or content. While the composition (or
stratification) of such corpora cannot be determined before their
construction, it is desirable at least to evaluate it afterwards. Also,
comparing web corpora to corpora that have been compiled in a more traditional
way is key in determining the quality of web corpora with respect to a given
research question.

Call for Papers:

The eleventh Web as Corpus workshop (WAC-XI) takes a (corpus) linguistic look
at the state of the art in all these areas. More specifically, in linguistic
publications presenting case studies based on web data, some authors
explicitly discuss and/or defend the validity of web corpus data for a
specific type of research question – while others simply take web corpora as a
new or complementary source of data without discussing fundamental questions
of data quality and appropriateness of web data for a given research question.
We think it is vital to discuss such fundamental questions, and therefore ask
researchers to present and discuss:

- Case studies in corpus or computational linguistics where web data have been
used
- Research specifically related to the validity of web data in corpus,
computational, and theoretical linguistics
- Research on the technical aspects of web corpus construction which have a
strong influence on theoretical aspects of corpus design

For example, presentations could address questions such as the following
(either as part of a case study or in the form of primary research):

- Are there substantial differences in theoretical inferences when web data
are used instead of data from traditionally compiled corpora? If so: Why? Are
they expected?
- Do findings from traditionally compiled corpora and web corpora converge
when compared with evidence from other sources (such as psycholinguistic
experiments)? If not: Which type of data matches the external findings better?
- Is it possible to analyse lectal variation with web corpora, given the
frequent lack of relevant metadata?
- How good is the quality of the (automatic) linguistic annotation of web data
compared to traditionally compiled corpora? How does this affect empirical
linguistic research with web corpora? What could corpus designers do to
improve it?
- Are there differences with regard to the dispersion of linguistic entities
in web corpora compared to traditionally compiled corpora? If so: Why? Does
it matter? How can we deal with it or even profit from it?
- How do very large web corpora compare to smaller, more intentionally
stratified web corpora created for a specific task? How can it be decided
which type of corpus is better for a given research question?

Submission format

We call for anonymous extended abstracts of 1,000 to 1,500 words in length
(excluding references, tables, and figures). Submissions must be in PDF
format. Authors of accepted papers will receive minimal formatting
instructions for the publication of the abstracts on the WAC-XI website in due
time. There will be no proceedings volume, but a successful workshop might
lead to a special issue/edited volume on web (and similar) data in linguistics
(with a new round of peer reviewing), for which a separate call for (full)
papers would be published after the workshop.

Submission website

Please submit exclusively via our EasyChair installation
(https://easychair.org/conferences/?conf=wac11).

Important Dates:

16 February 2017: First call for workshop papers
13 March 2017: Second call for workshop papers
16 April 2017: Abstract due date (23:59 GMT)
5 June 2017: Notification of acceptance
24 July 2017: Workshop day

Organizers:

Adrien Barbaresi (BBAW Berlin/ÖAW Vienna)
Felix Bildhauer (IDS Mannheim)
Roland Schäfer (Freie Universität Berlin (DFG))

Programme Committee:

Masayuki Asahara, Nat. Inst. for Jap. Lang. and Ling.
Piotr Bański, IDS Mannheim
Silvia Bernardini, U of Bologna
Niels Brügger, University of Aarhus
Sascha Diwersy, Université Montpellier 3
Stefan Evert, FAU Erlangen
Susanne Flach, Freie Universität Berlin
Cédrick Fairon, UC Louvain
William H. Fletcher, U.S. Naval Academy
Jack Grieve, Aston University
Aurelie Herbelot, University of Trento
Matthias Hüning, FU Berlin
Detmar Meurers, Universität Tübingen
Miloš Jakubíček, Masaryk University Brno
Iztok Kosem, Trojina, Institute for Applied Slovene Studies
Anne Krause, Universität Leipzig
Simon Krek, Jožef Stefan Institute
Lothar Lemnitzer, BBAW
Nikola Ljubešić, Jožef Stefan Institute, Ljubljana
Steffen Remus, TU Darmstadt
Antonio Ruiz Tinoco, Sophia University
Kevin Scannell, Saint Louis U
Serge Sharoff, University of Leeds
Barbara Schlücker, Universität Bonn
Sabine Schulte im Walde, IMS Stuttgart
Klaus Schulz, LMU München
Egon Stemle, EURAC Bozen/Bolzano
Peter Uhrig, FAU Erlangen
Marieke van Erp, VU Amsterdam
Wajdi Zaghouani, CMU Qatar
Amir Zeldes, Georgetown University, Washington
Arne Zeschel, IDS Mannheim
