31.190, Calls: Comp Ling, Lexicography, Morphology, Syntax, Text/Corpus Ling / France
    The LINGUIST List 
    linguist at listserv.linguistlist.org
       
    Tue Jan 14 20:11:16 UTC 2020
    
    
  
LINGUIST List: Vol-31-190. Tue Jan 14 2020. ISSN: 1069 - 4875.
Moderator: Malgorzata E. Cavar (linguist at linguistlist.org)
Student Moderator: Jeremy Coburn
Managing Editor: Becca Morris
Team: Helen Aristar-Dry, Everett Green, Sarah Robinson, Peace Han, Nils Hjortnaes, Yiwen Zhang, Julian Dietrich
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org
Homepage: http://linguistlist.org
Please support the LL editors and operation with a donation at:
           https://funddrive.linguistlist.org/donate/
Editor for this issue: Sarah Robinson <srobinson at linguistlist.org>
================================================================
Date: Tue, 14 Jan 2020 15:11:10
From: Felix Bildhauer [bildhauer at ids-mannheim.de]
Subject: 12th Web as Corpus Workshop
 
Full Title: 12th Web as Corpus Workshop 
Short Title: WAC-XII 
Date: 16-May-2020 - 16-May-2020
Location: Marseille, France 
Contact Person: Roland Schäfer
Meeting Email: roland.schaefer at fu-berlin.de
Web Site: https://www.sigwac.org.uk/wiki/WAC-XII 
Linguistic Field(s): Computational Linguistics; Lexicography; Morphology; Syntax; Text/Corpus Linguistics 
Call Deadline: 16-Feb-2020 
Meeting Description:
For almost fifteen years, the ACL SIGWAC, and most notably the Web as Corpus
(WAC) workshops, have served as a platform for researchers interested in the
compilation, processing and use of web-derived corpora as well as
computer-mediated communication. Past workshops were co-located with major
conferences on corpus linguistics and/or computational linguistics (such as
ACL, EACL, Corpus Linguistics, LREC, NAACL, WWW).
In corpus/theoretical linguistics, the World Wide Web has become increasingly
popular as a source of linguistic evidence, especially in the face of data
sparseness or the lack of variation in traditional corpora of written
language. In lexicography, web data have become a major and well-established
resource with dedicated research data and specialised tools. In other areas of
theoretical linguistics, the adoption rate of web corpora has been slower but
steady. Furthermore, some completely new areas of linguistic research dealing
exclusively with web (or similar) data have emerged, such as the construction
and utilisation of corpora based on short messages. Another example is the
(manual or automatic) classification of web texts by genre, register, or –
more generally speaking – “text type”, as well as topic area. In computational
linguistics, web corpora have become an established source of data for the
creation of language models, word embeddings, and for all types of machine
learning.
The twelfth Web as Corpus workshop (WAC-XII) looks at the past, present, and
future of web corpora given the fact that large web corpora are nowadays
provided mostly by a few major initiatives and/or companies, and the diversity
of the early years appears to have faded slightly. We also acknowledge that
alternative sources of data (such as Twitter and similar platforms) have
emerged, and that some of them, such as linguistic data from social media and
other parts of the deep web, are available only to large companies and their
affiliates. At the same time, gathering interesting and/or relevant web data
(web crawling) is becoming an ever more intricate task as the nature of the
data offered on the web changes (for example, the death of forums in favour of
more closed platforms).
We intend WAC-XII to be a platform for the discussion of some fundamental
issues in current web corpus construction. Some of the key issues that we see
for the future of web corpora are:
- Can the requirements of all of the aforementioned groups of users
(theoretical linguists, lexicographers, computational linguists, etc.) be met
by the same type of web corpora, or should web corpora be tailored to the
specific needs of different groups of users?
- How has the composition of the web (and subsequently that of web corpora)
changed? Are web data still as relevant and interesting as they were fifteen
years ago?
- What is the impact of changes in web data production (e.g., CMS and
microtexts published on more restricted platforms), and how can it be
addressed in the data collection process?
- Is there still an interest in fundamental research on the linguistic nature
and composition of the web?
- How good is the quality of web data for the tasks mentioned above?
Organizers
Adrien Barbaresi (BBAW Berlin)
Felix Bildhauer (IDS Mannheim)
Roland Schäfer (Humboldt-Universität zu Berlin, SFB 1412)
Egon Stemle (Eurac Research)
Call for Papers:
The twelfth Web as Corpus workshop (WAC-XII) aims to unite (web) corpus
creators and all types of (web) corpus users from corpus/theoretical
linguistics, computational linguistics, cognitive science, etc. We invite
papers dealing with the fundamental questions mentioned above. In addition, we
invite papers dealing with the whole range of applied and fundamental topics
from both corpus/theoretical linguistics and computational linguistics which
have characterised WAC workshops, including but not limited to:
- Data selection and collection (discovery and/or crawling)
- Linguistic post-processing of web data
- Analysis of web corpora (assessment of the distribution of genres,
registers, topics, etc.)
- Comparison of web corpus data with other types of corpus data (traditional
corpora, linguistic data from social media, etc.)
- Case studies in corpus/theoretical or computational linguistics where web
data have been used
- Case studies in digital lexicography, for example using SketchEngine
- Research specifically related to the validity of web data in
corpus/theoretical and computational linguistics
- Web data in psycholinguistic research and cognitive modelling
- Web corpora for language models and word embeddings
Format and submission
Like LREC 2020, WAC-XII asks for full papers of 4 to 8 pages (plus
additional pages for references if needed), which must strictly follow the LREC
stylesheet available on the LREC 2020 website. No distinction between long and
short papers will be made, but papers should have an appropriate length given
their content. Appropriate time slots for oral presentations will be allocated
according to the length of each paper. Papers must be submitted through START
[URL tba] and will undergo blind peer review.
All papers will be published in the LREC 2020 proceedings.
Important dates
Submission deadline: Sunday, 16 February 2020 at 24:00 GMT-12
Notification of acceptance: Friday, 13 March 2020 at 22:00 GMT+1
Camera-ready manuscript due date: Friday, 27 March 2020 at 24:00 GMT-12
Workshop date: afternoon session of Saturday, 16 May 2020
----------------------------------------------------------
LINGUIST List: Vol-31-190	
----------------------------------------------------------
    
    