[Corpora-List] DEADLINE EXTENSION: Web Data as a Challenge for Theoretical Linguistics and Corpus Design (Marburg, March 5-7, 2014)
Roland Schäfer
roland.schaefer at fu-berlin.de
Mon Jul 29 11:30:33 UTC 2013
*DEADLINE EXTENSION: August 12, 2013*
Web Data as a Challenge for Theoretical Linguistics and Corpus Design
Workshop at the 36th Annual Conference of the German Linguistic Society
(March 5-7, 2014 at Marburg University, Marburg/Lahn, Germany)
Website: http://hpsg.fu-berlin.de/cow/dgfs2014/
EasyChair: https://www.easychair.org/conferences/?conf=webtl2014
Organizers:
Felix Bildhauer (Freie Universität Berlin/SFB632)
Roland Schäfer (Freie Universität Berlin)
Program Committee:
Chris Biemann
Stefan Evert
Matthias Hüning
Anke Lüdeling
Alexander Mehler
Uwe Quasthoff
Amir Zeldes
Torsten Zesch
Arne Zeschel
Important Dates:
First call for papers: Monday, June 17, 2013
Second call for papers: Friday, July 19, 2013
EXTENDED Submission Deadline: Monday, August 12, 2013 @ 23:59 GMT
Workshop: March 5-7, 2014
*Aim of the Workshop and Call for Papers*
The huge amounts of linguistic data on the web offer exciting new
possibilities in empirically based theoretical linguistics. Web-derived
linguistic resources can contain greater amounts of variation as well as
non-standard grammar and writing compared to traditionally compiled
corpora. Also, whole new registers and genres have been described as
emerging on the web. Like spoken language - although clearly distinct
from it - the language found on the web can thus challenge linguistic
theories which are based mainly on standard written language as well as
the categories assumed within these theories. At the same time, such
non-standard features make the data harder to process for computational
linguists, and additional care is required when deciding whether to
label material as "noise", because some linguists might consider it
valuable data.
This workshop aims to bring together researchers working in Theoretical
Linguistics and Corpus Linguistics with those who create resources from
web data. The primary question of the workshop is: Which new linguistic
insights can we derive from web data? Secondarily, we ask how web data
is (and how it should be) processed to produce easily accessible
high-quality resources and thus facilitate this kind of innovative
linguistic research.
Possible subjects for talks include (but are by no means restricted to):
- theoretically motivated empirical studies of linguistic phenomena in
web data,
- work on problems with established linguistic categories specific to
certain types of web data (problems with traditional part-of-speech
classification, syntactic categories, register and genre
classification, etc.),
- problems of working with web corpora from the user's perspective in
concrete studies (low quality of tokenization, POS tagging, named
entity recognition, etc.; availability or lack of metadata),
- assessments and improvements of the quality of available and newly
designed tools and models to process or classify web data,
- approaches to normalization of web data and evaluations of the
acceptability of such normalizations from a linguistic perspective,
- sampling of web data (e.g., stratified vs. randomly compiled corpora,
linguistic web characterization).
We invite submissions for 30-minute talks (20 minutes plus 10 minutes of
discussion) about completed or ongoing original research that uses web
data or addresses the creation and/or evaluation of web data resources.
The scope of the workshop is restricted neither to resources of a
specific size or nature nor to any specific language(s).
Submitted abstracts will be reviewed anonymously by at least two
reviewers. We hope to offer authors of accepted talks the opportunity to
publish an extended version of their talk in a special issue of a
peer-reviewed corpus linguistics journal.
*Submission Details*
- Submitted abstracts for 30-minute presentations (20 minutes plus 10
minutes of discussion) should be between 800 and 1,000 words long
(excluding references and tables).
- Submissions must be anonymous. Please take care to remove any
information from the file that could reveal your identity.
- The language of all abstracts and the workshop is English.
- The only accepted file format for submission is PDF.
- Submissions must be made via EasyChair (WEBTL-2014):
https://www.easychair.org/conferences/?conf=webtl2014
- Authors of accepted papers will be asked to provide a shorter 200-word
abstract, as an MS Word or OpenDocument file, to be printed in the
conference program.