[Corpora-List] DEADLINE EXTENSION: Web Data as a Challenge for Theoretical Linguistics and Corpus Design (Marburg, March 5-7, 2014)

Mon Jul 29 11:30:33 UTC 2013

*DEADLINE EXTENSION: August 12, 2013*

Web Data as a Challenge for Theoretical Linguistics and Corpus Design

Workshop at the 36th Annual Conference of the German Linguistic Society
(March 5-7, 2014 at Marburg University, Marburg/Lahn, Germany)

Website:   http://hpsg.fu-berlin.de/cow/dgfs2014/
EasyChair: https://www.easychair.org/conferences/?conf=webtl2014

Organizers:
  Felix Bildhauer (Freie Universität Berlin/SFB632)
  Roland Schäfer (Freie Universität Berlin)

Program Committee:
  Chris Biemann
  Stefan Evert
  Matthias Hüning
  Anke Lüdeling
  Alexander Mehler
  Uwe Quasthoff
  Amir Zeldes
  Torsten Zesch
  Arne Zeschel

Important Dates:
  First call for papers:         Monday, June 17, 2013
  Second call for papers:        Friday, July 19, 2013
  EXTENDED Submission Deadline:  Monday, August 12, 2013 @ 23:59 GMT
  Workshop:                      March 5-7, 2014

*Aim of the Workshop and Call for Papers*

The huge amounts of linguistic data on the web offer exciting new
possibilities in empirically based theoretical linguistics. Web-derived
linguistic resources can contain greater amounts of variation as well as
non-standard grammar and writing compared to traditionally compiled
corpora. Also, whole new registers and genres have been reported to
emerge on the web. Like spoken language - although clearly distinct from
it - the language found on the web can thus challenge linguistic
theories which are based mainly on standard written language as well as
the categories assumed within these theories. At the same time, such
non-standard features make the data harder to process for computational
linguists, and additional care is required when labeling material as
"noise", because some linguists might consider it valuable data.

This workshop aims to bring together researchers working in Theoretical
Linguistics and Corpus Linguistics with those who create resources from
web data. The primary question of the workshop is: Which new linguistic
insights can we derive from web data? Secondarily, we ask how web data
is (and how it should be) processed to produce easily accessible
high-quality resources and thus facilitate this kind of innovative
linguistic research.

Possible subjects for talks include (but are by no means restricted to):

- theoretically motivated empirical studies of linguistic phenomena in
  web data,
- work on problems with established linguistic categories specific to
  certain types of web data (problems with traditional part-of-speech
  classification, syntactic categories, register and genre
  classification, etc.),
- problems of working with web corpora from the user's perspective in
  concrete studies (low quality of: tokenization, POS tagging, named
  entity recognition, etc.; availability and lack of metadata),
- assessments and improvements of the quality of available and newly
  designed tools and models to process or classify web data,
- approaches to normalization of web data and evaluations of the
  acceptability of such normalizations from a linguistic perspective,
- sampling of web data (e.g., stratified vs. randomly compiled corpora,
  linguistic web characterization).

We invite submissions for 30-minute talks (20 minutes plus 10 minutes of
discussion) about completed or ongoing original research that uses web
data or that concerns the creation and/or evaluation of web data
resources. The scope of the workshop is neither restricted to
resources of a specific size or nature nor to any specific language(s).
Submitted abstracts will be reviewed anonymously by at least two
reviewers. We hope to offer authors of accepted talks the opportunity to
publish an extended version of their talk in a special issue of a
peer-reviewed corpus linguistics journal.

*Submission Details*

- Submitted abstracts for 30-minute presentations (20 minutes plus 10
  minutes of discussion) should be between 800 and 1,000 words long
  (excluding references and tables).
- Submissions must be anonymous. Please take care to remove any
  information from the file that could reveal your identity.
- The language of all abstracts and the workshop is English.
- The only accepted file format for submission is PDF.
- Submissions must be made via EasyChair (WEBTL-2014):
  https://www.easychair.org/conferences/?conf=webtl2014
- Authors of accepted papers will be asked to provide a shorter
  200-word abstract, as an MS Word or OpenDocument file, to be printed
  in the conference program.
