[Corpora-List] Deadline Extension (Jan 30): The 9th Web as Corpus Workshop (WAC-9)

Roland Schäfer roland.schaefer at fu-berlin.de
Tue Jan 21 15:50:08 UTC 2014


The 9th Web as Corpus Workshop (WAC-9)
http://www.sigwac.org.uk/wiki/WAC9
Endorsed by the Special Interest Group of the ACL on Web as Corpus


DEADLINE EXTENDED UNTIL January 30, 2014!

Please note that the deadline for camera-ready papers strictly
remains March 03, 2014.


WORKSHOP DESCRIPTION

The World Wide Web has become increasingly popular as a source of
linguistic data, not only within the NLP communities, but also with
theoretical linguists facing problems of data sparseness or data
diversity. Accordingly, web corpora continue to gain importance, given
their size and diversity in terms of genres/text types. However, the
field is still new, and a number of issues in web corpus construction
still need much research (fundamental and applied), ranging from
questions of corpus design (e.g., corpus composition assessment,
sampling strategies and their relation to crawling algorithms, handling
of duplicated material) to more technical aspects (e.g., efficient
implementation of individual post-processing steps in document cleansing
and linguistic annotation, or large-scale parallelization to achieve
web-scale corpus construction). Similarly, the systematic evaluation of
web corpora, for example in the form of task-based comparisons to
traditional corpora, has only lately shifted into focus.

For almost a decade, the ACL SIGWAC, and especially the highly
successful Web as Corpus (WaC) workshops, have served as a platform for
researchers interested in building and working with web-derived corpora.
Past workshops have been co-located with major conferences on
computational linguistics and/or corpus linguistics (such as EACL, LREC,
WWW, Corpus Linguistics). As part of the workshop, we will have a panel
discussion dedicated to the planning of a shared task for WAC-10 (2015),
including the nomination of organizers of the shared task. The tracks of
the shared task will focus on the quality of web corpus creation tools,
tools for linguistic annotation (at least lemmatization, possibly also
POS tagging, etc.), and the quality of web corpora themselves.


CALL FOR PAPERS with EXTENDED DEADLINE

As in previous years, the 9th Web as Corpus workshop (WAC-9) invites
original contributions pertaining to all aspects of web corpora,
including data collection, cleaning, duplicate removal, document
filtering, linguistic post-processing, and use of web corpora in
language technology and linguistics.

However, a major challenge in the construction of web corpora is the
question of the quality and the evaluation of both the software used in
the construction of web corpora and the corpora themselves.
Therefore, WAC-9 seeks to put special emphasis on these topics, and it
particularly encourages submissions addressing the following points:

* noise in web corpora: normalization and implications for linguistic
  annotation (lemmatization, POS tagging, parsing, etc.)
* task-based ("extrinsic") evaluation of web corpora, especially in
  comparison to traditional corpus resources and n-gram databases (Web
  1T 5-Grams, Google Books)
* missing metadata in web corpora: enriching web corpora with data by
  automatic classification with high accuracy
* sampling strategies/crawling algorithms and their effect on corpus
  composition/corpus quality
* non-destructive cleaning and normalization of web data


SUBMISSION DETAILS

Abstracts should be

* anonymous
* no longer than two pages (including figures and references)
* in PDF format
* formatted according to the EACL stylesheet – templates for LaTeX and
  MS Word are available from:
  http://www.eacl2014.org/files/eacl-2014-styles.zip
* submitted via the START online submission system at:
  https://www.softconf.com/eacl2014/WaC9/
* submitted no later than 30 January 2014 (extended deadline)


ORGANIZING COMMITTEE

 Felix Bildhauer, Freie Universität Berlin
 Roland Schäfer, Freie Universität Berlin


PROGRAM COMMITTEE

Organizing committee, plus

 Adrien Barbaresi, École Normale Supérieure de Lyon
 Silvia Bernardini, Università di Bologna
 Chris Biemann, Technische Universität Darmstadt
 Jesse Egbert, Northern Arizona University
 Stefan Evert, Friedrich-Alexander Universität Erlangen-Nürnberg
 Adriano Ferraresi, Università di Bologna
 William Fletcher, United States Naval Academy
 Dirk Goldhahn, Universität Leipzig
 Adam Kilgarriff, Lexical Computing Ltd.
 Anke Lüdeling, Humboldt-Universität zu Berlin
 Alexander Mehler, Goethe-Universität Frankfurt am Main
 Uwe Quasthoff, Universität Leipzig
 Paul Rayson, Lancaster University
 Sabine Schulte im Walde, Universität Stuttgart
 Serge Sharoff, University of Leeds
 Egon Stemle, European Academy of Bolzano
 Stephen Wattam, Lancaster University
 Yannick Versley, Universität Heidelberg
 Torsten Zesch, Universität Darmstadt


IMPORTANT DATES

 11 November 2013: First Call for Workshop Papers
 12 December 2013: Second Call for Workshop Papers
 04 January 2014: Final Call for Workshop Papers
 30 January 2014: EXTENDED Workshop Paper Due Date
 20 February 2014: Notification of Acceptance
 03 March 2014: Camera-ready papers due
 26 April 2014: Workshop Date


