24.5056, Calls: Computational Linguistics, Text/Corpus Linguistics/Sweden

linguist at linguistlist.org
Wed Dec 11 16:12:43 UTC 2013


LINGUIST List: Vol-24-5056. Wed Dec 11 2013. ISSN: 1069-4875.

Subject: 24.5056, Calls: Computational Linguistics, Text/Corpus Linguistics/Sweden

Moderator: Damir Cavar, Eastern Michigan U <damir at linguistlist.org>

Reviews: 
Monica Macaulay, U of Wisconsin Madison
Rajiv Rao, U of Wisconsin Madison
Joseph Salmons, U of Wisconsin Madison
Mateja Schuck, U of Wisconsin Madison
Anja Wanner, U of Wisconsin Madison
       <reviews at linguistlist.org>

Homepage: http://linguistlist.org

Editor for this issue: Bryn Hauk <bryn at linguistlist.org>
================================================================  


Date: Wed, 11 Dec 2013 11:11:46
From: Felix Bildhauer [felix.bildhauer at fu-berlin.de]
Subject: EACL 2014 Workshop on Web as Corpus

Full Title: EACL 2014 Workshop on Web as Corpus 
Short Title: WAC-9 

Date: 26-Apr-2014 - 26-Apr-2014
Location: Gothenburg, Sweden 
Contact Person: Felix Bildhauer
Meeting Email: felix.bildhauer at fu-berlin.de
Web Site: http://www.sigwac.org.uk/wiki/WAC9 

Linguistic Field(s): Computational Linguistics; Text/Corpus Linguistics 

Call Deadline: 23-Jan-2014 

Meeting Description:

The 9th Web as Corpus Workshop (WAC-9)
Endorsed by the Special Interest Group of the ACL on Web as Corpus (http://www.sigwac.org.uk/)

The World Wide Web has become an increasingly popular source of linguistic data, not only within the NLP community but also among theoretical linguists facing problems of data sparseness or data diversity. Accordingly, web corpora continue to gain importance, given their size and their diversity in terms of genres and text types. However, the field is still young, and a number of issues in web corpus construction still need much research, both fundamental and applied. These range from questions of corpus design (e.g., assessing corpus composition, sampling strategies and their relation to crawling algorithms, handling of duplicated material) to more technical aspects (e.g., efficient implementation of individual post-processing steps in document cleansing and linguistic annotation, or large-scale parallelization to achieve web-scale corpus construction). Similarly, the systematic evaluation of web corpora, for example through task-based comparisons with traditional corpora, has only recently come into focus.

For almost a decade, ACL SIGWAC, and especially its Web as Corpus (WAC) workshops, have served as a platform for researchers interested in building and working with web-derived corpora. Past workshops have been co-located with major conferences on computational linguistics and/or corpus linguistics (such as EACL, LREC, WWW, and Corpus Linguistics). As part of the workshop, there will be a panel discussion dedicated to planning a shared task for WAC-10 (2015), including the nomination of the shared task's organizers. The tracks of the shared task will focus on the quality of web corpus creation tools, tools for linguistic annotation (at least lemmatization, possibly also POS tagging, etc.), and the quality of the web corpora themselves.

Organizing Committee:

Felix Bildhauer, Freie Universität Berlin
Roland Schäfer, Freie Universität Berlin

Program Committee:

Organizing Committee, plus:

Adrien Barbaresi, École Normale Supérieure de Lyon
Silvia Bernardini, Università di Bologna
Chris Biemann, Technische Universität Darmstadt
Jesse Egbert, Northern Arizona University
Stefan Evert, Friedrich-Alexander Universität Erlangen-Nürnberg
Adriano Ferraresi, Università di Bologna
William Fletcher, United States Naval Academy
Dirk Goldhahn, Universität Leipzig
Adam Kilgarriff, Lexical Computing Ltd.
Anke Lüdeling, Humboldt-Universität zu Berlin
Alexander Mehler, Goethe-Universität Frankfurt am Main
Uwe Quasthoff, Universität Leipzig
Paul Rayson, Lancaster University
Serge Sharoff, University of Leeds
Sabine Schulte im Walde, Universität Stuttgart
Egon Stemle, European Academy of Bolzano
Yannick Versley, Universität Heidelberg
Torsten Zesch, Universität Darmstadt
Stephen Wattam, Lancaster University

2nd Call for Papers:

EACL 2014 Workshop on Web as Corpus (WAC-9)

As in previous years, the 9th Web as Corpus workshop (WAC-9) invites original contributions pertaining to all aspects of web corpora, including data collection, cleaning, duplicate removal, document filtering, linguistic post-processing, and use of web corpora in language technology and linguistics.

However, a major challenge in the construction of web corpora is assessing the quality of, and evaluating, both the software used to build web corpora and the corpora themselves. WAC-9 therefore puts special emphasis on these topics and particularly encourages submissions addressing the following points:

- Noise in web corpora: Normalization and implications for linguistic annotation (lemmatization, POS tagging, parsing, etc.)
- Task-based ("extrinsic") evaluation of web corpora, especially in comparison to traditional corpus resources and n-gram databases (e.g., Web 1T 5-Grams, Google Books)
- Missing metadata in web corpora: Enriching web corpora with metadata through high-accuracy automatic classification
- Sampling strategies / crawling algorithms and their effect on corpus composition / corpus quality
- Non-destructive cleaning and normalization of web data (Currently available web corpora have usually undergone radical cleaning procedures in order to produce "high-quality" data. At least for some uses of the data, aggressive and sometimes arbitrary removal of material, whether whole documents or parts of them, can be problematic. The same is true for aggressive normalization of the data. To address such problems, ways of cleaning and normalizing the data transparently, i.e., while preserving the non-normalized forms, should be discussed.)

Submission Details:

Abstracts should be:

- Anonymous
- No longer than two pages (including figures and references)
- In PDF format
- Formatted according to the EACL stylesheet (templates for LaTeX and MS Word are available from: http://www.eacl2014.org/files/eacl-2014-styles.zip)
- Submitted via the START online submission system at: https://www.softconf.com/eacl2014/WaC9/
- Submitted no later than 23 January 2014

Important Dates:

11 November 2013: First call for workshop papers
12 December 2013: Second call for workshop papers
4 January 2014: Final call for workshop papers
23 January 2014: Workshop paper due date
20 February 2014: Notification of acceptance
3 March 2014: Camera-ready papers due
26 April 2014: Workshop date
