24.2464, Confs: Computational Linguistics, Text/Corpus Linguistics/UK

linguist at linguistlist.org
Mon Jun 17 17:35:14 UTC 2013


LINGUIST List: Vol-24-2464. Mon Jun 17 2013. ISSN: 1069 - 4875.

Moderator: Damir Cavar, Eastern Michigan U <damir at linguistlist.org>

Reviews: Veronika Drake, U of Wisconsin Madison
Monica Macaulay, U of Wisconsin Madison
Rajiv Rao, U of Wisconsin Madison
Joseph Salmons, U of Wisconsin Madison
Mateja Schuck, U of Wisconsin Madison
Anja Wanner, U of Wisconsin Madison
       <reviews at linguistlist.org>

Homepage: http://linguistlist.org

Editor for this issue: Alison Zaharee <alison at linguistlist.org>
================================================================  


Date: Mon, 17 Jun 2013 13:34:33
From: Stefan Evert [stefan.evert at fau.de]
Subject: 8th Web as Corpus Workshop

8th Web as Corpus Workshop 
Short Title: WAC8 

Date: 22-Jul-2013 - 22-Jul-2013 
Location: Lancaster, United Kingdom 
Contact: Stefan Evert 
Contact Email: stefan.evert at fau.de 
Meeting URL: http://sigwac.org.uk/wiki/WAC8 

Linguistic Field(s): Computational Linguistics; Text/Corpus Linguistics 

Meeting Description: 

8th Web as Corpus Workshop (WAC-8)
Endorsed by ACL SIGWAC
Hosted by the Corpus Linguistics 2013 Conference
Monday, 22 July 2013 (Lancaster, UK)

Web corpora and other Web-derived data have become a gold mine for corpus linguistics and natural language processing. The Web is an easy source of unprecedented amounts of linguistic data from a broad range of registers and text types. However, a collection of Web pages is not immediately suitable for exploration in the same way a traditional corpus is.
 
Since the first Web as Corpus Workshop, organised at the Corpus Linguistics 2005 Conference, a highly successful series of yearly Web as Corpus workshops has provided a venue for interested researchers to meet, share ideas and discuss the problems and possibilities of compiling and using Web corpora. After a stronger focus on application-oriented natural language processing and Web technology in recent years - with workshops taking place at NAACL-HLT 2010, 2011 and WWW 2012 - the 8th Web as Corpus Workshop returns to its roots in the corpus linguistics community.

Accordingly, the leading theme of this workshop is the application of Web data in language research, including linguistic evaluation of Web-derived corpora as well as strategies and tools for high-quality automatic annotation of Web text. The workshop brings together presentations on all aspects of building, using and evaluating Web corpora, with a particular focus on the following topics:
 
- Applications of Web corpora and other Web-derived data sets for language research
- Automatic linguistic annotation of Web data such as tokenisation, part-of-speech tagging, lemmatisation and semantic tagging (the accuracy of currently available off-the-shelf tools is still unsatisfactory for many types of Web data)
- Critical exploration of the characteristics of Web data from a linguistic perspective and its applicability to language research
- Presentation of Web corpus collection projects or software tools required for some part of this process (crawling, filtering, de-duplication, language identification, indexing, etc.) 

Call for Participation:

Note that registration for the workshop and the main conference closes on Sunday, June 30. 
Registration URL: http://ucrel.lancs.ac.uk/cl2013/register.php

Further details can be found on the workshop homepage at http://sigwac.org.uk/wiki/WAC8.

Programme:

09:00
Akshay Minocha, Siva Reddy and Adam Kilgarriff
Feed Corpus: An Ever Growing Up-to-date Corpus

09:30
Stephen Wattam, Paul Rayson and Damon Berridge
LWAC: Longitudinal Web-as-Corpus Sampling

10:00
Roland Schäfer, Adrien Barbaresi and Felix Bildhauer
The Good, the Bad, and the Hazy: Design Decisions in Web Corpus Construction

10:30
Jesse Egbert and Douglas Biber
Developing a User-based Method of Web Register Classification

11:00 - 11:30
Tea Break 	

11:30
Adam Kilgarriff and Vít Suchomel
Web Spam

12:00
David Lutz, Parry Cadwallader and Mats Rooth
A Web Application for Filtering and Annotating Web Speech Data

12:30
Sarah Schulz, Verena Lyding and Lionel Nicolas
STirWaC - Compiling a Diverse Corpus Based on Texts from the Web for South Tyrolean German

13:00 - 14:00
Lunch 	

14:00
Alexander Piperski, Vladimir Belikov, Nikolay Kopylov, Vladimir Selegey and Serge Sharoff
Big and Diverse is Beautiful: A Large Corpus of Russian to Study Linguistic Variation

14:30
Adriano Ferraresi and Silvia Bernardini
The Academic Web-as-Corpus

15:00
Silke Scheible and Sabine Schulte im Walde
A Compact but Linguistically Detailed Database for German Verb Subcategorisation Relying on Dependency Parses from a Web Corpus

15:30 - 16:00
Tea Break 	

16:00
Andrew Brindle
Thug Breaks Man's Jaw: A Corpus Analysis of Responses to Interpersonal Street Violence

16:30
Colleen Crangle
A Web-based Model of Semantic Relatedness and the Analysis of Electroencephalographic (EEG) Data

17:00
Discussion and wrap-up

18:00
Pub

Organizing Committee:
 
Stefan Evert, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU)
Egon Stemle, European Academy of Bozen/Bolzano (EURAC)
Paul Rayson, Lancaster University







