25.1203, Confs: Computational Linguistics, Text/Corpus Linguistics/Sweden

linguist at linguistlist.org
Tue Mar 11 13:44:26 UTC 2014


LINGUIST List: Vol-25-1203. Tue Mar 11 2014. ISSN: 1069 - 4875.

Moderators: Damir Cavar, Eastern Michigan U <damir at linguistlist.org>

Reviews: Monica Macaulay, U of Wisconsin Madison
Rajiv Rao, U of Wisconsin Madison
Joseph Salmons, U of Wisconsin Madison
Mateja Schuck, U of Wisconsin Madison
Anja Wanner, U of Wisconsin Madison
       <reviews at linguistlist.org>

Homepage: http://linguistlist.org

Editor for this issue: Xiyan Wang <xiyan at linguistlist.org>
================================================================  


Date: Tue, 11 Mar 2014 09:42:56
From: Felix Bildhauer [felix.bildhauer at fu-berlin.de]
Subject: EACL 2014 Workshop on Web as Corpus

EACL 2014 Workshop on Web as Corpus 
Short Title: WAC-9 

Date: 26-Apr-2014
Location: Gothenburg, Sweden 
Contact: Felix Bildhauer 
Contact Email: felix.bildhauer at fu-berlin.de 
Meeting URL: http://www.sigwac.org.uk/wiki/WAC9 

Linguistic Field(s): Computational Linguistics; Text/Corpus Linguistics 

Meeting Description: 

The 9th Web as Corpus Workshop (WAC-9)
Endorsed by the Special Interest Group of the ACL on Web as Corpus 
(http://www.sigwac.org.uk/)

The World Wide Web has become increasingly popular as a source of linguistic data, not only within the NLP communities, but also with theoretical linguists facing problems of data sparseness or data diversity. Accordingly, web corpora continue to gain importance, given their size and diversity in terms of genres/text types. However, the field is still new, and a number of issues in web corpus construction still need much research (fundamental and applied), ranging from questions of corpus design (e.g., corpus composition assessment, sampling strategies and their relation to crawling algorithms, handling of duplicated material) to more technical aspects (e.g., efficient implementation of individual post-processing steps in document cleansing and linguistic annotation, or large-scale parallelization to achieve web-scale corpus construction). Similarly, the systematic evaluation of web corpora, for example in the form of task-based comparisons to traditional corpora, has only lately shifted into focus.

For almost a decade, the ACL SIGWAC, and especially the Web as Corpus (WAC) workshops, have served as a platform for researchers interested in building and working with web-derived corpora. Past workshops have been co-located with major conferences on computational linguistics and/or corpus linguistics (such as EACL, LREC, WWW, Corpus Linguistics). As part of the workshop, we will have a panel discussion dedicated to planning a shared task for WAC10 (2015), including the nomination of its organizers. The tracks of the shared task will focus on the quality of web corpus creation tools, tools for linguistic annotation (at least lemmatization, possibly also POS tagging, etc.), and the quality of web corpora themselves.

Organizing Committee:

Felix Bildhauer, Freie Universität Berlin
Roland Schäfer, Freie Universität Berlin

Program Committee:

Organizing Committee, plus:

Adrien Barbaresi, École Normale Supérieure de Lyon
Silvia Bernardini, Università di Bologna
Chris Biemann, Technische Universität Darmstadt
Jesse Egbert, Northern Arizona University
Stefan Evert, Friedrich-Alexander Universität Erlangen-Nürnberg
Adriano Ferraresi, Università di Bologna
William Fletcher, United States Naval Academy
Dirk Goldhahn, Universität Leipzig
Adam Kilgarriff, Lexical Computing Ltd.
Anke Lüdeling, Humboldt-Universität zu Berlin
Alexander Mehler, Goethe-Universität Frankfurt am Main
Uwe Quasthoff, Universität Leipzig
Paul Rayson, Lancaster University
Serge Sharoff, University of Leeds
Sabine Schulte im Walde, Universität Stuttgart
Egon Stemle, European Academy of Bolzano
Yannick Versley, Universität Heidelberg
Torsten Zesch, Universität Darmstadt
Stephen Wattam, Lancaster University 

Workshop Program: 

11:15–11:30
Welcome (Felix Bildhauer & Roland Schäfer)

11:30–12:00
Finding Viable Seed URLs for Web Corpora: A Scouting Approach and Comparative Study of Available Sources (Adrien Barbaresi)

12:00–12:30
Focused Web Corpus Crawling (Roland Schäfer, Adrien Barbaresi & Felix Bildhauer)

Lunch Break

14:00–14:30
Less Destructive Cleaning of Web Documents by Using Standoff Annotation (Maik Stührenberg)

14:30–15:00
Some Issues on the Normalization of a Corpus of Products Reviews in Portuguese (Magali Sanches Duran, Lucas Avanço, Sandra Aluísio, Thiago Pardo & Maria da Graça Volpe Nunes)

15:00–15:30
{bs,hr,sr}WaC - Web Corpora of Bosnian, Croatian and Serbian (Nikola Ljubešić & Filip Klubička)

Coffee Break

16:00–16:30
The PAISÀ Corpus of Italian Web Texts (Verena Lyding, Egon Stemle, Claudia Borghetti, Marco Brunello, Sara Castagnoli, Felice Dell’Orletta, Henrik Dittmann, Alessandro Lenci & Vito Pirrelli)

16:30–17:00
Internet Data in a Study of Language Change and a Program Helping to Work with Them (Varvara Magomedova, Natalia Slioussar & Maria Kholodilova)

17:00–18:00
Discussion








 

