[Corpora-List] Final Call for Papers: 4th Web as Corpus Workshop (LREC 2008, Marrakech)

Stefan Evert stefan.evert at uos.de
Fri Feb 22 16:45:34 UTC 2008


===== Final Call for Papers =====

The 4th Web as Corpus workshop: Can we beat Google?

Marrakech, Morocco (post-LREC workshop)
1 June 2008

http://webascorpus.sf.net/WAC4/

==================================

Submission deadline: 29 February 2008

PAPER SUBMISSION: http://www.easychair.org/conferences/?conf=wac4

==================================

DESCRIPTION

Commercial Web search engines offer fast search on huge amounts of  
text, combined with increasingly clever ranking and data analysis  
algorithms, but their content-centric services do not cater to the  
needs of the computational linguistics and NLP communities.  The  
leading theme of this workshop, the fourth in a series of highly  
successful Web as Corpus meetings, is to find out how to combine the  
power and scalability of modern search engine technology with  
sophisticated linguistic annotation and query processing.

We invite papers on various topics concerning the use of Web  
resources for corpus research and NLP applications, including (but  
not limited to) the following:

    * linguistic Web crawler technology and Web corpus collection
      projects
    * applications of Web-derived corpora and other kinds of Web data
    * how far does the "easy way" get you? (using search engines, or
      Google's n-gram lists; we are particularly interested in a critical
      discussion of the usefulness and limitations of such approaches)
    * methods and tools for "cleaning" Web pages to turn them into a
      corpus (contributors to this topic will be encouraged to participate
      in the second CLEANEVAL competition to be held in 2009)
    * automatic linguistic annotation of Web data: tokenisation, POS
      tagging, lemmatisation, semantic tagging, etc. (established tools
      often perform very poorly on Web data)
    * search engine architectures for linguists: bringing linguistics
      to commercial search engines, or high-performance search technology
      to linguistics?
    * search engine-related topics such as result ranking (e.g. how
      to identify "typical" uses rather than returning 50 very similar
      matches on the first page), duplicate detection, interactive
      query refinement, etc.
    * reviews and clever uses of search engine APIs (Google, Yahoo,
      Altavista, and in particular Microsoft's current generous LiveSearch
      API)

This workshop is endorsed by the Special Interest Group on the Web as  
Corpus (SIGWAC) of the Association for Computational Linguistics (ACL).

SUBMISSION INFORMATION

Authors are invited to submit full papers on original, unpublished  
work in the topic area of this workshop.  Submissions should follow  
the format of LREC proceedings and should not exceed eight (8) pages,  
including references.  We strongly recommend the use of LREC LaTeX or  
Microsoft Word style files tailored for this year's conference.

Submissions are managed via EasyChair.org.  In order to submit a  
paper, go to:

	http://www.easychair.org/conferences/?conf=wac4

and log in (or register an account with EasyChair if you don't have  
one yet). After logging in, click 'New Submission' and fill in the  
standard fields.

PROGRAMME COMMITTEE

Silvia Bernardini, U of Bologna, Italy
Massimiliano Ciaramita, Yahoo! Research Barcelona, Spain
Jesse de Does, INL, Netherlands
Katrien Depuydt, INL, Netherlands
Stefan Evert, U of Osnabrück, Germany
Cédrick Fairon, UCLouvain, Belgium
William Fletcher, U.S. Naval Academy, USA
Gregory Grefenstette, Commissariat à l'Énergie Atomique, France
Péter Halácsy, Budapest U of Technology and Economics, Hungary
Katja Hofmann, U of Amsterdam, Netherlands
Adam Kilgarriff, Lexical Computing Ltd, UK
Igor Leturia, Elhuyar Fundazioa, Basque Country, Spain
Phil Resnik, U of Maryland, College Park, USA
Kevin Scannell, Saint Louis U, USA
Gilles-Maurice de Schryver, U Gent, Belgium
Klaus Schulz, LMU München, Germany
Serge Sharoff, U of Leeds, UK
Eros Zanchetta, U of Bologna, Italy

ORGANISING COMMITTEE

Stefan Evert, University of Osnabrück
Adam Kilgarriff, Lexical Computing
Serge Sharoff, University of Leeds

