Conf: Web as corpus - WAC3 & Cleaneval

Thierry Hamon thierry.hamon at LIPN.UNIV-PARIS13.FR
Wed Aug 1 09:38:52 UTC 2007

Date: Wed, 25 Jul 2007 14:28:57 +0200
From: Cédrick Fairon <cedrick.fairon at>
Message-Id: <7AE3661F-89A1-49DE-BF6F-ED8B1EB59021 at>

Call for participation
3rd Web as Corpus Workshop (WAC3)
incorporating Cleaneval
An ACL-SIGWAC event

Sept. 15-16, 2007
University of Louvain, Louvain-la-Neuve, Belgium

The program is now available online:


More and more people are using Web data for linguistic and NLP
research.  The workshop provides a venue for exploring how we can use
it effectively and what we will find if we do (see the program).


Anyone using web data needs to clean it to get rid of unwanted
material such as HTML markup, navigation bars, and advertisements.
To date there has been no sharing of resources or expertise, and the
cleaning has often been done minimally. Cleaneval is an exercise to
promote sharing and to improve our understanding of the issues.
More information is available on the Cleaneval website.
Results of the Cleaneval competition will be presented and discussed
during the workshop (see the program).

     Invited speaker: Kevin Scannell

Kevin Scannell, of Saint Louis Univ., Missouri, USA, has been working
with scholars of a range of smaller languages to develop web corpora
for those languages: the website currently lists 135


The workshop takes place at the Université catholique de Louvain, in
the elegant new city of Louvain-la-Neuve (Belgium). Large computer
rooms will be available for demo sessions.


You will find the registration form on the conference web site.
"Early bird fees" apply until August 17, 2007.
Student: 100 euros / 125 euros (after August 17, 2007)
Others: 125 euros / 150 euros (after August 17, 2007)

     Previous WAC workshops

WAC1 was held at the Corpus Linguistics conference, Birmingham, UK,
July 2005.

WAC2 was held at EACL, Trento, Italy, April 2006.

Points of contact

         Workshop Co-chairs

Cédrick Fairon, UCLouvain, Cental, fairon at
Prof. Gilles-Maurice de Schryver, Universiteit Gent

         Cleaneval committee

Marco Baroni, U Trento; Secretary, SIGWAC
Tony Hartley, U Leeds
Adam Kilgarriff, Lexical Computing Ltd; Chair, SIGWAC
Serge Sharoff, U Leeds

         Local organisation team

Bernadette Dehottay, UCLouvain, Cental, dehottay at
Julia Medori, CENTAL, UCLouvain
Laurent Kevers, CENTAL, UCLouvain
Hubert Naets, CENTAL, UCLouvain
Isabelle Lecroart, CENTAL, UCLouvain
Claude Devis, CENTAL, UCLouvain

Contact us:
Bernadette Dehottay
Université catholique de Louvain
Centre for Natural Language Processing (CENTAL)
Place Blaise Pascal, 1
1348 Louvain-la-Neuve
Tel. +32 10 47 37 88
Fax. +32 10 47 26 06
dehottay at

Cédrick Fairon
cedrick.fairon at

Director of CENTAL
Centre for Natural Language Processing
Université catholique de Louvain
Place Blaise Pascal, 1
1348 Louvain-la-Neuve
tel: +32 10 47 37 88
fax: +32 10 47 26 06

Message distributed by the Langage Naturel list <LN at>

The LN list is sponsored by ATALA (Association pour le Traitement
Automatique des Langues)
