[Corpora-List] Web as corpus - WAC3 & Cleaneval: Call for participation

Cédrick Fairon cedrick.fairon at uclouvain.be
Wed Jul 25 12:21:38 UTC 2007


Call for participation
------------------------------------------------------------------------
* 3rd Web as Corpus Workshop (WAC3)
incorporating Cleaneval
An ACL-SIGWAC event*

Sept. 15-16, 2007
University of Louvain, Louvain-la-Neuve, Belgium

The program is now available online: http://cental.fltr.ucl.ac.be/wac3
------------------------------------------------------------------------

      WAC3

More and more people are using Web data for linguistic and NLP
research.  The workshop provides a venue for exploring how we can use
it effectively and what we will find if we do (see the program).

     Cleaneval

Anyone using web data needs to clean it to get rid of unwanted
material such as HTML markup, navigation bars, and advertisements.
To date there has been no sharing of resources or expertise, and
the cleaning has often been done only minimally. Cleaneval is
an exercise to promote sharing and to improve our understanding of the
issues. More info at Cleaneval <http://cleaneval.sigwac.org.uk>.
Results of the Cleaneval competition will be presented and discussed
during the workshop (see the program).

     Invited speaker: Kevin Scannell

Kevin Scannell, of Saint Louis University, Missouri, USA, has been
working with scholars of a range of smaller languages to develop web
corpora for those languages. His website,
http://borel.slu.edu/crubadan/stadas.html, currently lists 135
corpora/languages.

     Venue

Université catholique de Louvain <http://www.uclouvain.be/>, in the
elegant new city of Louvain-la-Neuve
<http://www.eupedia.com/belgium/louvain-la-neuve.shtml> (Belgium).
Large computer rooms will be available for demo sessions.

     Registration

You will find the registration form on the conference web site.
"Early bird fees" apply until August 17, 2007.
Student: 100 euros / 125 euros (after August 17, 2007)
Others: 125 euros / 150 euros (after August 17, 2007)

     Previous WAC workshops

WAC1 was held at the Corpus Linguistics conference, Birmingham, UK,
July 2005. More info at
http://sslmit.unibo.it/%7Ebaroni/web_as_corpus_cl05.html

WAC2 was held at EACL, Trento, Italy, April 2006. More info at
http://sslmit.unibo.it/%7Ebaroni/web_as_corpus_eacl06.html

     Points of contact

         Workshop Co-chairs

Cédrick Fairon, UCLouvain, Cental, fairon at tedm.ucl.ac.be
Prof. Gilles-Maurice de Schryver, Universiteit Gent

         Cleaneval committee

Marco Baroni, U Trento; Secretary, SIGWAC
Tony Hartley, U Leeds
Adam Kilgarriff, Lexical Computing Ltd; Chair, SIGWAC
Serge Sharoff, U Leeds

         Local organisation team

Bernadette Dehottay, UCLouvain, Cental, dehottay at tedm.ucl.ac.be
Julia Medori, CENTAL, UCLouvain
Laurent Kevers, CENTAL, UCLouvain
Hubert Naets, CENTAL, UCLouvain
Isabelle Lecroart, CENTAL, UCLouvain
Claude Devis, CENTAL, UCLouvain

Contact us:
Bernadette Dehottay
Université catholique de Louvain
Centre for Natural Language Processing (CENTAL)
Place Blaise Pascal, 1
1348 Louvain-la-Neuve
Tel. +32 10 47 37 88
Fax. +32 10 47 26 06
dehottay at tedm.ucl.ac.be




Cédrick Fairon
cedrick.fairon at uclouvain.be

Director of CENTAL
Centre for Natural Language Processing
Université catholique de Louvain
Place Blaise Pascal, 1
1348 Louvain-la-Neuve
Belgium
tel: +32 10 47 37 88
fax: +32 10 47 26 06

http://cental.fltr.ucl.ac.be
http://glossa.fltr.ucl.ac.be




_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora