[Corpora-List] Web as corpus - WAC3 & Cleaneval: Call for participation
Cédrick Fairon
cedrick.fairon at uclouvain.be
Wed Jul 25 12:21:38 UTC 2007
Call for participation
------------------------------------------------------------------------
* 3rd Web as Corpus Workshop (WAC3)
incorporating Cleaneval
An ACL-SIGWAC event*
Sept. 15-16, 2007
University of Louvain, Louvain-la-Neuve, Belgium
The program is now available online: http://cental.fltr.ucl.ac.be/wac3
------------------------------------------------------------------------
WAC3
More and more people are using Web data for linguistic and NLP
research. The workshop provides a venue for exploring how we can use
it effectively and what we will find if we do (see the program).
Cleaneval
Anyone using web data needs to clean it, to get rid of unwanted
material such as HTML markup, navigation bars, and
advertisements. To date there has been no sharing of resources or
expertise and the cleaning has often been done minimally. Cleaneval is
an exercise to promote sharing and to improve our understanding of the
issues. More info at Cleaneval <http://cleaneval.sigwac.org.uk>.
Results of the Cleaneval competition will be presented and discussed
during the workshop (see the program).
Invited speaker: Kevin Scannell
Kevin Scannell, of Saint Louis Univ., Missouri, USA, has been working
with scholars of a range of smaller languages to develop web corpora
for those languages. The website
http://borel.slu.edu/crubadan/stadas.html currently lists 135
corpora/languages.
Venue
Université catholique de Louvain
http://www.uclouvain.be/, in the elegant new city of
Louvain-la-Neuve http://www.eupedia.com/belgium/louvain-la-neuve.shtml
(Belgium). Large computer rooms will be available for demo sessions.
Registration
You will find the registration form on the conference web site.
"Early bird fees" apply until August 17, 2007.
Student: 100 euros / 125 euros (after August 17, 2007)
Others: 125 euros / 150 euros (after August 17, 2007)
Previous WAC workshops
WAC1, at the Corpus Linguistics conference, Birmingham, UK, July 2005:
http://sslmit.unibo.it/%7Ebaroni/web_as_corpus_cl05.html
WAC2, at EACL, Trento, Italy, April 2006:
http://sslmit.unibo.it/%7Ebaroni/web_as_corpus_eacl06.html
Points of contact
Workshop Co-chairs
Cédrick Fairon, UCLouvain, Cental, fairon at tedm.ucl.ac.be
Prof. Gilles-Maurice de Schryver, Universiteit Gent
Cleaneval committee
Marco Baroni, U Trento; Secretary, SIGWAC
Tony Hartley, U Leeds
Adam Kilgarriff, Lexical Computing Ltd; Chair, SIGWAC
Serge Sharoff, U Leeds
Local organisation team
Bernadette Dehottay, UCLouvain, Cental, dehottay at tedm.ucl.ac.be
Julia Medori, CENTAL, UCLouvain
Laurent Kevers, CENTAL, UCLouvain
Hubert Naets, CENTAL, UCLouvain
Isabelle Lecroart, CENTAL, UCLouvain
Claude Devis, CENTAL, UCLouvain
Contact us:
Bernadette Dehottay
Université catholique de Louvain
Centre for Natural Language Processing (CENTAL)
Place Blaise Pascal, 1
1348 Louvain-la-Neuve
Tel. +32 10 47 37 88
Fax. +32 10 47 26 06
dehottay at tedm.ucl.ac.be
Cédrick Fairon
cedrick.fairon at uclouvain.be
Director of CENTAL
Centre for Natural Language Processing
Université catholique de Louvain
Place Blaise Pascal, 1
1348 Louvain-la-Neuve
Belgium
tel: +32 10 47 37 88
fax: +32 10 47 26 06
http://cental.fltr.ucl.ac.be
http://glossa.fltr.ucl.ac.be
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora