Book: Building and Exploring Web Corpora

Thierry Hamon thierry.hamon at LIPN.UNIV-PARIS13.FR
Fri Oct 26 12:51:45 UTC 2007


Date: Thu, 25 Oct 2007 13:26:48 +0200
From: Cédrick Fairon <cedrick.fairon at uclouvain.be>
Message-Id: <793DEB4B-D999-4FE4-A6A1-98A41A687E48 at uclouvain.be>
X-url: http://cental.fltr.ucl.ac.be
X-url: http://glossa.fltr.ucl.ac.be


Dear Colleagues,

I am pleased to announce the publication of:

"Building and Exploring Web Corpora"
Proceedings of the 3rd Web as Corpus Workshop, incorporating Cleaneval
Cédrick FAIRON, Hubert NAETS, Adam KILGARRIFF and Gilles-Maurice de
SCHRYVER (eds)
In Cahiers du CENTAL, Presses universitaires de Louvain,
Louvain-la-Neuve, 2007

The book is available in PDF format as well as in print.
Table of contents, information and orders: see
http://www.i6doc.com/docs/cental4

Abstract (the book is in English)

WAC
More and more people are using Web data for linguistic and NLP
research. The Web as Corpus workshop (WAC) provides a venue for
exploring how we can use it effectively and the advancements to which
this could lead. This book is a collection of the talks presented at
the 3rd WAC in Louvain-la-Neuve (Belgium). The focus is on the
description of Web corpus collection projects, the exploration of Web
data characteristics from a linguistics/NLP perspective, and the use
of crawled Web data for NLP purposes.

CLEANEVAL
Any use of Web data requires that it be cleaned in order to get rid of
unwanted material including, for example, HTML markup, navigation bars,
and advertisements. To date there has been no sharing of resources or
expertise in this particular domain and the cleaning has often been
done minimally. Cleaneval was an exercise aimed at promoting
collaboration and improving our understanding of the issues. Results
and perspectives are presented in this book.

Cédrick Fairon
cedrick.fairon at uclouvain.be

Director of CENTAL
Centre de traitement automatique du langage
Université catholique de Louvain
Place Blaise Pascal, 1
1348 Louvain-la-Neuve
Belgium
tel: +32 10 47 37 88
fax: +32 10 47 26 06

http://cental.fltr.ucl.ac.be
http://glossa.fltr.ucl.ac.be





-------------------------------------------------------------------------
Message distributed by the Langage Naturel list <LN at cines.fr>
Information, subscription: http://www.atala.org/article.php3?id_article=48
English version       : 
Archives                 : http://listserv.linguistlist.org/archives/ln.html
                                http://liste.cines.fr/info/ln

The LN list is sponsored by ATALA (Association pour le Traitement
Automatique des Langues)
Information and membership: http://www.atala.org/
-------------------------------------------------------------------------


