[Corpora-List] CLEANEVAL: invitation to participate

Adam Kilgarriff adam at lexmasterclass.com
Sat May 26 06:41:30 UTC 2007


CLEANEVAL (http://cleaneval.sigwac.org.uk) is an open evaluation exercise
for programs that 'clean' web pages: they take arbitrary web pages as input
and deliver clean text, suitable for training language models or for
linguistic research, as output.  Better web text cleaning will "remove the
grit from the machine" for a wide variety of NLP systems and applications.
Our goals are to share knowledge and experience, and to provide open source
software that does the cleaning well.

 

The evaluation (downloading the evaluation dataset, processing it, and
returning the output) will take place between 11th and 22nd June 2007.

Please do participate (and/or encourage students and colleagues to do so
too).  To register your intention to take part, please email
cleaneval at sigwac.org.uk.  There are prizes for the best student entries.

All details are at http://cleaneval.sigwac.org.uk

CLEANEVAL is an activity of the Association for Computational Linguistics
Special Interest Group on Web as Corpus (ACL-SIGWAC).

==========================================================

Adam Kilgarriff

Chair, ACL-SIGWAC                          http://sigwac.org.uk

Lexical Computing Ltd        http://www.sketchengine.co.uk 

Universities of Leeds and Sussex

adam at lexmasterclass.com        http://www.kilgarriff.co.uk 

==========================================================

 
