[Corpora-List] CLEANEVAL: invitation to participate
Adam Kilgarriff
adam at lexmasterclass.com
Sat May 26 06:41:30 UTC 2007
CLEANEVAL (http://cleaneval.sigwac.org.uk)
is an open evaluation exercise for programs for 'cleaning' web pages: taking
arbitrary web pages as input and delivering clean text, suitable for
training language models or linguistic research, as output. Better web
text cleaning will "remove the grit from the machine" for a wide variety of
NLP systems and applications. Our goals are to share knowledge and
experience, and to provide open source software that cleans web text well.
The evaluation (downloading the evaluation dataset, processing it, and
returning the results) will take place between 11th and 22nd June 2007.
Please do participate (and/or encourage students and colleagues to do so
too). To express your intention, please email cleaneval at sigwac.org.uk .
There are prizes for the best student entries.
All details at http://cleaneval.sigwac.org.uk
CLEANEVAL is an activity of the Association for Computational Linguistics
Special Interest Group on Web as Corpus (ACL-SIGWAC).
==========================================================
Adam Kilgarriff
Chair, ACL-SIGWAC http://sigwac.org.uk
Lexical Computing Ltd http://www.sketchengine.co.uk
Universities of Leeds and Sussex
adam at lexmasterclass.com http://www.kilgarriff.co.uk
==========================================================