[Corpora-List] Resend: CLEANEVAL Web-as-Corpus exercise
Adam Kilgarriff
adam at lexmasterclass.com
Tue Apr 3 16:56:51 UTC 2007
**Apologies for faulty links in last version**
CLEANEVAL is a shared task and competitive evaluation for cleaning arbitrary
web pages, with the goal of preparing web data for use as a corpus, for
linguistic and language technology research and development. You are
invited to participate, and to encourage others to do so too.
Website: http://cleaneval.sigwac.org.uk
Development datasets (English and Chinese) are now available.
* Prizes! A prize of £250.00 (GBP) will be awarded for the best
student entrant for each task (Chinese and English).
* Timetable:
  * March 2007: Development datasets released (English and Chinese)
  * June 2007: Exercise: Evaluation dataset released and, two weeks
    later, participants to return cleaned pages
  * End of June 2007: Papers describing systems to be submitted
  * Sept 15-16 2007: Workshop, part of WAC3, Louvain-la-Neuve, Belgium
    http://cental.fltr.ucl.ac.be/wac3/
* Co-ordinators:
  * Marco Baroni, Trento University, Italy
  * Tony Hartley, Leeds University, UK
  * Adam Kilgarriff, Lexical Computing Ltd., Leeds and Sussex Universities, UK
  * Serge Sharoff, Leeds University, UK
CLEANEVAL is an activity of ACL-SIGWAC, the Association for Computational
Linguistics (ACL) Special Interest Group on Web as Corpus.