[Corpora-List] CLEANEVAL Web-as-Corpus exercise
    Adam Kilgarriff 
    adam at lexmasterclass.com
       
    Tue Apr  3 16:24:12 UTC 2007
    
    
  
CLEANEVAL is a shared task and competitive evaluation for cleaning arbitrary
web pages, with the goal of preparing web data for use as a corpus, for
linguistic and language technology research and development.  You are
invited to participate, and to encourage others to do so too.
*	Development dataset now available.
*	Prizes! A prize of £250.00 (GBP) will be awarded to the best
student entrant for each task (Chinese and English).
*	Fuller description
http://cleaneval.sigwac.org.uk/cleaneval-overview.html
*	Timetable:
	*	March 2007: Development datasets released (English and Chinese)
	*	June 2007: Exercise: evaluation dataset released; participants
		return cleaned pages two weeks later
	*	End of June 2007: Papers describing systems to be submitted
	*	15-16 September 2007: Workshop, part of WAC3, Louvain-la-Neuve,
		Belgium http://cental.fltr.ucl.ac.be/wac3/
*	Annotation guidelines
http://cleaneval.sigwac.org.uk/annotation_guidelines.html
*	Co-ordinators:
	*	Marco Baroni <http://sslmit.unibo.it/~baroni/>, Trento University,
		Italy
	*	Tony Hartley <http://www.leeds.ac.uk/cts/staff/tony_hartley.htm>,
		Leeds University, UK
	*	Adam Kilgarriff <http://www.kilgarriff.co.uk>, Lexical Computing
		Ltd., Leeds and Sussex Universities, UK
	*	Serge Sharoff <http://www.comp.leeds.ac.uk/ssharoff/>, Leeds
		University, UK
 
CLEANEVAL is an activity of ACL-SIGWAC <http://sigwac.org.uk>, the
Association for Computational Linguistics (ACL) <http://www.aclweb.org>
Special Interest Group on Web as Corpus.
 