[Corpora-List] Web Content Extractor / Screen Scraper

Serge Sharoff s.sharoff at leeds.ac.uk
Tue Jun 19 06:29:59 UTC 2007


> P.s. Haven't there been a number of papers on "corpus from the web"
> tasks over the last year? What did they use?
You might have overlooked a competition in which we are going to
collate approaches to this task:
http://cleaneval.sigwac.org.uk/
The call is still open. The deadline for submissions is July 13.

Best,
Serge

-- 
Dr. Serge Sharoff
Centre for Translation Studies
School of Modern Languages and Cultures
University of Leeds
Leeds, LS2 9JT

tel: +44(0)113 343 7287
fax: +44(0)113 343 3287


