[Corpora-List] Web pages corpus

Jakob Halskov jh.id at cbs.dk
Mon Mar 6 12:21:01 UTC 2006


Dear Imen,

It is very easy to compile a web corpus on your own using one of the freely available web search APIs. See for example:

http://developer.yahoo.net/search/index.html

or

http://www.google.com/apis/
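As a minimal sketch of the fetching step, assume you already have a list of result URLs from whichever search API you use. The search call itself is left out because each API has its own interface and terms of use. The Python below saves each page's raw HTML and a rough plain-text version as one JSON file per URL. `build_corpus`, `html_to_text` and `fetch_url` are hypothetical helpers for illustration, not part of either API, and the text extraction only drops tags, scripts and styles (it does no boilerplate removal).

```python
import hashlib
import json
import os
import urllib.request
from html.parser import HTMLParser
from typing import Callable, Iterable, List


class _TextExtractor(HTMLParser):
    """Collects visible text nodes, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Very rough HTML-to-text conversion (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return " ".join(parser.parts)


def fetch_url(url: str, timeout: float = 10.0) -> str:
    """Download a page and decode it with the charset the server declares."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def build_corpus(urls: Iterable[str], out_dir: str,
                 fetch: Callable[[str], str] = fetch_url) -> int:
    """Save each URL's HTML and extracted text as JSON; return number saved."""
    os.makedirs(out_dir, exist_ok=True)
    seen = set()
    saved = 0
    for url in urls:
        if url in seen:  # result lists from different queries often overlap
            continue
        seen.add(url)
        try:
            html = fetch(url)
        except (OSError, ValueError, LookupError) as exc:
            # network/HTTP errors, malformed URLs, unknown charsets
            print(f"skipping {url}: {exc}")
            continue
        doc_id = hashlib.sha1(url.encode("utf-8")).hexdigest()
        record = {"url": url, "html": html, "text": html_to_text(html)}
        path = os.path.join(out_dir, doc_id + ".json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(record, f, ensure_ascii=False)
        saved += 1
    return saved
```

The `fetch` parameter is injectable so you can add rate limiting or caching, or test the pipeline without network access; by default it uses the standard library's `urllib.request.urlopen`.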

Best regards,

Jakob Halskov
--
PhD student
Dept. of Computational Linguistics
Copenhagen Business School
www.id.cbs.dk

----- Original Message -----
From: "ismi.touati" <ismi.touati at laposte.net>
Date: Monday, March 6, 2006 12:29 pm
Subject: [Corpora-List] Web pages corpus

> Dear all,
> 
> I'm working on automatic summarization of web pages, and I'm looking
> for a corpus of web pages (HTML documents) with their abstracts to
> evaluate my system.
> 
> Does anyone know if such a corpus exists?
> 
> Thanks in advance for the help.
> Imen.
> 
> ***********************************
> Imen Touati
> Master's student at the Faculty of Economic Science and Management of Sfax,
> Tunisia.
> LARIS laboratory
> Address: LARIS, FSEGS, BP 1088, 3018 Sfax, Tunisia
