[Corpora-List] Getting articles from newspapers to compile a corpus
Khalid CHOUKRI
choukri at elda.org
Thu Nov 29 21:16:13 UTC 2012
Hi Matías,
Which languages and domains are you looking for, and what sizes? Are
you looking for monolingual data?
ELRA regularly collects such data (after negotiating the rights), so we
may have something to share with you.
Best regards
Khalid
Matías Guzmán wrote, On 29/11/2012 19:21:
> Hi all,
>
> I was wondering if anyone knows how to get every possible article from
> online newspapers and magazines. I was thinking something like giving a
> program the URL of the newspaper (e.g. www.eltiempo.com) and getting the
> text from all pages therein. Is that possible?
>
> Thanks a lot,
>
> Matías
>
>
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
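
On the question quoted above (give a program the newspaper's URL and get the text from all pages therein): a minimal sketch of one way this could be done, using only the Python standard library. It follows links breadth-first and stays on the host of the starting URL. The names PageParser, fetch_html and crawl, the max_pages limit, and the UTF-8 assumption are all illustrative and not taken from any existing tool. The sketch does not check robots.txt or the rights to the content, and it keeps menus and other boilerplate text along with the article body.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class PageParser(HTMLParser):
    """Collects visible text and href targets from one HTML page."""

    SKIP = {"script", "style"}  # elements whose content is not prose

    def __init__(self):
        super().__init__()
        self.links = []
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def fetch_html(url):
    """Download one page; assumes UTF-8 and replaces undecodable bytes."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch=fetch_html, max_pages=50):
    """Breadth-first crawl restricted to the host of start_url.

    Returns a dict mapping each visited URL to its extracted text.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    texts = {}
    while queue and len(texts) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to load
        parser = PageParser()
        parser.feed(html)
        texts[url] = " ".join(parser.chunks)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return texts
```

Calling crawl("http://www.eltiempo.com/", max_pages=20) would attempt to fetch up to 20 pages from that host. The fetch parameter exists so that a different download function (for example one that adds delays between requests) can be swapped in.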
--
Khalid Choukri
ELRA General Secretary & ELDA CEO
email: choukri at elda.org
Web: www.elra.info www.elda.org
Tel. +33 1 43 13 33 33 - Fax. +33 1 43 13 33 30
Info on LREC 2012: www.lrec-conf.org