[Corpora-List] REUTER corpus online?

Luis Sarmento parapraxe at excite.com
Thu Nov 17 13:42:14 UTC 2005


Dear Serge,

Thank you for your answer and kind offer.

As for your suggestion of using Wacky, our problem is not so much that of obtaining "newswire" text from the web. We could in fact obtain that text from the publicly available 14 GB collection of the Portuguese web, WPT03 (please see http://poloxldb.linguateca.pt/index.php?l=WPT_03 ), using a procedure similar to the one you mentioned, since it is all indexed in a MySQL database. Our problem is instead that of obtaining a newswire collection that is manually classified by topic/domain and comparable to the English one. :)

I was wondering whether such a collection might in fact be available, since Reuters is a global news agency and I am sure that they produce a huge number of newswire texts every day in several languages.

Best,
LS

--- On Wed 11/16, Serge Sharoff < s.sharoff at leeds.ac.uk > wrote:
From: Serge Sharoff [mailto: s.sharoff at leeds.ac.uk]
To: parapraxe at excite.com
Cc: corpora at hd.uib.no
Date: Wed, 16 Nov 2005 10:57:39 +0000
Subject: Re: [Corpora-List] REUTER corpus online?

Luis,

we have an online interface to the Reuters corpus (indexed by CorpusWorkbench). It's available from:
http://corpus.leeds.ac.uk/

Because of the agreement with Reuters the access is mostly limited to in-house research. However, we can provide a password for research-related concordancing.

As for Portuguese, if you have a reasonable list of words frequent in Portuguese newswires and a tagger/lemmatiser, a corpus like this can be collected from the web. See the Wacky initiative:
http://wacky.sslmit.unibo.it/

Best wishes,
Serge

On Tue, 2005-11-15 at 10:45 -0500, Luis Sarmento wrote:
> Dear Corpora-List members,
>
> Does anyone know if there is any publicly available online version of
> the Reuters corpus? In other words, is there any (free) web concordance
> tool for the Reuters Corpus?
>
> Btw, I wonder if there are comparable versions of the Reuters corpus
> available, namely in Portuguese, for bilingual studies. Is anyone
> using a "comparable" version of Reuters in Portuguese?
>
> Thanks to all,
>
> Luis Sarmento

-- 
Dr. Serge Sharoff
Centre for Translation Studies
School of Modern Languages and Cultures
University of Leeds
Leeds, LS2 9JT
tel: +44(0)113 343 7287
fax: +44(0)113 343 3287


