[Corpora-List] English Newspaper Corpora

Tony Rose tgr at tonyrose.net
Tue Sep 17 22:27:10 UTC 2002


Have you tried the Reuters Corpus (Volume 1)? It's not downloadable as such, but
you only need to fill in a form to obtain a copy (on CD). It comprises ~810,000
English-language news stories from the period 20 August 1996 to 19 August 1997,
formatted in NewsML (a dialect of XML), so manipulating the raw data shouldn't
be too problematic. It's available from:

http://about.reuters.com/researchandstandards/corpus/
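
Since the stories are XML, turning them into plain text for a concordancer
should only take a few lines. Here is a rough Python sketch of the idea. Note
that the element names (newsitem, headline, text, p) and the sample story are
assumptions for illustration only, so check them against the files on the CD
before relying on this.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample story. The element names used here (newsitem, headline,
# text, p) are assumptions and may not match the actual corpus files.
SAMPLE = """<newsitem itemid="1" date="1996-08-20">
  <headline>Example headline</headline>
  <text>
    <p>First paragraph.</p>
    <p>Second paragraph.</p>
  </text>
</newsitem>"""


def story_to_plain_text(xml_string):
    """Return the headline and body paragraphs as plain text, one per line."""
    root = ET.fromstring(xml_string)
    lines = []

    headline = root.findtext("headline")
    if headline:
        lines.append(headline.strip())

    for p in root.iterfind("text/p"):
        # itertext() also picks up text nested inside any child elements.
        para = "".join(p.itertext()).strip()
        if para:
            lines.append(para)

    return "\n".join(lines)


if __name__ == "__main__":
    print(story_to_plain_text(SAMPLE))
```

Writing the returned string out to one .txt file per story would give you
plain-text files that tools expecting raw text can load.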

Best regards,
Tony

----- Original Message -----
From: "Siew Imm Tan" <xiuyin at hotmail.com>
To: <corpora at hd.uib.no>
Sent: Tuesday, September 17, 2002 3:47 AM
Subject: [Corpora-List] English Newspaper Corpora


> Does anyone know of any English (UK, US or Australia) newspaper corpus that
> can be downloaded and analysed using TACT and/or Wordsmith? I know that
> Collins Wordbanks has three substantial newspaper components but these can
> only be analysed using Lookup and the raw data cannot be downloaded. The BNC
> can be analysed using Wordsmith but does not seem to have a specific
> newspaper component. Any advice as to where such a corpus can be bought or
> subscribed to would be greatly appreciated.
>
> Tan Siew Imm
> Postgraduate Student
> Department of English
> University of Hong Kong
>
>


