[Corpora-List] Australian newspaper corpora

Eric Atwell eric at comp.leeds.ac.uk
Thu May 3 08:35:25 UTC 2007


Monika,

If no one comes up with any suitable existing corpora, you could try
"baking your own" with web-as-corpus tools.
A Google search for "australian newspaper" shows several likely sites,
including "Australian Newspapers OnLine: Australian newspapers
that are available via their publishers on the Internet"
http://www.nla.gov.au/npapers/

You can use a web-as-corpus collection tool such as WWW-Bootcat,
a web interface to Baroni's Perl BootCat:
http://corpora.fi.muni.cz/bootcat/

or WeBoCa, a Java alternative by Leeds student Michael Drayson, an
extension of Andy Roberts' JBootCat: http://code.google.com/p/weboca/

- just restrict the corpus-harvesting to your chosen website(s).
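
As a rough illustration of the "restrict to your chosen website(s)" step,
here is a minimal Python sketch that filters a list of candidate URLs down
to an allow-list of hosts. It is not how WWW-Bootcat or WeBoCa do it
internally; the host name and the function names are hypothetical
placeholders, and it only assumes that your harvesting tool can give you
its candidate URLs as plain strings.

```python
from urllib.parse import urlparse

# Hypothetical allow-list: replace with the newspaper sites you actually choose.
ALLOWED_HOSTS = {"example-newspaper.com.au"}


def is_allowed(url, allowed_hosts=ALLOWED_HOSTS):
    """Return True if the URL's host is an allowed host or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in allowed_hosts)


def restrict_to_sites(urls, allowed_hosts=ALLOWED_HOSTS):
    """Keep only URLs on the chosen sites, preserving order, dropping duplicates."""
    seen = set()
    kept = []
    for url in urls:
        if url not in seen and is_allowed(url, allowed_hosts):
            seen.add(url)
            kept.append(url)
    return kept
```

Matching on the parsed host name rather than on a substring of the URL
avoids accidentally keeping pages from other sites that merely mention the
newspaper's domain somewhere in their address.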


But hopefully someone else will provide a "ready-made" corpus exactly to
your specifications :-)

Eric Atwell, Leeds University


On Thu, 3 May 2007, Monika Bednarek wrote:

> Dear all,
>
> I was wondering if there are any freely available corpora that consist of:
>
> 1) Australian newspaper reportage (apart from the files included in the 
> Australian component of the ICE)
>
> or
>
> 2) (if possible Australian) newspaper headlines only
>
> or
>
> 3) (if possible Australian) newspaper captions only
>
> Any tips are very much appreciated.
>
> Best Regards,
>
> Monika
>

-- 
Eric Atwell,
Senior Lecturer, Language research group, School of Computing
Faculty of Engineering, UNIVERSITY OF LEEDS, Leeds LS2 9JT, England
TEL: 0113-3435430  FAX: 0113-3435468  WWW/email: google Eric Atwell


