[Corpora-List] Australian newspaper corpora
Eric Atwell
eric at comp.leeds.ac.uk
Thu May 3 08:35:25 UTC 2007
Monika,
if no one comes up with any suitable existing corpora, you could try
"baking your own" with web-as-corpus tools...
Google search for "australian newspaper" shows several likely sites
including "Australian Newspapers OnLine: Australian newspapers
that are available via their publishers on the Internet"
http://www.nla.gov.au/npapers/
You can use a web-as-corpus collection tool such as WWW-Bootcat,
a web interface to Baroni's Perl BootCat:
http://corpora.fi.muni.cz/bootcat/
or WeBoCa, a Java alternative by Leeds student Michael Drayson, an
extension of Andy Roberts' JBootCat: http://code.google.com/p/weboca/
- just restrict the corpus-harvesting to your chosen website(s).
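As a rough illustration of what "restricting to your chosen website(s)" means, here is a minimal Python sketch that filters a list of harvested URLs down to a set of allowed domains. It is not part of BootCat or WeBoCa; the function names and the domain example-newspaper.com.au are hypothetical placeholders, and the tools above may offer their own way of doing this.

```python
from urllib.parse import urlparse

# Hypothetical whitelist: replace with the newspaper sites you have chosen.
ALLOWED_DOMAINS = {"example-newspaper.com.au"}

def in_allowed_domain(url, allowed=ALLOWED_DOMAINS):
    """Return True if the URL's host is an allowed domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allowed)

def restrict_to_sites(urls, allowed=ALLOWED_DOMAINS):
    """Keep only the URLs hosted on the allowed sites, preserving order."""
    return [u for u in urls if in_allowed_domain(u, allowed)]
```

You would run something like restrict_to_sites() over the URL list before downloading pages, so that only text from the chosen newspapers ends up in the corpus.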
But hopefully someone else will provide a "ready-made" corpus exactly to
your specifications :-)
Eric Atwell, Leeds University
On Thu, 3 May 2007, Monika Bednarek wrote:
> Dear all,
>
> I was wondering if there are any freely available corpora that consist of:
>
> 1) Australian newspaper reportage (apart from the files included in the
> Australian component of the ICE)
>
> or
>
> 2) (if possible Australian) newspaper headlines only
>
> or
>
> 3) (if possible Australian) newspaper captions only
>
> Any tips are very much appreciated.
>
> Best Regards,
>
> Monika
--
Eric Atwell,
Senior Lecturer, Language research group, School of Computing
Faculty of Engineering, UNIVERSITY OF LEEDS, Leeds LS2 9JT, England
TEL: 0113-3435430 FAX: 0113-3435468 WWW/email: google Eric Atwell