[Corpora-List] Summary Newspaper Corpora 2

Jan Strunk strunk at linguistics.ruhr-uni-bochum.de
Wed Apr 16 14:46:34 UTC 2003


Hello,

As there have been quite a few responses
to my second query, I'll post another summary.

In my second query, I asked for available newspaper
corpora in languages other than English, French, German
and Spanish (for which a lot of resources seem to exist).

Many thanks for your information!

Best regards!
Jan Strunk
strunk at linguistics.ruhr-uni-bochum.de

Rik De Busser suggested a meta-site:
>This might help:
>http://www.ims.uni-stuttgart.de/info/Newspapers.html
>(Not all of them are for free)

Yvonne suggested the following sources:
>Danish  http://korpus.dsl.dk/korpus2000/indgang.php

>Bosnian http://www.tekstlab.uio.no/Bosnian/Corpus.html

>Swedish  http://spraakdata.gu.se/lb/konk/

Antti Arppe suggested the Finnish language bank:
>Well there is a substantial amount of Finnish newspaper corpora (tens
>of millions of words) and a lesser amount of Swedish newspaper
>material (published in Finland) available in the Finnish text bank:
>
>http://www.csc.fi/kielipankki/
>
>All the info appears to be in Finnish or Swedish, but you can try to
>contact e.g. Manne Miettinen, tel. +358 9 457 2517 e-mail:
><manne.miettinen at csc.fi>.

Elisabeth Burr:
>I can only help out with Italian, French and Spanish newspaper corpora.
>See:
>
>http://www.uni-duisburg.de/Fak2/FremdPhil/Romanistik/Personal/Burr/humcomp/
>
>Oxford Text Archive Corpus of Italian Newspapers
>
>"Italian Newspaper Corpus (ita03)", in: Association for Computational Linguistics:
>European Corpus Initiative Multilingual Corpus 1 (ECI/MCI) CD-ROM:
>\data\ eci1\ 

Bilge Say:
>About your recent posting to corpora-list, we have a corpus of 2 M words of post
>1990 written Turkish, which includes about 40% newspaper material (not all
>of them news items though, including editorials, columns etc). It is available for
>free for academic purposes; contact our project assistant Umut Ozge at
>umut at ii.metu.edu.tr
>for filling out the required form and receiving the corpus over the
>Internet.
>
>Kemal Oflazer at Sabanci University also has a newspaper corpus of Turkish
>(I think about 10 M words). He can be contacted at oflazer at sabanciuniv.edu

Seza Dogruoz suggested that I contact Bilge Say.

Shlomo Yona also suggested a Turkish corpus and offered help with Hebrew:
>I have corpora of newspaper articles in Hebrew.
>Tagged Turkish news text can be found at:
>http://www.nlp.cs.bilkent.edu.tr/Center/Corpus/

And last but not least, Paul McNamee suggested the CLEF project:
>Not for those three, but the CLEF activity has created a newspaper
>corpus in 8 languages with O(100k) articles per language from the years
>1994 and 1995. In addition to German and English they have:
>Dutch, Finnish, French, Italian, Spanish, and Swedish.  Check out
>the CLEF site at http://www.clef-campaign.org/  You might also want
>to investigate the holdings of ELRA and the LDC.

Many thanks again!
