[Corpora-List] Newspaper Corpora

Jan Strunk strunk at linguistics.ruhr-uni-bochum.de
Tue Apr 15 09:36:52 UTC 2003


Hi,

yesterday I had asked for suggestions about newspaper corpora.
Many thanks to the people who have answered so far.
They have already provided me with a lot of suggestions (summary below).
 
Unfortunately, all the suggested corpora were in either
English or German. These are exactly the two languages that
I have already evaluated (on the Wall Street Journal corpus and
the Neue Zürcher Zeitung).

Do you perhaps know of any newspaper corpora in other languages, such as Danish,
Turkish or Hungarian?

Thanks!

Jan
strunk at linguistics.ruhr-uni-bochum.de
Sprachwissenschaftliches Institut
Ruhr-Universität Bochum
Germany

Summary of the responses I have received so far:

Mahtab Nikkhou suggested looking at the ELDA resources collection:
>You may have a look at ELDA's on-line language resources catalogue from: 
>http://www.elda.fr/cata/tabtext.html
>If you wish to order a database, please contact Ms Valerie Mapelli at 
>mapelli at elda.fr

Jana Diesner suggested the following for German:
> the classic: http://corpora.ids-mannheim.de/~cosmas/, also at: http://www.ids-mannheim.de/kt/corpora.shtml
> alternatively: http://www.coli.uni-sb.de/sfb378/negra-corpus/

Tony Rose:
> You could also try the Reuters Corpus:
> http://about.reuters.com/researchandstandards/corpus/
> It's an archive of some 800,000 English language news stories, is freely available, and marked up in XML (NewsML in fact).

Jerome Richalot:
>How about the METER Corpus at
>http://www.dcs.shef.ac.uk/nlp/meter/Metercorpus/metercorpus.htm
 
And last but not least, Thorsten Brants proposed the NEGRA corpus:
>the NEGRA Corpus (http://www.coli.uni-sb.de/sfb378/negra-corpus/)
>contains articles from the German newspaper Frankfurter Rundschau.
>As part of the syntactic annotation, the texts are separated into sentences,
>which disambiguates the periods.

Thanks again!



