[Corpora-List] German corpora

Ralf Steinberger ralf.steinberger at jrc.it
Fri Jan 25 13:31:40 UTC 2008

Hello Jaime,


You may want to have a look at the German documents of the freely available
JRC-Acquis corpus, downloadable from




The document collection covers the last 50 years or so, but texts are
organised chronologically so that you can pick those of interest to you. The
corpus covers written language only, though.


I hope this helps. Kind regards,






Ralf Steinberger (Ralf.Steinberger at jrc.it)

European Commission - Joint Research Centre (JRC)
IPSC - SeS - Language Technology 
URL: Applications: http://emm.jrc.it/overview.html
URL: The science behind them:  <http://langtech.jrc.it/>

JRC-Acquis Multilingual Parallel Corpus (Version 3)

*       Freely available for research purposes.

*       22 languages: Bulgarian, Czech, Danish, German, Greek, English,
Spanish, Estonian, Finnish, French, Hungarian, Italian, Lithuanian, Latvian,
Maltese, Dutch, Polish, Portuguese, Romanian, Slovak, Slovene and Swedish.

*       Altogether over 1 billion words.

*       Sentence alignment for 231 language pairs.

*       For more information and download, see


DGT-Translation Memory

*       Freely available for research purposes.

*       Aligned translation units for 231 language pairs.

*       Alignment manually verified.

*       For more information and download, see


The JRC's Language Technology group specialises in the development of highly
multilingual text analysis tools and in cross-lingual applications. Many
applications are accessible online, e.g.:

*       NewsExplorer <http://press.jrc.it/NewsExplorer/>: multilingual news
aggregation and analysis (19 languages); lets users navigate the news over
time and across languages; trend analysis; collects information about people
from the news; social network detection.

*       NewsBrief <http://press.jrc.it/>: breaking news detection and
display of the very latest thematic news from around the world; email
alerting (22+ languages).

*       MedISys Medical Information System <http://medusa.jrc.it/>: latest
health-related news from around the world according to themes and diseases
(22+ languages).

*       EMM-Labs <http://emm-labs.jrc.it:8080/>: latest developments;
social networks; live people-in-the-news; country and theme fact sheets;
maps showing violent events world-wide.




-----Original Message-----
From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of
jaime.hunt at studentmail.newcastle.edu.au
Sent: 25 January 2008 10:10
To: Corpora at uib.no
Subject: [Corpora-List] German corpora




I'm a PhD student interested in researching German corpora for Anglicisms. I
am searching for all types of recent corpora, especially spoken corpora,
from around 2005 onwards.


I would really appreciate it if anybody is able to make any suggestions as
to what might be available for students to research free of charge.


Best regards,



Mr Jaime Hunt MAppLing (TESOL), BA (Hons)

PhD (Linguistics) Candidate

School of Humanities and Social Science

McMullin Building

University of Newcastle 


NSW 2308



Ph. +61 (0)2 4921 5175

Email: jaime.hunt at studentmail.newcastle.edu.au



Corpora mailing list

Corpora at uib.no

