[Corpora-List] German corpora

Ralf Steinberger ralf.steinberger at jrc.it
Fri Jan 25 13:31:40 UTC 2008


Hello Jaime,

 

You may want to have a look at the German documents of the freely available
JRC-Acquis corpus, downloadable from

 

            http://langtech.jrc.it/JRC-Acquis.html

 

The document collection covers the last 50 years or so, but texts are
organised chronologically so that you can pick those of interest to you. The
corpus covers written language only, though.

 

I hope this helps. Kind regards,

Ralf

Ralf Steinberger (Ralf.Steinberger at jrc.it)

European Commission - Joint Research Centre (JRC)
IPSC - SeS - Language Technology 
URL: Applications: http://emm.jrc.it/overview.html
URL: The science behind them: http://langtech.jrc.it/

JRC-Acquis Multilingual Parallel Corpus (Version 3)

*       Freely available for research purposes.

*       22 languages: Bulgarian, Czech, Danish, German, Greek, English,
Spanish, Estonian, Finnish, French, Hungarian, Italian, Lithuanian, Latvian,
Maltese, Dutch, Polish, Portuguese, Romanian, Slovak, Slovene and Swedish.

*       Altogether over 1 billion words.

*       Sentence alignment for 231 language pairs.

*       For more information and download, see
http://langtech.jrc.it/JRC-Acquis.html.

 


DGT-Translation Memory

*       Freely available for research purposes.

*       Aligned translation units for 231 language pairs.

*       Alignment manually verified.

*       For more information and download, see
http://langtech.jrc.it/DGT-TM.html.

 


The JRC's Language Technology group specialises in the development of highly
multilingual text analysis tools and in cross-lingual applications. Many
applications are accessible online, e.g.:

*       NewsExplorer <http://press.jrc.it/NewsExplorer/>: multilingual news
aggregation and analysis (19 languages); lets users navigate the news over
time and across languages; trend analysis; collects information about people
from the news; social network detection.

*       NewsBrief <http://press.jrc.it/>: breaking news detection and
display of the very latest thematic news from around the world; email
alerting (22+ languages).

*       MedISys Medical Information System <http://medusa.jrc.it/>: latest
health-related news from around the world according to themes and diseases
(22+ languages).

*       EMM-Labs <http://emm-labs.jrc.it:8080/>: latest developments;
social networks; live people-in-the-news; country and theme fact sheets;
maps showing violent events world-wide.

 

 

 

-----Original Message-----
From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of
jaime.hunt at studentmail.newcastle.edu.au
Sent: 25 January 2008 10:10
To: Corpora at uib.no
Subject: [Corpora-List] German corpora

 

Hello 

 

I'm a PhD student interested in researching German corpora for Anglicisms. I
am searching for all types of recent corpora, especially spoken corpora,
from around 2005 onwards.

 

I would really appreciate it if anybody is able to make any suggestions as
to what might be available for students to research free of charge.

 

Best regards,

Jaime

 

Mr Jaime Hunt MAppLing (TESOL), BA (Hons)
PhD (Linguistics) Candidate
School of Humanities and Social Science
McMullin Building
University of Newcastle
Callaghan
NSW 2308
Australia

Ph. +61 (0)2 4921 5175
Email: jaime.hunt at studentmail.newcastle.edu.au

 

_______________________________________________

Corpora mailing list

Corpora at uib.no

http://mailman.uib.no/listinfo/corpora
