[Corpora-List] German corpora
Ralf Steinberger
ralf.steinberger at jrc.it
Fri Jan 25 13:31:40 UTC 2008
Hello Jaime,
You may want to have a look at the German documents of the freely available
JRC-Acquis corpus, downloadable from
http://langtech.jrc.it/JRC-Acquis.html
The document collection covers the last 50 years or so, but texts are
organised chronologically so that you can pick those of interest to you. The
corpus covers written language only, though.
I hope this helps. Kind regards,
Ralf
Ralf Steinberger (Ralf.Steinberger at jrc.it)
European Commission - Joint Research Centre (JRC)
IPSC - SeS - Language Technology
URL: Applications: http://emm.jrc.it/overview.html
URL: The science behind them: http://langtech.jrc.it/
JRC-Acquis Multilingual Parallel Corpus (Version 3)
* Freely available for research purposes.
* 22 languages: Bulgarian, Czech, Danish, German, Greek, English,
Spanish, Estonian, Finnish, French, Hungarian, Italian, Lithuanian, Latvian,
Maltese, Dutch, Polish, Portuguese, Romanian, Slovak, Slovene and Swedish.
* Over 1 billion words in total.
* Sentence alignment for 231 language pairs.
* For more information and download, see
http://langtech.jrc.it/JRC-Acquis.html.
DGT-Translation Memory
* Freely available for research purposes.
* Aligned translation units for 231 language pairs.
* Alignment manually verified.
* For more information and download, see
http://langtech.jrc.it/DGT-TM.html.
The JRC's Language Technology group specialises in the development of highly
multilingual text analysis tools and in cross-lingual applications. Many
applications are accessible online, e.g.:
* NewsExplorer (http://press.jrc.it/NewsExplorer/): multilingual news
aggregation and analysis (19 languages); lets users navigate the news over
time and across languages; trend analysis; collects information about people
from the news; social network detection.
* NewsBrief (http://press.jrc.it/): breaking news detection and
display of the very latest thematic news from around the world; email
alerting (22+ languages).
* MedISys Medical Information System (http://medusa.jrc.it/): latest
health-related news from around the world, organised by theme and disease
(22+ languages).
* EMM-Labs (http://emm-labs.jrc.it:8080/): latest developments;
social networks; live people-in-the-news; country and theme fact sheets;
maps showing violent events world-wide.
-----Original Message-----
From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of
jaime.hunt at studentmail.newcastle.edu.au
Sent: 25 January 2008 10:10
To: Corpora at uib.no
Subject: [Corpora-List] German corpora
Hello
I'm a PhD student interested in researching German corpora for Anglicisms. I
am searching for all types of recent corpora, especially spoken corpora,
from around 2005 onwards.
I would really appreciate it if anybody is able to make any suggestions as
to what might be available for students to research free of charge.
Best regards,
Jaime
Mr Jaime Hunt MAppLing (TESOL), BA (Hons)
PhD (Linguistics) Candidate
School of Humanities and Social Science
McMullin Building
University of Newcastle
Callaghan
NSW 2308
Australia
Ph. +61 (0)2 4921 5175
Email: jaime.hunt at studentmail.newcastle.edu.au
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora