[Corpora-List] Free multilingual resources JRC-Acquis and DGT-TM --- update

Ralf Steinberger ralf.steinberger at jrc.it
Thu May 29 16:27:46 UTC 2008


This is an update regarding the two freely available multilingual resources
JRC-Acquis and DGT Translation Memory (DGT-TM), distributed by the European
Commission's Joint Research Centre (JRC):

(1)     For version 3.0 of the multilingual parallel corpus JRC-Acquis (22
languages, 231 language pairs, 1 billion words), bilingual sentence alignments
have now also been produced with the alignment tool HunAlign. Until recently,
only alignments produced with the Vanilla aligner were available. The additional
alignments make it possible to compare and benchmark alignment software across a
wide variety of language pairs, and users can choose whichever set of
alignments better suits their needs.
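
Comparing two aligners on the same documents typically comes down to comparing their links. The following is a minimal sketch, assuming (for illustration only) that each alignment has already been loaded into a list of links, where each link pairs a tuple of source-sentence indices with a tuple of target-sentence indices, so that 1:1, 1:2 and 2:1 links can all be represented. This is not the format of the distributed alignment files, and `compare_alignments` is a hypothetical helper. It counts exact link matches and reports precision, recall and F1 of one alignment measured against the other. Without a manually checked reference, these figures measure agreement between the two tools rather than correctness.

```python
from typing import Iterable, Tuple

# Assumed representation (not the JRC-Acquis file format): a link pairs
# source-sentence indices with target-sentence indices, e.g. ((0,), (0, 1))
# is a 1:2 link.
Link = Tuple[Tuple[int, ...], Tuple[int, ...]]


def compare_alignments(candidate: Iterable[Link], reference: Iterable[Link]) -> dict:
    """Hypothetical helper: exact-match precision, recall and F1 of the
    candidate links measured against the reference links."""
    cand = set(candidate)
    ref = set(reference)
    common = len(cand & ref)  # links produced identically by both aligners
    precision = common / len(cand) if cand else 0.0
    recall = common / len(ref) if ref else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"common": common, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy data: the two alignments disagree on how to group sentences 2 and 3.
    vanilla = [((0,), (0,)), ((1,), (1,)), ((2, 3), (2,))]
    hunalign = [((0,), (0,)), ((1,), (1,)), ((2,), (2,)), ((3,), (3,))]
    print(compare_alignments(hunalign, reference=vanilla))
```

Exact matching is the strictest choice; a partial-credit variant that counts overlapping sentence pairs would be kinder to aligners that differ only in how they group neighbouring sentences.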

(2)     Following frequent user requests, the tool for extracting bilingual
translation memories for any of the 231 DGT-TM language pairs is now available
as Java bytecode, so that it can be used on operating systems other than
Windows.

For more information on both resources, see
http://langtech.jrc.it/JRC-Acquis.html and
http://langtech.jrc.it/DGT-TM.html .

Ralf Steinberger (Ralf.Steinberger at jrc.it)

European Commission - Joint Research Centre (JRC)
IPSC - SeS - Web Technology and Intelligence 
URL: Applications: http://press.jrc.it/overview.html
URL: The science behind them: http://langtech.jrc.it/

JRC-Acquis Multilingual Parallel Corpus (Version 3)

*       Freely available for research purposes.

*       22 languages: Bulgarian, Czech, Danish, German, Greek, English,
Spanish, Estonian, Finnish, French, Hungarian, Italian, Lithuanian, Latvian,
Maltese, Dutch, Polish, Portuguese, Romanian, Slovak, Slovene and Swedish.

*       Over 1 billion words in total.

*       Sentence alignment for 231 language pairs.

*       For more information and download, see
http://langtech.jrc.it/JRC-Acquis.html.
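
The figure of 231 language pairs is the number of unordered pairs that can be formed from the 22 languages: C(22, 2) = 22 × 21 / 2 = 231. A quick check in Python, enumerating the pairs with `itertools.combinations`; the ISO 639-1 codes used here stand in for the language names listed above and are not necessarily the identifiers used in the corpus files.

```python
from itertools import combinations
from math import comb

# ISO 639-1 codes for the 22 JRC-Acquis languages, in the order listed above.
LANGUAGES = ["bg", "cs", "da", "de", "el", "en", "es", "et", "fi", "fr", "hu",
             "it", "lt", "lv", "mt", "nl", "pl", "pt", "ro", "sk", "sl", "sv"]

# Each unordered pair of distinct languages corresponds to one bilingual alignment.
pairs = list(combinations(LANGUAGES, 2))

assert len(LANGUAGES) == 22
print(len(pairs), comb(len(LANGUAGES), 2))  # → 231 231
```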

DGT Translation Memory (DGT-TM)

*       Freely available for research purposes.

*       Aligned translation units for 231 language pairs.

*       Alignment manually verified.

*       For more information and download, see
http://langtech.jrc.it/DGT-TM.html.

The JRC's Language Technology group specialises in the development of highly
multilingual text analysis tools and cross-lingual applications. Many of these
applications are accessible online, for example:

*       NewsExplorer (http://press.jrc.it/NewsExplorer/): multilingual news
aggregation and analysis (19 languages); navigation of the news over time and
across languages; trend analysis; collection of information about people in
the news; social network detection.

*       NewsBrief (http://press.jrc.it/NewsBrief/): breaking-news detection
and display of the latest thematic news from around the world; email alerts
(22 to 43 languages).

*       MedISys, the Medical Information System (http://medusa.jrc.it/): the
latest health-related news from around the world, organised by theme and
disease; early warning; email notification; trend graphs (22 to 43 languages).

*       EMM-Labs (http://emm-labs.jrc.it/): latest developments; social
networks; live people-in-the-news; country and theme fact sheets; maps showing
violent events worldwide.

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora