[Corpora-List] Release 2014 of DGT-TM (parallel corpus in 24 languages)

Ralf Steinberger ralf.steinberger at jrc.ec.europa.eu
Thu Sep 18 19:18:59 UTC 2014


Hello John,

 

Of course I agree with you. It would be great if these parallel corpora were
available for further languages.

 

UN corpora do exist for distribution elsewhere, but the EU corpora (which
are the ones we at the JRC have access to) are by definition 'only' for the
24 official EU languages. 

 

Having said that, the translation memories ECDC-TM and EAC-TM (also
available at https://ec.europa.eu/jrc/en/language-technologies) do include
small numbers of non-EU languages. Furthermore, the resource JRC-Names
(https://ec.europa.eu/jrc/en/language-technologies/jrc-names) includes
details on many non-EU languages because it was produced by analysing news
articles from around the world.

 

For details on the choice of languages, on the differences between the
available EU corpora, and on why we can distribute these parallel corpora
and other highly multilingual language resources, etc., have a look at the
brand-new overview paper:

 

     Steinberger Ralf, Mohamed Ebrahim, Alexandros Poulis, Manuel
     Carrasco-Benitez, Patrick Schlüter, Marek Przybyszewski & Signe Gilbro
     (August 2014). An overview of the European Union's highly multilingual
     parallel corpora. Language Resources and Evaluation Journal (LRE).

     Springer link:
     http://link.springer.com/article/10.1007/s10579-014-9277-0
     Manuscript:
     http://langtech.jrc.ec.europa.eu/Documents/2014_08_LRE-Journal_JRC-Linguistic-Resources_Manuscript.pdf

 

All the best,

 

Ralf 

 

 

-----Original Message-----
From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of
John F Sowa
Sent: 18 September 2014 20:47
To: corpora at uib.no
Subject: Re: [Corpora-List] Release 2014 of DGT-TM (parallel corpus in 24
languages)

 

On 9/18/2014 9:35 AM, Ralf Steinberger wrote:

> Readers on this list may be interested to hear that the 2014 release
> of the DGT-Translation Memory is now available for download.

 

That is an excellent resource for the languages of the EU.  But it would be
helpful to have at least a subset for languages outside the EU -- especially
for the official languages of the UN that are not in the EU.

 

Are such resources available?

 

John

 

 

 

_______________________________________________
UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora
