[Corpora-List] Release: English-French + French-German parallel data from patents

Katharina Wäschle waeschle at cl.uni-heidelberg.de
Wed Dec 18 13:59:47 UTC 2013


PatTR is an open source parallel corpus created from patent text. The 
data was extracted from EPO, WIPO and USPTO patents and automatically 
aligned at the sentence level. Recently, we added data for two more 
language pairs, English-French and French-German. PatTR now contains:

     23 million German-English sentence pairs from all patent text sections
     18.8 million English-French sentence pairs from all patent text sections
     5.1 million French-German sentence pairs from title, abstract and claims sections

The corpus is distributed under a Creative Commons license. For more 
information and to download the data, see 
http://www.cl.uni-heidelberg.de/statnlpgroup/pattr

Regards,
Katharina Wäschle

-- 
Institut für Computerlinguistik
Universität Heidelberg
Im Neuenheimer Feld 325, D-69120 Heidelberg
http://www.cl.uni-heidelberg.de/~waeschle

