[Corpora-List] Release: English-French + French-German parallel data from patents
Katharina Wäschle
waeschle at cl.uni-heidelberg.de
Wed Dec 18 13:59:47 UTC 2013
PatTR is an open source parallel corpus created from patent text. The
data was extracted from EPO, WIPO and USPTO patents and automatically
aligned at the sentence level. Recently, we added data for two more
language pairs, English-French and French-German. PatTR now contains:

- 23 million German-English sentence pairs from all patent text sections
- 18.8 million English-French sentence pairs from all patent text sections
- 5.1 million French-German sentence pairs from title, abstract and
  claims sections

The corpus is distributed under a Creative Commons License. For more
information and to download the data, see
http://www.cl.uni-heidelberg.de/statnlpgroup/pattr
Regards,
Katharina Wäschle
--
Institut für Computerlinguistik
Universität Heidelberg
Im Neuenheimer Feld 325, D-69120 Heidelberg
http://www.cl.uni-heidelberg.de/~waeschle
_______________________________________________
UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora