[Corpora-List] Release: 23M German-English parallel sentences from patent text

Katharina Wäschle waeschle at cl.uni-heidelberg.de
Tue Mar 5 13:11:49 UTC 2013


We are happy to announce the release of a parallel corpus of patent text 
for the German-English language pair. The corpus has been constructed 
from EPO, WIPO and USPTO patent documents extracted from the MAREC 
collection and contains 23 million sentence pairs from all patent text 
sections.

All sentences are labeled with metadata: patent document id, patent 
family, patent classification and publication date.

The corpus is distributed under a Creative Commons License. For more 
information and to download the corpus, please see
http://www.cl.uni-heidelberg.de/statnlpgroup/pattr

Regards,
Katharina Wäschle

-- 
Institut für Computerlinguistik
Universität Heidelberg
Im Neuenheimer Feld 325, D-69120 Heidelberg
http://www.cl.uni-heidelberg.de/~waeschle

