[Corpora-List] Release: 23M German-English parallel sentences from patent text
Katharina Wäschle
waeschle at cl.uni-heidelberg.de
Tue Mar 5 13:11:49 UTC 2013
We are happy to announce the release of a parallel corpus of patent text
for the German-English language pair. The corpus has been constructed
from EPO, WIPO and USPTO patent documents extracted from the MAREC
collection and contains 23 million sentence pairs from all patent text
sections.
All sentences are labeled with metadata: patent document ID, patent
family, patent classification and publication date.
The corpus is distributed under a Creative Commons License. For more
information and download, please see
http://www.cl.uni-heidelberg.de/statnlpgroup/pattr
Regards,
Katharina Wäschle
--
Institut für Computerlinguistik
Universität Heidelberg
Im Neuenheimer Feld 325, D-69120 Heidelberg
http://www.cl.uni-heidelberg.de/~waeschle