Corpora: Language in Patent Texts

neumann neumann at
Tue May 28 01:58:10 UTC 2002


I work on a commercial Japanese-English MT system and would like to know
whether anyone could point me to literature or research on the
characteristics of the language of patent or trademark specifications
(i.e. the texts that inventors submit to national or international
patent offices such as the EPO in Munich).

Rather than the normative "author's guidelines" provided by patent
offices, which describe the fixed, formalised structure and idioms of
such texts, I am interested in their general linguistic and statistical
characteristics (e.g. the near-absence of proper nouns; anaphoric
relations; a statistical preference for gerund clauses over relative
clauses with finite verbs; and, for English, a preference for words of
Latin origin over words of Germanic origin).

Are any tagged corpora of such texts available (even as part of larger
collections, e.g. of legal texts, and in any source language)?

I will post a summary of your replies on this list.

Thank you!

Dr. Christoph Neumann 		neumann at
R&D MT, Nova Inc.
Tokyo, Japan
