Corpora: Language in Patent Texts
neumann
neumann at nova.co.jp
Tue May 28 01:58:10 UTC 2002
Hello,
I work on a commercial Japanese-English MT system and would like to know
whether anybody could point me to literature or research on the
characteristics of language in patent or trademark specifications
(= texts), as inventors submit them to national or international
patent offices such as the EPO in Munich.
Rather than the normative "author's guidelines" provided by patent
offices, which describe the fixed, formalised structure and idioms of
such texts, I am interested in their general linguistic and statistical
aspects (e.g. almost no use of proper nouns; anaphoric relations; a
statistical preference for gerund clauses over relative clauses with an
inflected verb; for English, a preference for Latin-origin words over
Germanic-origin words; etc.).
Are tagged corpora available somewhere (even within larger bodies, e.g.
of legal texts, and for any source language)?
I will post a summary of your replies on this list.
Thank you!
--
Dr. Christoph Neumann neumann at nova.co.jp
R&D MT, Nova Inc.
Tokyo, Japan
http://www.nova.co.jp/english/index.html