[Corpora-List] Constitution

Detmar Meurers dm at ling.ohio-state.edu
Sun May 15 17:41:30 UTC 2005


Hi Jean,

    Anyway, it occurred to me that now that an aligned version exists
    (it was announced on this list the other day:
    http://logos.uio.no/opus), an interesting application would be to
    develop programs for the (semi?) automatic verification of
    translations! Has anybody done this before?

One can see this as an instance of the task of detecting variation
in corpus annotation. The variation n-gram approach for detecting
inconsistencies/errors in corpus annotation that Markus Dickinson
and I have worked on (cf. references below) should be able to do
this task for aligned parallel corpora (we included it in a recent
project proposal); it will be interesting to see which equivalence
classes of nuclei and contexts work best for this task.
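To make the idea concrete, here is a minimal Python sketch of the core variation n-gram step. It rests on stated assumptions: the corpus is a list of sentences of (token, label) pairs, and for translation checking the "label" of a source word is taken to be its word-aligned target word. That mapping is an illustration only, not the setup of the papers listed below, and the names variation_ngrams and non_fringe are hypothetical. The sketch flags every token n-gram that recurs verbatim while its nucleus carries different labels. The longer the shared context, the more likely such variation points to an error rather than genuine ambiguity.

```python
from collections import defaultdict


def variation_ngrams(corpus, n, non_fringe=False):
    """Return token n-grams whose nucleus position carries more than one label.

    corpus: list of sentences; each sentence is a list of (token, label) pairs.
    Assumption for translation checking: token = source word,
    label = its aligned target word (hypothetical setup).
    Keys are (token n-gram, nucleus offset); values are the set of labels
    seen on the nucleus across all occurrences of that exact token n-gram.
    """
    labels_at = defaultdict(set)
    for sentence in corpus:
        tokens = [tok for tok, _ in sentence]
        # Slide a window of n tokens over the sentence (no windows if too short).
        for start in range(len(sentence) - n + 1):
            window = tuple(tokens[start:start + n])
            # Each position in the window can serve as the nucleus.
            for offset in range(n):
                labels_at[(window, offset)].add(sentence[start + offset][1])

    result = {}
    for (window, offset), labels in labels_at.items():
        if len(labels) < 2:
            continue  # consistent labelling: nothing to flag
        if non_fringe and offset in (0, n - 1):
            # Nucleus at the edge lacks context on one side; with this
            # option only n >= 3 can yield results.
            continue
        result[(window, offset)] = labels
    return result


if __name__ == "__main__":
    # Hypothetical English-French word alignments, invented for illustration.
    corpus = [
        [("we", "nous"), ("shall", "devons"), ("vote", "voter")],
        [("we", "nous"), ("shall", "allons"), ("vote", "voter")],
    ]
    print(variation_ngrams(corpus, 3, non_fringe=True))
```

A flagged nucleus is only a candidate for review: "devons" and "allons" could both be acceptable renderings in context, so the output is a list of places to check rather than a list of errors.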

Best,
Detmar


Markus Dickinson & Detmar Meurers (2005): `Detecting Errors in
  Discontinuous Structural Annotation'. Proceedings of the 43rd
  Annual Meeting of the Association for Computational Linguistics
  (ACL-05). Ann Arbor, Michigan.

Markus Dickinson & Detmar Meurers (2005): `Detecting Annotation
  Errors in Spoken Language Corpora'. Proceedings of the Special
  session on treebanks for spoken language and discourse at the 15th
  Nordic Conference of Computational Linguistics (NODALIDA-05).
  Joensuu, Finland.

Markus Dickinson & Detmar Meurers (2003): `Detecting Inconsistencies
  in Treebanks'. Proceedings of the Second Workshop on Treebanks and
  Linguistic Theories (TLT 2003). Växjö, Sweden.

Markus Dickinson & Detmar Meurers (2003): `Detecting Errors in
  Part-of-Speech Annotation'. Proceedings of the 10th Conference of
  the European Chapter of the Association for Computational
  Linguistics (EACL-03). Budapest, Hungary.

Available from http://ling.osu.edu/~dm/papers.html


--
Detmar Meurers, Assistant Professor, Dept. of Linguistics, OSU
201a Oxley Hall, 1712 Neil Avenue, Columbus OH 43210-1298, USA
http://ling.osu.edu/~dm/                 GnuPG key on web page
