Corpora: Seeking dependency-annotated non-English corpora

Philip Resnik resnik at umiacs.umd.edu
Thu Dec 14 01:07:08 UTC 2000


Greetings and happy holidays!

I'm looking for non-English corpora annotated with (or accompanied by)
information about syntactic dependencies, also known as grammatical
relations.  For my purposes, even collections of only tens or hundreds
of annotated sentences are potentially helpful, although, as always,
more is better.  Data with parallel English translations would be
wonderful, but that's probably too much to hope for.

French, Spanish, Chinese, and Arabic are rather high on my list,
though information on other languages would be useful.  (Yes, I
already know about PDT for Czech.)  Non-English corpora that were
annotated with automatic parsers, rather than by hand, could still
be useful -- for example, I would be interested in parser output
consistent with the SPARKLE Level-2 annotation scheme, or in the
output of algorithms that identify particular predicate-argument
relations like subject and object.

Please reply privately, and I'll post a summary if there's interest.

Best,

  Philip

  ----------------------------------------------------------------
  Philip Resnik, Assistant Professor
  Department of Linguistics and Institute for Advanced Computer Studies

  1401 Marie Mount Hall            UMIACS phone: (301) 405-6760
  University of Maryland           Linguistics phone: (301) 405-8903
  College Park, MD 20742 USA       Fax   : (301) 405-7104
  http://umiacs.umd.edu/~resnik    E-mail: resnik at umiacs.umd.edu


