Corpora: Parallel corpus

Philip Resnik resnik at umiacs.umd.edu
Tue Dec 19 20:28:53 UTC 2000


Yuliya Katsnelson asked:
>   >I am looking for a parallel corpus (news, etc.) in English and
>   >optimally, Eastern European languages.

Mike Maxwell Mike_Maxwell at sil.org replied:
>   For nearly every written language, there is at least one parallel
>   corpus: the Bible (or at least the New Testament).  There are
>   obvious shortcomings with such a source (the alignment is at the
>   verse level, which may be too broad for some purposes; much of the
>   vocabulary is likely to be in semantic domains not of wider
>   interest; there are issues of translation style; the corpus may be
>   too small; etc.).  But it's there, and in many cases should be
>   available in electronic form, perhaps even on the web.

At the University of Maryland we've done some work on systematizing
the Bible as a parallel corpus using the Corpus Encoding Standard
(CES), as well as investigating the properties of the text from a
computational linguistics perspective.  See the Web page at
http://umiacs.umd.edu/~resnik/parallel/ for information and
references.
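
As a rough illustration of what verse-level alignment amounts to, here
is a minimal Python sketch that pairs up verses from two versions by
their book/chapter/verse identifier.  This is not the CES encoding
itself, nor the tooling described on the page above: the tab-separated
"ID<TAB>text" input format, the function names, and the placeholder
verse texts are all assumptions made for the example.

```python
def parse_verses(lines):
    """Map verse IDs (e.g. 'GEN 1:1') to verse text.

    Assumes a hypothetical input format of one verse per line,
    written as 'ID<TAB>text'. Blank lines are skipped.
    """
    verses = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        verse_id, _, text = line.partition("\t")
        verses[verse_id.strip()] = text.strip()
    return verses


def align_by_verse(source_lines, target_lines):
    """Return (verse_id, source_text, target_text) triples.

    Verses are paired purely by identical ID, in source order.
    A verse missing from either version is dropped, not guessed at.
    """
    src = parse_verses(source_lines)
    tgt = parse_verses(target_lines)
    return [(vid, src[vid], tgt[vid]) for vid in src if vid in tgt]


# Placeholder data: the texts stand in for real verses.
english = [
    "GEN 1:1\t(English text of verse 1)",
    "GEN 1:2\t(English text of verse 2)",
    "GEN 1:3\t(English text of verse 3)",
]
other = [
    "GEN 1:1\t(other-language text of verse 1)",
    "GEN 1:3\t(other-language text of verse 3)",
]

pairs = align_by_verse(english, other)
for verse_id, src_text, tgt_text in pairs:
    print(verse_id, "|", src_text, "|", tgt_text)
```

Pairing on the verse identifier is what makes the alignment cheap, and
it is also why the units can be too coarse: anything finer than a verse
(sentences, clauses, words) would need a separate alignment step.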

  Philip

  ----------------------------------------------------------------
  Philip Resnik, Assistant Professor
  Department of Linguistics and Institute for Advanced Computer Studies

  1401 Marie Mount Hall            UMIACS phone: (301) 405-6760
  University of Maryland           Linguistics phone: (301) 405-8903
  College Park, MD 20742 USA       Fax   : (301) 405-7104
  http://umiacs.umd.edu/~resnik    E-mail: resnik at umiacs.umd.edu


