Corpora: Parallel corpus
Philip Resnik
resnik at umiacs.umd.edu
Tue Dec 19 20:28:53 UTC 2000
Yuliya Katsnelson asked:
> >I am looking for a parallel corpus (news, etc.) in English and
> >optimally, Eastern European languages.
Mike Maxwell <Mike_Maxwell at sil.org> replied:
> For nearly every written language, there is at least one parallel
> corpus: the Bible (or at least the New Testament). There are
> obvious shortcomings with such a source (the alignment is at the
> verse level, which may be too broad for some purposes; much of the
> vocabulary is likely to be in semantic domains not of wider
> interest; there are issues of translation style; the corpus may be
> too small; etc.). But it's there, and in many cases should be
> available in electronic form, perhaps even on the web.
At the University of Maryland we've done some work on systematizing
the Bible as a parallel corpus using the Corpus Encoding Standard
(CES), as well as investigating the properties of the text from a
computational linguistics perspective. See the Web page at
http://umiacs.umd.edu/~resnik/parallel/ for information and
references.
Philip
----------------------------------------------------------------
Philip Resnik, Assistant Professor
Department of Linguistics and Institute for Advanced Computer Studies
1401 Marie Mount Hall            UMIACS phone: (301) 405-6760
University of Maryland           Linguistics phone: (301) 405-8903
College Park, MD 20742 USA       Fax: (301) 405-7104
http://umiacs.umd.edu/~resnik    E-mail: resnik at umiacs.umd.edu