PD: [Corpora-List] Date: Wed, 11 Sep 2002 15:16:20 +0200

Sampo Nevalainen samponev at cc.joensuu.fi
Thu Sep 12 14:26:54 UTC 2002


The whole terminology of corpus linguistics is admittedly pretty
anarchistic... and I can't help messing it up a little more :-)

Personally I hardly ever use the term "translation corpus", for the name only
suggests that the corpus consists of translations, so, in principle, a
translation corpus could be monolingual, bilingual or multilingual, and
contain just about anything that fits under the notion of "translation"... (I am
not going to speculate here about what a translation is... there are different
opinions about that, as well). By "parallel corpus" I mean a bi- or
multilingual corpus of originals and their translations into one or more
languages. Near-synonyms of "parallel" include words like
"collateral", "concurrent" and "simultaneous". This implies that the texts
are quite strictly related to each other - in a sense one could say that
they mirror each other. So, for me it makes sense to use this term with
originals and their translations, which exist interdependently (although,
of course, the process of production is usually not simultaneous, as far as
written language is concerned -- but cf. simultaneous interpreting...). A
"comparable corpus", then, is, as the name suggests, a corpus consisting of
(pairs or groups of) texts produced independently of each other, but that
are considered to be comparable in certain respects. For example, at our
university we have compiled a comparable corpus, consisting of translated
and non-translated Finnish from different genres - but the translated texts
are _not_ translations of the texts originally written in Finnish. That is,
the corpus as a whole is a comparable corpus of two language variants.
(Hmm... perhaps one might call the translational part of the corpus a
"translation corpus"..?) With the same logic, a combined corpus of the
Brown and LOB corpora could be called a comparable corpus (of American and
British English). And, similarly, we could have a comparable corpus of a
certain special field or domain in different languages. Couldn't it be
easier..?

sampo



( : ============================================= : )

Sampo Nevalainen, M.A.
Researcher
University of Joensuu
Savonlinna School of Translation Studies
P.O.Box 48
FIN-57101 Savonlinna
FINLAND

tel     +358-15-511 70      (operator)
         +358-15-511 7704
fax     +358-15-515 096
email   samponev at cc.joensuu.fi
http://www.joensuu.fi/slnkvl/


