[Corpora-List] Parallel / Comparable / Translation
oliver at ccl.bham.ac.uk
Thu Sep 12 13:39:02 UTC 2002
> I admit that the term "translation corpus" is confusing: you would rather
> understand it as a "corpus of translations" than "corpus for translators" or
> "used mainly by translators" (which is the right interpretation).
I don't think there is a need for terms that describe a corpus according to
its expected users; who cares whether a corpus is used by translators,
language teachers, or computational linguists?
If it is a "corpus of translations", then it is either
- a "translation corpus" if it contains texts in one language and their
translations into one (or possibly more) other languages. This could
be viewed as a subtype of the parallel corpus, which does not require
its elements to be translations of each other.
OR
- a corpus consisting of texts in one language, which are translations
of some other texts (which are not in the corpus). This would be a
specialised sample of a monolingual corpus, similar in principle to a
corpus of newspaper articles, or some other externally specified text
type/genre/...
It doesn't make sense from a technical point of view to have a `mixed
bag' of texts in different languages in one single corpus `lump', unless
they are kept as separate parts (as in a parallel/translation/comparable
corpus).
So, a corpus containing texts in more than one language should really
only be either parallel or comparable, or a translation corpus if that is
retained as an independent category rather than treated as a subtype of
the parallel corpus.
[This, however, does not apply to archives or other collections, which can
contain texts in whatever languages. But then, an archive is not a corpus.]
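The distinctions drawn above can be summarised as a small decision procedure. This is an illustrative sketch only: the `Corpus` record, its field names and the category labels are hypothetical, not an established scheme. It encodes just the cases in this message, treats a multilingual `mixed bag' as an archive rather than a corpus, and deliberately does not try to separate parallel from comparable corpora, since no criterion for that is given here.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Corpus:
    # Hypothetical description of a corpus; field names are illustrative only.
    languages: FrozenSet[str]         # languages of the texts actually in the corpus
    separate_language_parts: bool     # multilingual texts kept as separate per-language parts
    mutual_translations: bool         # the parts are translations of each other
    translations_of_external: bool    # texts translate sources that are NOT in the corpus


def classify(c: Corpus) -> str:
    """Return a category label following the distinctions drawn in the message."""
    if not c.languages:
        raise ValueError("a corpus must contain texts in at least one language")
    if len(c.languages) == 1:
        if c.translations_of_external:
            # Monolingual sample selected by an external criterion (being translated),
            # comparable in principle to a corpus of newspaper articles.
            return "monolingual corpus (specialised sample of translations)"
        return "monolingual corpus"
    if not c.separate_language_parts:
        # Several languages lumped together: an archive or collection, not a corpus.
        return "archive or collection (not a corpus)"
    if c.mutual_translations:
        # Source texts plus their translations: a translation corpus,
        # or alternatively a subtype of the parallel corpus.
        return "translation corpus (subtype of parallel)"
    # Multilingual, separate parts, not mutual translations.
    return "parallel or comparable corpus"
```

For example, English source texts stored alongside their German translations would come out as a translation corpus:

```python
en_de = Corpus(languages=frozenset({"en", "de"}), separate_language_parts=True,
               mutual_translations=True, translations_of_external=False)
print(classify(en_de))  # translation corpus (subtype of parallel)
```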
Oliver
--
/\ \ lecturer | department of english | school of humanities
//\\ \ the university of birmingham | edgbaston | birmingham b15 2tt
\\// \ united kingdom | phone +44(0)121-414-6206 | fax +44(0)121-414-5668
\/ \ http://web.bham.ac.uk/o.mason/ | o.mason at bham.ac.uk