[Corpora-List] Parallel / Comparable / Translation

oliver at ccl.bham.ac.uk
Thu Sep 12 13:39:02 UTC 2002


> I admit that the term "translation corpus" is confusing: you would rather
> understand it as a "corpus of translations" than "corpus for translators" or
> "used mainly by translators" (which is the right interpretation).

I don't think there is a need for terms describing a corpus according to
the expected users; who cares whether a corpus is used by translators,
language teachers, or computational linguists?

If it is a "corpus of translations", then it is either
- a "translation corpus", if it contains texts in one language and their
  translations into one (or possibly more) other languages.  This could
  be viewed as a subtype of a parallel corpus, which itself does not
  require that its elements are translations of each other.
OR
- a corpus consisting of texts in one language, which are translations
  of some other texts (which are not in the corpus).  This would be a
  specialised sample of a monolingual corpus, similar in principle to a
  corpus of newspaper articles, or some other externally specified text
  type/genre/...

It doesn't make sense from a technical point of view to have a `mixed
bag' of texts in different languages in one single corpus `lump', unless
they're separate parts (as in a parallel/translation/comparable corpus).
So, a corpus containing elements in more than one language should really
only be either parallel or comparable, or should be a translation corpus
if it is retained as an independent category and not just a subtype of
the parallel corpus.

[This, however, does not apply to archives or other collections, which can
contain texts in any number of languages.  But then, an archive is not a corpus.]

Oliver

--
 /\  \ lecturer | department of english | school of humanities
//\\  \ the university of birmingham | edgbaston | birmingham b15 2tt
\\//   \ united kingdom | phone +44(0)121-414-6206 | fax +44(0)121-414-5668
 \/     \ http://web.bham.ac.uk/o.mason/ | o.mason at bham.ac.uk


