Translation databases?

John Myhill john at RESEARCH.HAIFA.AC.IL
Thu Dec 16 08:38:00 UTC 1999


This is terrific, Brian, but what I would really like to know is:
Are there any TRANSLATION databases? That is, databases which have both
originals and translations in other languages? Allowing for word searches
(construction searches would be even better, but this is too much to hope
for)?
In a variety of genetically unrelated and geographically separated languages?
If we want to do comparisons of functions of different structures, or
meanings of different words, in different languages, translations are really
helpful. For those of us who are seriously interested in language
universals, translation data, like nothing else, force us to come to grips
directly with differences between languages; we cannot, for example, so
easily blather about the `universal' or `cognitive' functions of voice
alternations based exclusively on English data when confronted with
translation data clearly showing that other languages use voice
alternations in extremely different ways. I have applied for grants to
develop such a translation database twice and been rejected both times.
Wally Chafe tells me that the Pear Stories have never been rendered into a
usable form (and in any case they are quite short).  I have done a number
of studies using the Bible, because at least there are a lot of texts with
interlinear glosses in both languages, and there are concordances of
particular words--but there aren't so many languages with such data, and
Bible translations tend not to be into the most naturalistic language, if
you know what I mean. There are of course many texts with interlinear
glosses in, e.g. Native American or Australian languages and English, but
each of these is in only two languages, and there's no concordance to help
searches for individual words (in addition to difficulty in accessing
native speakers for help).

So, in order to get comparison between more than just two languages, I have
been forced to do things by hand. I am presently doing a study of the
comparative meanings of speech act verbs in Hebrew, English, Japanese, and
Spanish by using novels and short stories by Gabriel Garcia Marquez, A.B.
Yehoshua, and Banana Yoshimoto, with translations of each into each of the
other languages, and let me tell you, it is pretty slow going.  I have to
search for each occurrence of a given word, then search to see how it is
translated into each of the other three languages, without the use of
concordances or interlinear glosses (it goes without saying that I read
some of these languages more quickly than others).
If I have only like 5 occurrences of a given word, the translation data
often looks kind of chaotic, but if I can get 30 or 40 tokens, very clear
patterns always emerge. Unless the word is pretty common, though, it simply
takes too long to get this number of tokens. I can do it, and the results
are very interesting, but it takes a long time, and I more or less have to
study only words which are pretty common (e.g., I would love to do a study
of emotion words like 'angry', 'sad', etc. to see how they're translated,
but it would take an enormous amount of time to locate enough tokens to
use--I've tried).
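
To make the procedure concrete, here is a minimal sketch of the kind of
search described above, assuming the texts were available in electronic form
and already aligned sentence by sentence (a big assumption; the alignment is
the hard part and is not shown). The toy data, the function names
`parallel_concordance` and `count_renderings`, and the language codes are all
hypothetical, made up for illustration only.

```python
import re
from collections import Counter

# Hypothetical toy data: each dict is one sentence, aligned across languages.
ALIGNED = [
    {"en": "She said that she was tired.", "es": "Dijo que estaba cansada."},
    {"en": "He asked where the station was.", "es": "Preguntó dónde estaba la estación."},
    {"en": "They said nothing.", "es": "No dijeron nada."},
]

def parallel_concordance(aligned, lang, word):
    """Return every aligned sentence group whose `lang` side contains `word`."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [group for group in aligned if pattern.search(group.get(lang, ""))]

def count_renderings(hits, lang, candidates):
    """Count how many hits contain each candidate form on the `lang` side."""
    counts = Counter()
    for group in hits:
        text = group.get(lang, "").lower()
        for cand in candidates:
            if cand.lower() in text:
                counts[cand] += 1
    return counts

# Find each occurrence of a word, then look at its translations side by side.
hits = parallel_concordance(ALIGNED, "en", "said")
for group in hits:
    print(group["en"], "|", group["es"])

# Tally a few candidate renderings chosen by the researcher.
print(count_renderings(hits, "es", ["dijo", "dijeron"]))
```

This matches surface word forms only, so inflected forms would have to be
listed by hand or handled by a lemmatizer, and deciding which word in the
translation actually corresponds to the search word is still left to the
reader.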

Any ideas or data sources which might speed things up?

Hopefully,  John


