[Corpora-List] RE: [Corpora-List] Parallel corpora and word alignment, WAS: American and British English spelling converter

Santos Diana Diana.Santos at sintef.no
Thu Nov 16 14:19:59 UTC 2006


> >
> >     But it intrigued me to think of parallel corpora *within* a language.
> >     I suppose dialectal texts rendered into "standard" language or
> >     vice versa might come close... I need to muse some more on this.
> >

Well, existing parallel corpora, even purely bilingual ones, may contain different varieties of one language, which makes them parallel corpora within a single language as well. For example, COMPARA (www.linguateca.pt/COMPARA/) contains translations of the same original texts into different varieties of English and of Portuguese.

I am sure there are also translation corpora (in the sense of corpora compiled specifically to study the translation process or its result) that contain multiple translations into the very same language, some of which may be in different varieties. These might be worth looking into.

In any case, Stig Johansson made the point long ago, when presenting the ENPC (English-Norwegian Parallel Corpus) in the 1990s, that a well-designed (bilingual) parallel corpus can also be used as
- two monolingual corpora
- two corpora of several varieties
- two comparable corpora
- two translation corpora
and so on. See, e.g.:

Johansson, Stig. 1998. "On the role of corpora in cross-linguistic research". In S. Johansson and S. Oksefjell (eds.), Corpora and Cross-linguistic Research: Theory, Method, and Case Studies. Amsterdam: Rodopi, pp. 3-24.

How much text you need for comparing varieties depends, of course, on the goal of your study, but at least some existing corpora already have the potential for that. You do have to be careful, though, not to attribute every difference (for example, between independently produced translations) simply to a difference in variety: after all, two different translators are two different creators, even if some of the differences may well be related to the variety each of them speaks or writes.

Best,
Diana
---------------
Diana Santos
www.linguateca.pt
Pólo de Oslo da Linguateca, SINTEF ICT
Pb 124 Blindern, N-0314 Oslo, Noruega


