<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=koi8-r">
<META content="MSHTML 6.00.2900.3157" name=GENERATOR>
<STYLE></STYLE>
</HEAD>
<BODY bgColor=#ffffff>
<DIV><FONT face=Arial size=2>Dear Corpora colleagues, at the first stage we take national languages.
Thus, English is represented by its Queen's English variety.
Unfortunately, our group of students is too small to cover other varieties of
English. Besides, it is not always easy to tell whether a given idiom is a language or a dialect.
For instance, there are four dialects of Mansi (Vogul), but in fact they are
different languages, since their native speakers do not understand each other.
Thus, we take only the Northern dialect of Mansi, because we do not have time to
cover all the Mansi dialects (or languages?). At the same time, we treat Russian,
Belorussian and Ukrainian as separate languages, although their sound pictures are
quite close and mutual communication is possible. However, the real problem is that
there are no phonetic corpora of Mansi, Hanty, Ket, Sel'kup,
Karelian, Hakas, Turkish, Azeri, Russian, Ukrainian, Belorussian and
many other languages of the world. This is why we had to transcribe the texts
by hand ourselves. In the future, however, it would be advisable to build phonetic corpora of
every dialect or variety of a language, English first of all, for teaching
purposes as well. Thank you for your questions about our
project. Yours most sincerely, Yuri Tambovtsev</FONT></DIV>
</BODY></HTML>