[Corpora-List] our corpora on world languages
Yuri Tambovtsev
yutamb at mail.cis.ru
Thu Jul 3 17:34:08 UTC 2003
Dear colleagues, I am sending you all this information in case you
could publish it in your electronic newsletter. I do hope we might establish
a joint project. I'd like to tell you about our group of Phonostatistics
and Typological Studies. It would be very kind of you to let
me know about your activities in the field of phonostatistics and
typology in the West. I had planned to attend conferences in the West
(for instance in Prague) to renew my contacts or to set up new ones.
Actually, now that democracy has come to Russia, it is harder to travel to
the West from Novosibirsk than before, since transportation costs
more than it did when every post-graduate student could afford a
ticket to Moscow. Now a Novosibirsk linguist cannot find enough
money to go even to Moscow. I failed to find a bursary for my trip to
Prague or for any other conference in the West.
This is why your e-mail information is of great interest and importance
to us. In fact, e-mail is our only contact with colleagues in
the profession.
If you happen to hear of any international conferences on
phonostatistics, we would be most grateful if you would let us
know. Our group for phonological studies of Siberian, Paleo-Asiatic,
Uralo-Altaic, Far Eastern, and Oceanian languages and some isolated languages
(Korean, Nivh, Ket, Yukaghir, Japanese) is looking
forward to establishing close contacts with colleagues around the world
in these fields of linguistics: typology and
phonostatistics. Many articles on Siberian, Finno-Ugric, Turkic,
Mongolian, Tungus-Manchurian and Paleo-Asiatic
languages could be published on the basis of our data. Our small group is
now working on the texts
of the 112th language of the world: Dolgan. We have processed the following world
languages: 1. Japanese; 2. Nivh; 3. Ket; (Finno-Ugric):
4. Mansi (Vogul): Sygva, Sosva, and Konda dialects; 5. Hanty (Osjak): Kazym and Eastern
dialects; 6. Hungarian; 7. Komi-Zyrian; 8. Udmurt (Votiak); 9. Mari (Cheremis):
Mountain and Lawn dialects; 10. Mordovian: Erzia and Moksha;
11. Vepsian; 12. Vodian; 13. Karelian: Tihvin, Livvikov and Ljudikov;
14. Saami (Lopari); 15. Finnish; (Samoyedic): 16. Nganasan; (Turkic):
17. Azeri (Azerbaidjanian); 18. Tatar: Siberian-Baraba and Kazan;
19. Altai (Kizhi); 20. Kumandin (Altai); 21. Turkish; 22. Turkmen;
23. Jakut (Saha); 24. Karakalpak; 25. Kazah; 26. Kirgiz; 27. Tofalar;
28. Shorian; 29. Dolganian; 30. Hakas; 31. Ujgur; 32. Uzbek;
(Tungus-Manchurian): 33. Nanai; 34. Negidal; 35. Evenk (Tungus); 36. Even;
37. Uljch; 38. Orok; 39. Oroch; 40. Nivh; (Mongolian): 41. Mongolian;
42. Buriatian; 43. Kalmykian; (Slavonic): 44. Russian; 45. Ukrainian;
46. Belorussian; 47. Sorbian; 48. Serbo-Croatian; (Iranian):
49. Gilian; 50. Persian (Iranian); 51. Tadjikian; 52. Pushto;
(Paleo-Asiatic): 53. Iteljmen (Kamchadal); 54. Chuckchian; 55. Jukagir;
56. Eskimo: Siberian and American; 57. Arabic; 58. Mangarayi (Aboriginal
Australian); 59. Korean; and many others, 111 in all. Many of
these languages are endangered. I'm sure it is high time to establish
corpora for the endangered languages. I wonder what the world's linguists
think about this idea. Should corpora for the endangered languages
be created or not? Is it important, or should we forget about
the idea because it is not important at all? Our main goal, though,
is to find the universal characteristics of the sound pictures of
the world's languages and to calculate the phonological distances between them
on the basis of the frequency of occurrence of phonemes and phonemic
groups. We then plan to publish word frequency dictionaries of the
languages mentioned above. As a matter of fact, many of these languages
are still on the old punch cards, but we are transferring them to PC diskettes. Many
of the texts (e.g. Japanese, Persian, Arabic, Hebrew, Korean, etc.) are entered in the form
of phonological transcription. We could exchange some of the material
in electronic form. We would also be happy to work together on
a joint project with linguists all over the world.
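
To make the idea of a distance based on phoneme frequencies concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not necessarily the measure applied to the data described above: it assumes each text is already a list of phoneme symbols, and it uses the plain Euclidean distance between relative-frequency profiles. The function names and the two toy samples are hypothetical.

```python
from collections import Counter
from math import sqrt


def phoneme_frequencies(phonemes):
    """Return the relative frequency of each phoneme in a transcribed text."""
    counts = Counter(phonemes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("empty transcription")
    return {p: n / total for p, n in counts.items()}


def frequency_distance(freq_a, freq_b):
    """Euclidean distance between two relative-frequency profiles.

    Assumption: a phoneme missing from one profile has frequency 0 there.
    """
    inventory = set(freq_a) | set(freq_b)
    return sqrt(sum((freq_a.get(p, 0.0) - freq_b.get(p, 0.0)) ** 2
                    for p in inventory))


# Toy transcriptions, invented for illustration (not real corpus data).
sample_a = ["k", "a", "t", "a", "n", "a"]
sample_b = ["t", "a", "k", "i", "n", "i"]

distance = frequency_distance(phoneme_frequencies(sample_a),
                              phoneme_frequencies(sample_b))
print(round(distance, 3))
```

The same two functions would apply to phonemic groups (for example vowels versus consonants) if each phoneme were first mapped to its group label; other measures, such as a chi-square statistic, could be substituted for the Euclidean distance.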
Yuri Tambovtsev, Novosibirsk, Russia. E-mail address:
yutamb at hotmail.com