[Corpora-List] our corpora on world languages

Yuri Tambovtsev yutamb at mail.cis.ru
Thu Jul 3 17:34:08 UTC 2003


Dear colleagues,

I am sending you this information in case you could publish it in your
electronic newsletter. I do hope we might establish a joint project. I'd
like to tell you about our group of Phonostatistics and Typological
Studies. It would be very kind of you to let me know about your
activities in the field of phonostatistics and typology in the West.

I had planned to attend conferences in the West (for instance, in Prague)
to renew my contacts or to set up new ones. In fact, now that democracy
has come to Russia, it is harder to travel to the West from Novosibirsk
than before, since transportation costs more than it did when every
postgraduate student could afford a ticket to Moscow. Now a Novosibirsk
linguist cannot find enough money to go even to Moscow. I failed to find
a bursary for my trip to Prague or for any other conference in the West.
This is why your e-mail information is of great interest and importance
to us. In fact, e-mail is our only contact with colleagues in the
profession. If you happen to know of any international conferences on
phonostatistics, we'd be most grateful if you would let us know.

Our group for the phonological study of Siberian, Paleo-Asiatic,
Uralo-Altaic, Far Eastern, and Oceanian languages and of some isolated
languages (Korean, Nivh, Ket, Yukaghir, Japanese) is looking forward to
establishing close contacts with colleagues all over the world in these
fields of linguistics: typology and phonostatistics. Many articles on
Siberian, Finno-Ugric, Turkic, Mongolian, Tungus-Manchurian and
Paleo-Asiatic languages could be published on the basis of our data.
Our small group is now working on the texts of the 112th world language
in our collection: Dolgan. We have computed the following world
languages:

  1. Japanese; 2. Nivh; 3. Ket;
  (Finno-Ugric): 4. Mansi (Vogul): Sygva, Sosva, and Konda dialects;
      5. Hanty (Osjak): Kazym and Eastern dialects; 6. Hungarian;
      7. Komi-Zyrian; 8. Udmurt (Votiak); 9. Mari (Cheremis): Mountain
      and Meadow dialects; 10. Mordovian: Erzia and Moksha; 11. Vepsian;
      12. Vodian; 13. Karelian: Tihvin, Livvikov and Ljudikov;
      14. Saami (Lopari); 15. Finnish;
  (Samoyedic): 16. Nganasan;
  (Turkic): 17. Azeri (Azerbaidjanian); 18. Tatar: Siberian-Baraba and
      Kazan; 19. Altai (Kizhi); 20. Kumandin (Altai); 21. Turkish;
      22. Turkmen; 23. Jakut (Saha); 24. Karakalpak; 25. Kazah;
      26. Kirgiz; 27. Tofalar; 28. Shorian; 29. Dolganian; 30. Hakas;
      31. Ujgur; 32. Uzbek;
  (Tungus-Manchurian): 33. Nanai; 34. Negidal; 35. Evenk (Tungus);
      36. Even; 37. Uljch; 38. Orok; 39. Oroch; 40. Nivh;
  (Mongolian): 41. Mongolian; 42. Buriatian; 43. Kalmykian;
  (Slavonic): 44. Russian; 45. Ukrainian; 46. Belorussian; 47. Sorbian;
      48. Serbo-Croatian;
  (Iranian): 49. Gilian; 50. Persian (Iranian); 51. Tadjikian;
      52. Pushto;
  (Paleo-Asiatic): 53. Iteljmen (Kamchadal); 54. Chuckchian;
      55. Jukagir; 56. Eskimo: Siberian and American;
  57. Arabic; 58. Mangarayi (Aboriginal Australian); 59. Korean;

and many others, 111 in all. Many of
these languages are endangered. I'm sure it is high time to establish
corpora for the endangered languages. I wonder what linguists around the
world think about this idea. Should corpora for the endangered languages
be created or not? Is it important, or should we forget about the idea?

Our main goal, though, is to find the universal characteristics of the
sound pictures of world languages and to calculate the phonological
distances between them on the basis of the frequency of occurrence of
phonemes and phonemic groups. Then we plan to publish word frequency
dictionaries of the languages mentioned above. As a matter of fact, many
of these languages are still on old punch cards, but we are transferring
them to PC diskettes. Many of the texts (e.g. Japanese, Persian, Arabic,
Hebrew, Korean, etc.) are entered in phonological transcription. We could
exchange some of the material in electronic form. We'd also be happy to
work together on a joint project with linguists all over the world.
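
As a rough illustration of the kind of computation described above, a
distance between two languages can be derived from their phoneme
frequency profiles. The sketch below is only an assumption about how
this might be done: it uses the Euclidean distance between relative
frequencies, a measure this message does not specify, and the function
names and toy samples are invented for the example.

```python
from collections import Counter
from math import sqrt


def phoneme_frequencies(phonemes):
    """Return the relative frequency of each phoneme in a transcribed sample."""
    counts = Counter(phonemes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("empty sample")
    return {p: n / total for p, n in counts.items()}


def frequency_distance(freq_a, freq_b):
    """Euclidean distance between two relative-frequency profiles.

    Assumption: the measure is chosen for illustration only; a phoneme
    missing from one profile counts as frequency 0.0 there.
    """
    inventory = set(freq_a) | set(freq_b)
    return sqrt(sum((freq_a.get(p, 0.0) - freq_b.get(p, 0.0)) ** 2
                    for p in inventory))


# Invented toy samples, one symbol per phoneme (not real language data).
sample_a = list("tatakana")
sample_b = list("tutukunu")
sample_c = list("takanama")

fa = phoneme_frequencies(sample_a)
fb = phoneme_frequencies(sample_b)
fc = phoneme_frequencies(sample_c)

print("a vs b:", frequency_distance(fa, fb))
print("a vs c:", frequency_distance(fa, fc))
```

In this toy case sample_c shares most of its phonemes with sample_a, so
its distance to sample_a is smaller than that of sample_b. With real
data the input would be the phonologically transcribed texts, and the
same profile could be built over phonemic groups instead of single
phonemes.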
Yuri Tambovtsev, Novosibirsk, Russia. E-mail address:
  yutamb at hotmail.com
