[Corpora-List] corpora and new language classifications

Yuri Tambovtsev yutamb at mail.cis.ru
Sat Jul 12 11:00:31 UTC 2003


Corpora and a new classification of the Uralic languages
The main problem in constructing corpora is classification of one sort or
another. Indeed, classification may be called the aim of linguistics in general:
a linguist must classify sounds, phonemes, words, sentences, meanings, and so on.
The most important such problem may be the classification of the world's 6000
languages and dialects into subgroups, groups, families, super-families, phyla,
etc. However, the main language families were established long ago, and some of
them need reconstructing. I'm sure it is one of the hardest jobs in linguistics to reconsider
accepted classifications, for many reasons. I have heard that an attempt at this
hard and dangerous job has been made by Dr. Angela Marcantonio of the University
of Rome, who tried to reconsider the Uralic language family in her recent book
(The Uralic Language Family: Facts, Myths and Statistics. - Oxford UK and
Boston USA: Blackwell Publishers, 2002, 335 pages). I wish I could read it, but
it is not available in Novosibirsk, Russia. The Uralic language family is said to
consist of the Finno-Ugric and Samoyedic languages. I suspect that the Uralic
language family may not be a real family, but a conglomerate of Finnic, Ugric
and Samoyedic languages. My phonostatistical data on this language group
make me believe that one should be very cautious when talking about the Uralic
languages as one family. Specifically, the values of the coefficient of variation
of 8 consonantal groups (labial, front, palatal, velar, sonorant, occlusive, fricative
and voiced) show that its body is rather dispersed, i.e. not compact. In fact,
this group is less compact than other language families.
Let us compare the coefficients of variation of several language families:
Uralic      - 28.31%
Mongolic    - 10.78%
Samoyed     - 18.29%
Turkic      - 18.77%
Finno-Ugric - 24.14%
Altaic      - 25.97%
Thus one can see that the Uralic group of languages is not as compact as
Finno-Ugric or Samoyedic, which are parts of it. It is more than twice as
dispersed as the Mongolic language family (28.31% against 10.78%). One can find
the details of the compactness of other language groups in my recent book
(Yuri A. Tambovtsev. The Typology of Functioning of Phonemes in the Sound Chain
of Indo-European, Paleo-Asiatic, Ural-Altaic, and Other World Languages: the
compactness of Groups, Families and the other Language Taxons. - Novosibirsk:
SN Institute, 2003. - 143 pages. In Russian).
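
For readers who want to check the comparison, here is a minimal sketch in Python.
It assumes the coefficient of variation is the sample standard deviation divided
by the mean, expressed as a percentage; the post does not say which variant was
used or how the 8 consonantal groups were combined, so the function below is an
illustration only. The three sample frequencies are hypothetical; the dictionary
`reported` holds the figures quoted above.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return 100 * (sample standard deviation / mean) for a list of numbers.

    Assumption: the sample (n - 1) standard deviation is used; the post
    does not specify which variant the author applied.
    """
    return 100.0 * stdev(values) / mean(values)


# Hypothetical frequencies of one consonantal group in three languages
# of one taxon (made-up numbers, for illustration only).
hypothetical_frequencies = [8, 10, 12]
example_cv = coefficient_of_variation(hypothetical_frequencies)

# Coefficients of variation (%) as quoted in the post.
reported = {
    "Uralic": 28.31,
    "Mongolic": 10.78,
    "Samoyed": 18.29,
    "Turkic": 18.77,
    "Finno-Ugric": 24.14,
    "Altaic": 25.97,
}

# Lower coefficient means a more compact taxon, so sort ascending.
ranked = sorted(reported, key=reported.get)

# How many times more dispersed Uralic is than Mongolic.
uralic_to_mongolic = reported["Uralic"] / reported["Mongolic"]

if __name__ == "__main__":
    print(f"example CV: {example_cv:.2f}%")
    for name in ranked:
        print(f"{name:12s} {reported[name]:6.2f}%")
    print(f"Uralic / Mongolic: {uralic_to_mongolic:.2f}")
```

Sorting the quoted figures this way puts Mongolic first (most compact) and Uralic
last (least compact), with Finno-Ugric and Samoyed both below Uralic.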
I would like to ask my colleagues in the field of linguistics to share their
opinions of Dr. Angela Marcantonio's book. Should we reconsider the
commonly accepted language families? If so, on the basis of what data and what
methods? I look forward to hearing from you at yutamb at hotmail.com.

Yours sincerely,
Yuri Tambovtsev, Novosibirsk, Russia
