VS: Articles on Uralic phylogenetics - a response

Wed Nov 6 13:21:15 UTC 2013

Dear readers of Ura-List,

and thanks to Florian for opening the discussion! Surely this discussion will continue – please feel free to contact us directly with further questions and ideas. Below, I’ve included Florian’s questions and comments, each followed by my answer. The ** marks the change from one comment-and-answer pair to the next, to help the reader tell the question from the answer.

**

“As the authors of this project apparently want to generate discussion as they insisted on spreading their news on Ura-List, I take the opportunity to comment shortly on the Diachronica paper.”

-          That was indeed the aim. The papers are available in Diachronica and the Journal of Evolutionary Biology, but I thought that not everyone has access to both papers. We think it is important to share our findings, because research makes more sense when it also reaches its audience as well as possible. The readers of Ura-List are surely part of our main audience. This way we can also get valuable comments from the specialists of the field.

**

“Let me say in advance, that historical-comparative linguistics is not my main field of interest and this apparently won’t change in the future. However, as I have undergone the typical historical-comparative training of the discipline and spent a decade in a department infamously known for “revolutionary and post-revolutionary approaches to Uralic linguistics” until the retirement of its propagator, I have a hard time understanding the implications…”

-          Actually I can tell you all the Great Plan, which is not a secret at all: the Diachronica paper established the study of “language evolution” for the Uralic languages. The method and data are comparable to parallel studies on other language families, which means that it is now possible to compare patterns of “language evolution” against other language families and to collaborate internationally. Until now, international researchers have not had access to Uralic data. Further, the Uralic literature has been difficult to use for researchers who do not speak Uralic languages. We wanted to give a very short overview of Uralic studies, BUT we encourage someone to write a thorough, up-to-date review of Uralic studies, in English and in a peer-reviewed journal. That information is needed, and I know that it would get a wide audience and lots of citations!

-          So the first step was to write this paper, which explained the data and the method and, importantly, showed that the results do not differ worryingly much from the previous results you would expect from lexical data. The second step, after the validation of the method, is to use the data to actually study the macroevolutionary patterns of language evolution (macroevolution = speciation and extinction). This was initiated in Honkola et al. We will continue to compare linguistic macroevolution with the biological one (please feel free to co-operate or to give suggestions or comments!). However, we will also continue to study the very basic questions raised by this kind of approach: what about network models, what about non-basic vocabulary, etc. One of these papers is in peer review right now. So I do think that the implications can actually be large.

**

“My first open question is concerned with the nature of “phylogeny” propagated by these papers. Since when is language classification based exclusively on vocabulary and sound changes? Historically and theoretically, we are back in the 18th century again, perhaps with insights in sound changes deriving from the 20th century now reproduced by statistical and biological software…”

-          Luckily no one claimed in the paper that basic vocabulary is the ultimate truth. In biology, people do phylogenetic research on the morphology of animals, Y-chromosome DNA, mitochondrial DNA, SNPs… All these approaches may produce different phylogenies (and often do). Further, if more species are added to the data, the structure of the tree may change. One of the advantages of this method is that comparing different data sets is generally very straightforward. With different data and results we can try to solve the puzzle of Uralic history. You can also compare the data sets by asking how and when the results are similar or dissimilar. At the moment we are also collecting typological data so that we can compare it with the lexical data. Surely any other data could be studied as well, and hopefully this will be done! By the way, there is one paper in which the language history was based on the structure of the languages. That paper was criticised for not using vocabulary. Maybe the point is “one task at a time”!

**

“And then, why Swadesh?”

-          We have to start somewhere. The Swadesh list is one of the most widely used basic vocabulary lists in this kind of research, which makes it easier to compare our work with similar work on other languages. The paper also actually goes through TEN different sets of basic vocabulary meanings, TWO of which are Swadesh lists.

**

“Second, it is quite hilarious to read the following introductory statement: „Most Uralic research remains non-quantitative…“ (Diachronica p. 335). Some pages later however one reads that their data set contains a 100-item data set, a 200-item data set and a 500-word data set. Given that data for historical-comparative work is restricted, why is this new approach with 500 items any better and less „non-quantitative“? From the perspective of lexicography or corpus linguistics, 500 tokens is indeed “non-quantitative”.”

-          This was a good point to notice, and we should have chosen a different wording. We meant that the data was statistically (quantitatively) analysed, and the classification was also produced quantitatively, in this case using Bayesian phylogenetics. We had altogether only 226 basic vocabulary meanings (not 500). We did comment on the size (quantity) of the vocabulary lists and ended up saying that the result is more or less the same with Swadesh 100, Leipzig-Jakarta (100 meanings) and Ura100 (our suggested basic vocabulary list for the Uralic languages) or all these combined (226 items). It has been claimed that basic vocabulary is necessarily restricted to a small number of meanings, since once we start to add items more prone to replacement we aren’t really talking about BASIC vocabulary any more. This is why we tested how the result changes when more or less stable subsets of the data are analysed. These tests are easy with statistical approaches, as you just need to separate an appropriate subset of the data and run the analyses again.

**

“Third, it is quite astonishing to see that output of researchers with a clear “revolutionary connotation” (Künnap & Taagepera 2004; Tambovtsev 2004) are even considered in such a paper. Apparently, the international reviewers have been unaware what happened in the discipline in the late 1990s and the first years of the new millennium and can’t tell solid scholarship from less solid. And by the way, so did the authors of this joint paper and their “linguistic” advisers for whom quite some space is reserved…”

-          Luckily we remembered to underline that this was not an exhaustive review of the literature. As stated above, a full, up-to-date review would indeed be needed! As stated in the text, the linguistic advisers checked through the basic vocabulary lists and are not (like the other people mentioned in the acknowledgements) in any way responsible for any of the text. In the future, we would be happy to co-operate, either in the form of co-authorship or by having the ms read through before submission. If you would like to read the text but do not want your name to be mentioned anywhere near this approach, you can also remain an “anonymous referee”.

-          BTW, we got funding for a project to put the basic vocabulary lists on the Internet alongside the correlate (cognate) data. The idea is to provide the data for others as well AND (this could be of interest to the readers) to allow Uralists and Fenno-Ugrists to comment on them. This will surely improve the data, as happened with the Indo-European languages. This UraLex project is indeed aiming at an outcome similar to that of IE-Lex.

**

“Summing up the Diachronica paper, one sees a “scientific” reproduction of a number of “scholarly assembled facts” equaling earlier scholarship which was accused of having been based on a “non-quantitative sample”. After all, it is nice to see that “scholarly work” can indeed compete with a biological software data set analysis and one may be tempted to say that “scholarly work is indeed rather scientific”. Of course, the Diachronica paper is an instance of that kind of „science“ generally appreciated as “hard science” as the paper tests predictions based on a sampled data set and shows different models based on different analysis. But clearly, this paper does not show anything amazingly new; it “scientifically” reproduces data which has been assembled scholarly and comes to solutions which are not too diverging. So, all we got is „quod erat demonstrandum“ now supported by software designed by humans?”

-          As stated above, we were happy not to see anything completely unexpected. It would have been very difficult to continue if the results had contrasted completely with earlier results… I hope that I have already managed to show that this approach gives new possibilities for approaching old and new questions.

**

“Finally, let me come back to my opening statement – in order to make such research interesting for a community of “scholars” (that’s how we are called by “scientists”), another central component of historical-comparative linguistics needs to be integrated – historical grammar. After all, genetic classification needs both lexicon and grammar. But then, historical grammar is messy, there is more analogy, leveling etc which blurs the nice and clear-cut lexicon and sound change picture. I wonder if this can be modeled and combined with the study one eagerly wanted to share with the community.”

-          As stated above, this is what we are doing now. The good thing about Bayesian phylogenetics is that it actually tells you in different ways whether you should trust the classification or not. If the data is messy, this will be seen in the overall shape of the tree, the branch lengths, and the posterior probabilities. And surely we will submit the ms for publication in international peer-reviewed journals; future funding depends on the number of publications. Also, it would not be science if you did not give your research to fellow scholars/scientists to read and comment on. Science is not about saying that THIS is the one and only truth. It is always said that typological data suggests a different kind of classification than basic vocabulary. I can hardly wait to see the result!

**

“Such a paper might indeed hold some surprises and would produce something new for the 21st century. As long as “phylogeny” is limited to vocabulary and sound change, the picture is incomplete and partial, even if it can be tested “scientifically”. After all, the genetic unity of Uralic (and any other language family) is indeed more than vocabulary and sound change…”

-          This is indeed what the funder thought when giving us money for collecting the typological list. However, I want to point out that language is also more than just grammar, and I would hope to see vocabulary data alongside grammatical data.

In general, we do not wish to suppress any other research approach in historical linguistics, but actually the opposite: we hope to raise the profile of the research done on Uralic languages with this field of study, “language evolution”. This new approach will most likely appeal to a new audience, and through it the new audience will hopefully also find the extensive work that is done in the entire field of Uralic and Fenno-Ugric studies.

Best regards,

Outi Vesakoski

________________________________
From: owner-ura-list at helsinki.fi [owner-ura-list at helsinki.fi] on behalf of Florian Siegl [florian.siegl at gmx.net]
Sent: 5 November 2013 10:12
To: ura-list at helsinki.fi
Subject: Re: Articles on Uralic phylogenetics

As the authors of this project apparently want to generate discussion as they insisted on spreading their news on Ura-List, I take the opportunity to comment shortly on the Diachronica paper. Let me say in advance, that historical-comparative linguistics is not my main field of interest and this apparently won’t change in the future. However, as I have undergone the typical historical-comparative training of the discipline and spent a decade in a department infamously known for “revolutionary and post-revolutionary approaches to Uralic linguistics” until the retirement of its propagator, I have a hard time understanding the implications…
My first open question is concerned with the nature of “phylogeny” propagated by these papers. Since when is language classification based exclusively on vocabulary and sound changes? Historically and theoretically, we are back in the 18th century again, perhaps with insights in sound changes deriving from the 20th century now reproduced by statistical and biological software… And then, why Swadesh?
Second, it is quite hilarious to read the following introductory statement: „Most Uralic research remains non-quantitative…“ (Diachronica p. 335). Some pages later however one reads that their data set contains a 100-item data set, a 200-item data set and a 500-word data set. Given that data for historical-comparative work is restricted, why is this new approach with 500 items any better and less „non-quantitative“? From the perspective of lexicography or corpus linguistics, 500 tokens is indeed “non-quantitative”.
Third, it is quite astonishing to see that output of researchers with a clear “revolutionary connotation” (Künnap & Taagepera 2004; Tambovtsev 2004) are even considered in such a paper. Apparently, the international reviewers have been unaware what happened in the discipline in the late 1990s and the first years of the new millennium and can’t tell solid scholarship from less solid. And by the way, so did the authors of this joint paper and their “linguistic” advisers for whom quite some space is reserved…
Summing up the Diachronica paper, one sees a “scientific” reproduction of a number of “scholarly assembled facts” equaling earlier scholarship which was accused of having been based on a “non-quantitative sample”. After all, it is nice to see that “scholarly work” can indeed compete with a biological software data set analysis and one may be tempted to say that “scholarly work is indeed rather scientific”. Of course, the Diachronica paper is an instance of that kind of „science“ generally appreciated as “hard science” as the paper tests predictions based on a sampled data set and shows different models based on different analysis. But clearly, this paper does not show anything amazingly new; it “scientifically” reproduces data which has been assembled scholarly and comes to solutions which are not too diverging. So, all we got is „quod erat demonstrandum“ now supported by software designed by humans?
Finally, let me come back to my opening statement – in order to make such research interesting for a community of “scholars” (that’s how we are called by “scientists”), another central component of historical-comparative linguistics needs to be integrated – historical grammar. After all, genetic classification needs both lexicon and grammar. But then, historical grammar is messy, there is more analogy, leveling etc which blurs the nice and clear-cut lexicon and sound change picture. I wonder if this can be modeled and combined with the study one eagerly wanted to share with the community. Such a paper might indeed hold some surprises and would produce something new for the 21st century. As long as “phylogeny” is limited to vocabulary and sound change, the picture is incomplete and partial, even if it can be tested “scientifically”. After all, the genetic unity of Uralic (and any other language family) is indeed more than vocabulary and sound change…

Florian Siegl

On 4.11.2013 18:13, Johanna Laakso wrote:
Dear All,

Outi Vesakoski of the BEDLAN project (http://kielievoluutio.uta.fi/ ) wanted to share these papers of their project with the URA-LIST community!

Best
JL
--
Univ.Prof. Dr. Johanna Laakso
Universität Wien, Institut für Europäische und Vergleichende Sprach- und Literaturwissenschaft (EVSL)
Abteilung Finno-Ugristik
Campus AAKH Spitalgasse 2-4 Hof 7
A-1090 Wien
johanna.laakso at univie.ac.at • http://homepage.univie.ac.at/Johanna.Laakso/
Project ELDIA: http://www.eldia-project.org/

Begin forwarded message:

From: Outi Vesakoski <outves at utu.fi>
Subject: Uralic phylogenetics
Date: 4 November 2013 15.56.30 UTC+1.00
To: Johanna Laakso <johanna.laakso at univie.ac.at>

Hi.

Do you think it would be possible and appropriate to share the attached articles on Ura-List? It could be a good way to reach a rather important part of the papers’ readership! Not everyone necessarily has access to the Journal of Evolutionary Biology, at least, even if Diachronica were available to all.

I don’t remember whether I am allowed to write to the list myself and send attachments, but in any case I want to ask you first. At the same time I am sending both articles directly to you as well (though it may be that I already sent Terhi’s paper earlier.)

Regards, Outi Vesakoski
