[Lingtyp] Greenbergian word order universals: confirmed after all

Gerhard Jäger gerhard.jaeger at uni-tuebingen.de
Mon Nov 6 08:32:49 UTC 2023


Stratified sampling can only tell you something about the equilibrium 
distribution. If you have a lot of data from related languages, ideally 
from several families and areas, you can also infer information about 
diachronic pathways.

In your case, I would advise sampling 50 isolates or languages from 
small families, and the remaining 50 from five larger families.

The results from 1500 languages from 100 families would evidently be 
much more reliable though. Sampling one language from a family leads to 
a larger sampling error in comparison to having many languages from the 
same family.

It shouldn't be too hard to do a simulation study to clarify this once 
and for all.
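
A rough sketch of what such a simulation could look like, under strong 
simplifying assumptions: a binary feature, a two-state Markov chain, and 
star-shaped families in which every language descends independently from 
the family root. The estimator used here (the mean of the family means) is 
only a simple stand-in, not the Jäger & Wahle method, and all names and 
parameter values (p, rho, the number of runs) are made up for illustration.

```python
import math
import random


def leaf_prob(root_state, p, rho):
    """P(language has the feature | state at the family root).

    p   : equilibrium frequency of the feature (assumed value)
    rho : exp(-rate * time), i.e. how much of the root signal survives
          (rho near 1 = closely related languages, rho near 0 = independent)
    """
    return p + (root_state - p) * rho


def estimate_once(n_families, langs_per_family, p, rho, rng):
    """Simulate one sample and return the mean of the family means."""
    family_means = []
    for _ in range(n_families):
        # Root state is drawn from the equilibrium distribution.
        root = 1 if rng.random() < p else 0
        q = leaf_prob(root, p, rho)
        # Star phylogeny: languages are independent given the root.
        hits = sum(rng.random() < q for _ in range(langs_per_family))
        family_means.append(hits / langs_per_family)
    return sum(family_means) / n_families


def rmse(n_families, langs_per_family, p, rho, n_runs=500, seed=1):
    """Root mean squared error of the estimate of p over repeated samples."""
    rng = random.Random(seed)
    squared_errors = [
        (estimate_once(n_families, langs_per_family, p, rho, rng) - p) ** 2
        for _ in range(n_runs)
    ]
    return math.sqrt(sum(squared_errors) / n_runs)


if __name__ == "__main__":
    # Compare 100 families x 1 language with 100 families x 15 languages.
    for rho in (0.2, 0.5, 0.9):
        one = rmse(100, 1, p=0.3, rho=rho)
        many = rmse(100, 15, p=0.3, rho=rho)
        print(f"rho={rho}: RMSE 100x1 = {one:.4f}, RMSE 100x15 = {many:.4f}")
```

In this toy model the variance of a family mean is 
p(1-p) * (rho^2 + (1 - rho^2)/k) for k languages per family, so the gain 
from the extra languages shrinks as the languages within a family become 
more similar. A realistic study would need actual phylogenies and 
areal effects.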

Best, Gerhard

On 11/6/23 09:24, Martin Haspelmath wrote:
> Many thanks, Gerhard, for these clarifications. Clearly I didn't 
> understand your article well enough, as I lack the mathematical 
> background. But I was glad to see your clarification about the 
> relationship between your method and stratified sampling:
>
> On 06.11.23 08:23, Gerhard Jäger wrote:
>> In the extreme case where each family contains just one language in a 
>> sample, the Jäger & Wahle method is actually equivalent to stratified 
>> sampling where one language is sampled from each family. So our 
>> method is not so much an alternative to stratified sampling but an 
>> extension that allows you to use all languages for which you have data.
>
> So does the advantage boil down, then, to situations where a lot of 
> extra data is available, as with WALS (which was used by Jäger & 
> Wahle) and Grambank (used by Verkerk et al.)?
>
> Suppose I want to study a phenomenon that no large-scale worldwide 
> research has been done on yet, e.g. concessive conditional clauses 
> (Tom Bossuyt's recent article goes beyond Europe, building on 
> Haspelmath & König 1998, but covers only 17 non-European languages: 
> https://benjamins.com/catalog/sl.20068.bos).
>
> So if I have funding only for studying a hundred languages, could I 
> study 100 languages from 100 different Glottolog families and get 
> results that would be about as good as studying 1500 languages from 
> 100 families? The research would be at least 15 times cheaper, so 
> funding agencies might be very interested in the answer to this question.
>
> I've often seen the argument that we shouldn't "throw away data", and 
> that the sampling method forces us to do that. But that argument 
> applies only to situations where a lot of data is already available. 
> In the case of concessive conditionals, if we decided to collect data 
> on 1500 languages, we might be "throwing away money", because very 
> similar results could perhaps be obtained much more cheaply.
>
> Best,
>
> Martin
>

-- 
Prof. Dr. Gerhard Jäger
Universität Tübingen
Seminar für Sprachwissenschaft
Tel.: +49-7071-29-77302
http://www.sfs.uni-tuebingen.de/~gjaeger/


