Languages considered in typological research

Peter Arkadiev peterarkadiev at YANDEX.RU
Sun Dec 4 09:22:12 UTC 2011


Dear all,

I would like to thank Wolfgang for raising this point and Martin for posting the WALS statistics, but I am not sure that what Martin has done actually answers Wolfgang's question. For a language to be well represented in WALS, it needs a comprehensive grammar and/or, probably, a number of papers written by experts and dealing specifically with that language. Representation of a language in typological studies is something different: its inclusion in samples compiled by different authors working on different topics, the use of its data in typologically oriented discussions of various phenomena, and, last but not least, the use of its data in more general typologically oriented reference works such as handbooks or textbooks.
In short, WALS is not an adequate database for Wolfgang's question; answering it properly requires a dedicated survey of the literature.
Such an investigation is made easier by the fact that many modern typological works exist in searchable electronic form and that books normally include language indices. For a selection of languages (e.g. those from the "top-WALS" list), one could compute the following figures:
1) the number of books (from a preestablished representative set) whose indices contain the name of a language;
2) the number of mentions of a language in each book.
Of course, other methods of estimating language representation can be used. Journal articles should also be included, and it makes a difference whether a language is merely mentioned or whether an example from it is actually provided.
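
To make the two figures concrete, here is a minimal sketch in Python. It assumes that each book's language index has already been transcribed into a mapping from language name to the page references listed for it. The book titles, page numbers and function names are hypothetical placeholders for illustration, not real data or an existing tool.

```python
from collections import Counter

# Hypothetical input: for each book, its language index as a mapping from
# language name to the page numbers listed for that language.
# Titles and page numbers are invented placeholders, not real data.
indices = {
    "Book A": {"Lezgian": [12, 40, 41], "Turkish": [7]},
    "Book B": {"Lezgian": [3], "Amele": [88, 90]},
    "Book C": {"Turkish": [15, 16, 200]},
}


def books_mentioning(indices):
    """Figure 1: number of books whose index contains each language."""
    counts = Counter()
    for index in indices.values():
        counts.update(index.keys())
    return counts


def mentions_per_book(indices):
    """Figure 2: number of indexed page references per language, per book."""
    return {
        book: {language: len(pages) for language, pages in index.items()}
        for book, index in indices.items()
    }


book_counts = books_mentioning(indices)
mention_counts = mentions_per_book(indices)

# Print one summary line per language, most widely indexed first.
for language, n_books in book_counts.most_common():
    total = sum(per_book.get(language, 0) for per_book in mention_counts.values())
    print(f"{language}: {n_books} book(s), {total} indexed page reference(s)")
```

Index page references only approximate the number of mentions, and they do not show whether an example from the language is given; that would require additional annotation of each reference.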

Best wishes,

Peter




04.12.2011, 10:44, "Martin Haspelmath" <haspelmath at EVA.MPG.DE>:
> To do what Wolfgang Schulze is asking for, one would simply need
>
> (i) a representative selection of "the typological literature"
> (ii) a database that records every language that is dealt with in each of these works
>
> As Joseph Farquarson notes, the World Atlas of Language Structures (http://wals.info) comes close to this: It was intended to be representative of large-scale typological work, and the WALS database makes it easy to compute this ranking. See the list below (the top 113 languages out of WALS's 2560 languages). (This is somewhat out of date, because it's based on the 2005/2008 edition, not the 2011 edition, but the trend is clear.)
>
> There are at least three important caveats here:
>
> (i) WALS is not representative of typological work as a whole – only of large-scale typological work, covering more than 150 languages. Much typological work discusses grammatical features at a depth that is not possible with such large numbers of languages, because the relevant information cannot be easily found in reference grammars.
>
> (ii) The WALS authors were explicitly given the instruction to try to cover a core sample of 100 or 200 languages, so that the number of languages treated in most chapters would be maximized (see http://wals.info/languoid/samples/200). So the top 200 languages of the WALS Language Coverage List are probably largely due to this instruction.
>
> (iii) As Silvia Kouwenberg notes, pidgins and creoles are not well represented in WALS. This is because WALS is an atlas, and it was intended first and foremost as a way of showing areal patterns. Languages that arose due to long-distance migration over the last few centuries (including languages such as Brazilian Portuguese or Surinamese Hindustani) would confuse this areal picture, so it was (controversially) decided not to encourage their inclusion in WALS.
>
> Thus, we would need a more comprehensive database that does not show these idiosyncrasies. Colin Masica is "surprised that this hasn't been done", but this is not surprising at all – it would be quite difficult to get funding for such an enterprise.
>
> Greetings,
> Martin
>
> On 12/3/11 6:42 PM, Wolfgang Schulze wrote:
>> Dear friends,
>> just a short (and maybe silly?) question: Is anybody aware of any statistics on the extent to which the individual languages of the world are dealt with in the typological literature? It would be interesting to see where (and why!) there are both lacunae and statistical 'peaks'. The issue could be refined by including the classical linguistic domains such as phonology, morphology, syntax, semantics etc. Such a "World Atlas of Linguistics Data" (just to give it a name) would not only help motivate researchers to fill lacunae, but also help us understand the reasons for certain preferences...
>> Best wishes,
>> Wolfgang
>
> WALS Language Coverage:
>
> Languages in WALS and number of WALS features in which the language is considered
> (only languages that occur in more than 100 features out of 141)
>
> English 139
> French 136
> Finnish 135
> Russian 135
> Spanish 135
> Turkish 135
> Hungarian 133
> Indonesian 133
> Japanese 130
> Mandarin 130
> Amele 129
> German 129
> Greek (Modern) 129
> Lezgian 129
> Abkhaz 128
> Evenki 128
> Korean 128
> Persian 128
> Basque 127
> Hausa 126
> Maori 126
> Georgian 125
> Kannada 125
> Khalkha 125
> Malagasy 125
> Supyire 125
> Hindi 124
> Tagalog 124
> Arabic (Egyptian) 123
> Greenlandic (West) 123
> Hixkaryana 123
> Swahili 123
> Vietnamese 123
> Slave 122
> Burushaski 121
> Chamorro 121
> Chukchi 121
> Fijian 121
> Hebrew (Modern) 121
> Lango 121
> Oromo (Harar) 121
> Thai 121
> Yaqui 121
> Zulu 121
> Maybrat 120
> Tukang Besi 120
> Kanuri 119
> Kayardild 119
> Mapudungun 119
> Yoruba 119
> Yukaghir (Kolyma) 119
> Burmese 118
> Krongo 118
> Mangarrayi 118
> Tiwi 118
> Guaraní 117
> Khoekhoe 117
> Meithei 117
> Ngiyambaa 117
> Ainu 115
> Jakaltek 115
> Lakhota 115
> Martuthunira 115
> Pirahã 115
> Wari' 115
> Lavukaleve 114
> Rapanui 114
> Alamblak 113
> Gooniyandi 113
> Kutenai 113
> Mixtec (Chalcatongo) 113
> Awa Pit 112
> Kobon 112
> Latvian 112
> Maricopa 112
> Imonda 111
> Apurinã 110
> Berber (Middle Atlas) 110
> Warao 110
> Canela-Krahô 109
> Nivkh 109
> Quechua (Imbabura) 108
> Rama 108
> Wichí 108
> Yagua 108
> Koromfe 107
> Bagirmi 106
> Hunzib 106
> Ingush 106
> Maung 106
> Epena Pedee 105
> Ket 105
> Koasati 105
> Luvale 105
> Sango 105
> Iraqw 104
> Kewa 104
> Sanuma 104
> Shipibo-Konibo 104
> Ju|'hoan 103
> Kilivila 103
> Nunggubuyu 103
> Asmat 102
> Ewe 102
> Grebo 102
> Hmong Njua 102
> Khasi 102
> Khmer 102
> Kiowa 102
> Ndyuka 102
> Wichita 102
> Arapesh 101
> Oneida 101

-- 
Peter Arkadiev, PhD
Institute of Slavic Studies
Russian Academy of Sciences 
Leninsky prospekt 32-A 119334 Moscow
peterarkadiev at yandex.ru
http://www.inslav.ru/index.php?option=com_content&view=article&id=279


