[Lingtyp] A "Swadesh List" of Ideophone semantic categories

I am sorry for misspelling your name, Bernhard!


Many thanks, Bernard, for your rich discussion of several important issues in language typology! Here are some thoughts about a few of the points made.

- Regarding (iv) Ignoring language-internal variation engenders spurious correlations and wrong explanations
Yes, crosslinguistic correlations may be spurious; but they may also turn out to be significant. These statements are not meant to provide causal explanations for grammatical patterns; they are simply observations about the distribution of grammatical patterns in given samples. It is worth formulating these statements as data to be evaluated for their validity and to be explained.

It still seems to me that individual linguists may legitimately opt for crosslinguistic comparative studies without also considering language-internal variation. The validity of these comparisons may subsequently be tempered or entirely destroyed when we probe into variation within the languages that have been sampled.

- Regarding (iii) Typological features are the probability of occurrence of events under functionally strictly defined utterance conditions (or simplified: the privileged occurrence of events under certain conditions), and
            (i) Cross-linguistic comparative concepts are reifications
Yes, utterances are events rather than objects, even though in our analyses we reify them when we talk about them as "things". Bernard sees reifications as dangerous: "they can be useful for our communication even if they lack appropriate referents in the real world". I think the danger is removed if we clarify what we mean when we refer to things in our analyses. Claiming that they actually exist would be mistaken, since we do not know whether they do. If, however, we make it clear that they are simply conceptual tools of analysis, with no existential claims involved, there is no problem. Physicist Niels Bohr said: "It is wrong to think that the task of physics is to find out how nature IS. Physics concerns only what we can SAY about nature." (Cited in Bruce Gregory: Inventing reality. Physics as language. 1990. New York, John Wiley and Sons, p. 95.) The same can, and in fact I think must, be said about linguistics: our metalanguage is an analytic tool that we invent, and its relationship to reality is unknown.
Bernard also wrote: "The number of potential cross-linguistic comparative concepts is probably very large, perhaps infinite. Has any linguist the right to "create" any of those cross-linguistic comparative concepts if only it is assured that s/he does not compare apples to pears? Who decides which of these very many potential cross-linguistic comparative concepts we should work with if none of them has any self-evident essence?"
I cannot see any danger involved. Crosslinguistic comparisons may be based on any concept conceived by any of us without incurring negative outcomes (what might they possibly be?). Any comparison will yield some results; the sole criterion for the usefulness of a given concept is empirical: whether it facilitates a generalization. There is no a priori way to designate some comparative concepts as legitimate and others as illegitimate and, as Bernard implies, we do not need to stick with one particular definition of a concept over another, since different definitions may serve different purposes. As long as we hypothesize a crosslinguistic pattern by clearly stating the conditions under which we take it to hold or not hold in a language, I cannot see how we can go wrong.
- Regarding (ii) Existential statements are problematic
Yes, in everyday discourse, statements like "There are palm trees in Iceland" or "There are ideophones in Lithuanian" are interpreted as saying that palm trees are typical of Iceland and ideophones are typical of Lithuanian. If we want such statements to be taken in their logical sense, where mere occurrence rather than typical occurrence is meant, we need to say so explicitly.
Edith Moravcsik

Dear Martin, dear Edith,

> I agree that the issue of variation is orthogonal

No, this is not at all orthogonal. Here are some further arguments why:
(i) Cross-linguistic comparative concepts are reifications.
(ii) Existential statements are problematic (both in general and, in typology, in particular).
(iii) Typological features are not the existence of things/objects/organisms in a territory, but the probability of occurrence of events under certain conditions (where the variable "language" actually means "uttered by a native speaker of that language" under specific discourse conditions).
(iv) Ignoring language-internal variation engenders spurious correlations and wrong explanations.

I will elaborate on these points in reverse order.

(iv) Ignoring language-internal variation engenders spurious correlations and wrong explanations
Most typologists (and other linguists) are interested in explanation. In order to establish explanations of the kind "A is caused by B", a first tentative step can be to establish statistically valid correlations between A and B. But correlations are often spurious if not all potential factors are known (https://en.wikipedia.org/wiki/Spurious_relationship). There is probably a correlation between the existence of palm trees and ice cream sales, but this does not mean that ice cream sales explain the existence of palm trees; both presumably depend on a third factor, a warm climate (alas, palm oil as an ingredient in ice cream complicates the matter). Correlating a classification of languages with another linguistic or extra-linguistic feature is particularly problematic when "language" is not the only, or not the most important, predicting variable. Take the case of absolute and relative frames of reference adduced as an argument for linguistic relativity. Pederson (1993) shows that "Urban Tamils, like European culture, use NSEW exclusively with large-scale or geographic space. In stark contrast with this, rural Tamils use absolute NSEW to depict their manipulable space". If there are many languages like Tamil, this might mean that the use of frames of reference in non-linguistic tasks is not conditioned, or not only conditioned, by the native language of the speakers, but rather by non-linguistic factors (e.g., predominantly urban vs. rural culture).
Ignoring whether ideophones are subject to language-internal variation, for instance, of the urban-rural kind (as is BTW frequently reported in the literature), makes it pretty pointless to consider candidates for language-internal explanatory factors for their occurrence.
Thus, language-internal variation is not a luxury some particularly freaky typologists may waste their time on if they really want to. It is something that everybody who is interested in explanation in language has to take into account.

Pederson, Eric. 1993. Geographic and manipulable space in two Tamil linguistic systems. In European Conference on Spatial Information Theory, 294-311. Berlin, Heidelberg: Springer.

(iii) Typological features are the probability of occurrence of events under functionally strictly defined utterance conditions (or simplified: the privileged occurrence of events under certain conditions)
The written and visual representation of glossed examples in reference grammars distracts us from the fact that utterances and everything within them are events rather than objects or organisms (such as palm trees). Quantitative linguists are well aware of this. Consider, e.g., Harald Baayen's important point that language is a large-number-of-rare-events phenomenon (Baayen 2008: 228). Mentalists reify these events by modelling mental representation in static terms, but I am quite confident that progress in brain research will increasingly demonstrate the procedural character of language in the mind. To the extent that comparative linguistics (historical, typological or psycholinguistic) influences field linguists, they increasingly use pseudo-experimental tasks for data collection, which make utterances more comparable cross-linguistically. Swadesh lists are examples of such pseudo-experimental tasks. They elicit the most probable terms with which native speakers of a language X express a certain well-defined concept. Swadesh lists are not about the existence of certain forms, but about the privileged occurrence of forms. What is important here is that the same data-collection procedure is applied, with "different speakers with different native languages" as the variable. Ideally, every language has the same number of chances to score. This is the case in the example of pronominalized proper names in Bible translations that I posted yesterday: for all language varieties considered, there are 10 instances of appropriate discourse conditions for triggering utterances such as "I, Tertius, the one who wrote this letter, greet you". So the data-collection procedure is roughly the same for each language variety and, more importantly, occurrence of the event is treated in the same way as non-occurrence of the event (irrespective of whether the event might occur given an infinite number of trials). Non-occurrence matters as much as occurrence.
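A minimal Python sketch may make the contrast between a binary existence feature and privileged occurrence concrete. The three varieties and their scores below are entirely hypothetical (invented for illustration, not taken from the Bible data), as are the helper names `occurrence_rate` and `exists`. Each variety is scored on the same 10 contexts, with True for occurrence and False for non-occurrence.

```python
from fractions import Fraction

# Hypothetical scores: one entry per elicitation context (10 parallel
# passages per variety); True = the event occurred, False = it did not.
samples = {
    "variety_A": [True] * 9 + [False],
    "variety_B": [True] + [False] * 9,
    "variety_C": [False] * 10,
}


def occurrence_rate(trials: list[bool]) -> Fraction:
    """Share of contexts in which the event occurred.

    Every non-occurrence counts exactly as much as an occurrence,
    because both enter the denominator.
    """
    if not trials:
        raise ValueError("a variety needs at least one scored context")
    return Fraction(sum(trials), len(trials))


def exists(trials: list[bool]) -> bool:
    """Binary 'existence' feature: True as soon as a single instance occurs."""
    return any(trials)


for name, trials in samples.items():
    print(name, exists(trials), occurrence_rate(trials))
# variety_A True 9/10
# variety_B True 1/10
# variety_C False 0
```

Under the binary feature, variety_A and variety_B both come out as having the feature, although the event is typical in one and marginal in the other; the occurrence rate keeps them apart.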
Using reference grammars for collecting data in typological databases is just an attempt to approximate privileged occurrence of events. Functional domains are rough approximations for semantically and pragmatically determining the right kind of utterance conditions. Typologists often prefer to rely on examples in grammars, since these come closer to functionally determined utterance conditions than abstract grammatical descriptions do.

Baayen, R. Harald. 2008. Analyzing Linguistic Data: A Practical Introduction to Statistics Using R. Cambridge: Cambridge University Press.

(ii) Existential statements are problematic
There has been a lot of debate in philosophy about existential statements, especially negative ones. Perhaps the most famous instance is the dispute between Wittgenstein and Russell about "There is no rhinoceros in the lecture room" (https://www.independent.co.uk/life-style/when-ludwig-wittgenstein-met-bertrand-russell-1596995.html). Leaving philosophy aside, we cannot ignore the fact that positive and negative existential statements are often used in different ways. Yesterday, I read about plans to plant and sustain two palm trees in Iceland; I have no idea how far that project has advanced, but suppose there are two palm trees in Iceland (artificially planted and constantly sheltered from the Icelandic climate). Then neither (1) nor (2) would be a really accurate statement:
(1) There are palm trees in Iceland.
(2) There are no palm trees in Iceland.
The reason seems to be that negative existential statements strictly exclude the existence of any individual whatsoever, whereas positive existential statements are rather about what is typical or common.
Whereas occurrence and non-occurrence are on the same level, which makes them appropriate for statistics (see above), existence and non-existence are not (or, at least, this cannot be taken for granted).
If, however, positive existential statements are interpreted strictly, they are no longer particularly useful for determining what is common. Put differently, the statement "There are palm trees in Iceland" may then be true, but it will not ascribe a typical property to the country in any way. Typological features in typological classifications and databases, however, are usually understood as typical properties of the language in question (why else the name "linguistic typology"?). If the existence of a single exemplar is all that matters for a typological feature, we end up with typological features that cannot be assumed to be representative of a language in any way.
This is what I meant when I said that many statements about typological features (especially in typological databases) are not falsifiable to the extent that features are conceived of as discrete/binary. They lie in the no man's land between absolute absence and typical/frequent occurrence. For many features this no man's land is very large, except for those with a strongly bimodal distribution, such as the word order typologies with which modern typology started. The issue becomes more urgent as ever more features are considered in typology.
Add to this that the absence of negative existential statements in reference grammars (with the exception of Martin's Lezgian grammar) is one of the most severe problems of reference grammars as data sources. There are good reasons why grammar writers do not want to make such claims: negative existential statements are very strong claims.

(i) Cross-linguistic comparative concepts are reifications
Lucky were the days of UG, when linguists did not have to doubt the existence of discrete features. (Some did already in those days; for an early criticism of essentialist methodology in linguistics see Altmann and Lehfeldt 1973: 20-22.) What I tried to make clear to Martin in my previous posting was that, once we abandon the essentialism of cross-linguistic features, it is no longer at all clear whether cross-linguistic features really exist and whether discrete features are the best way to capture them. (See, in particular, Croft 2001 on the non-existence of cross-linguistic constructions and Croft's suggestion to work with conceptual space instead.) The number of potential cross-linguistic comparative concepts is probably very large, perhaps infinite. Has any linguist the right to "create" any of those cross-linguistic comparative concepts if only it is assured that s/he does not compare apples to pears? Who decides which of these very many potential cross-linguistic comparative concepts we should work with if none of them has any self-evident essence?
The term for creating things which do not have any self-evident essence is reification, which is probably a good example of itself, because reification is a practice rather than a thing. This is well described by Wenger (1998) within the framework of communities of practice: "Any community of practice produces abstractions, tools, symbols, stories, terms, and concepts that reify something of that practice in a congealed form" (Wenger 1998: 59). We typologists are a community of practice (at least to the extent that we collaborate), so we have our own reifications, many of which are not shared with other communities of practice. Reifications are highly suggestive, since "Reification provides a shortcut to communication" (Wenger 1998: 58). What makes them dangerous is that they can be useful for our communication even if they lack appropriate referents in the real world. As a bon mot about philosophy that circulated in Berlin in the 1920s or 1930s put it: philosophy (you can replace it by "linguistics") is the abuse of a terminology that has been invented solely for this purpose. There is good reason to constantly mistrust reifications. A constant problem with them is that none of them can ever be taken for granted (there is no firm ground). Things are assumed to be time-stable, but because reifications are not really things, they can never be assumed to be generally valid. Wenger says "reification must be reappropriated into a local process in order to become meaningful" (Wenger 1998: 60). As we look at new data from new perspectives, we constantly have to update reifications for the particular purpose at hand.
Let us take the instance of pronominalized proper names we discussed yesterday. Having looked at this topic yesterday in a particular data source (10 specific passages in Bible translations), I am now inclined to define it as follows (re-labelled as "(person) indexation on proper names"). In utterances such as "I, Tertius, the one who wrote this letter, greet you" (which seem to be typical of written language, especially letters and inscriptions), an indexation on a proper name occurs if there is a marker for (first) person (singular), other than a free personal pronoun or a verbal index, that can somehow be considered to form a word or constituent together with the proper name (but note that neither "word" nor "constituent" [Croft 2001] is a fully reliable term). This marker may be a verbal person affix, a possessive affix or anything else expressing first person singular, and it may occur with or without an accompanying free personal pronoun (thus structures of the kind "I[,] Tertius I" are also included). Having worked with that material, I am now inclined to believe that it is characteristic for the proper name to be in apposition to the personal pronoun rather than the other way round, and that cases where there is no (emphatic) personal pronoun have to be considered secondary. With this definition I can work with that particular material for the time being, viewing indexes on proper names in terms of the occurrence of a particular event type in a pseudo-experimental setting with language as a variable. At the same time, I am aware that David Gil will probably not be happy with my definition, because he has collected a different kind of material (which is at least as interesting and relevant for the subject as the material I have considered, but which I do not yet know how to include in my way of reasoning).
So I have a cross-linguistically valid cross-linguistic comparative concept, but it is not at all certain that it is applicable to (and useful for) all possible utterances where indexes on proper names occur. In order to explore that type of phenomenon further, it is probably a good idea to have several people work with different material and with slightly different cross-linguistic comparative concepts. If we have to stick to one comparative concept once and for all, we will not be able to adapt it to our "local processes" (i.e., to working with different kinds of data), and so there is a risk that the reification will stop being meaningful.

Altmann, Gabriel and Werner Lehfeldt. 1973. Allgemeine Sprachtypologie. Prinzipien und Meßverfahren. München: Fink.
Croft, William. 2001. Radical Construction Grammar: Syntactic theory in typological perspective. Oxford: Oxford University Press.
Wenger, Etienne. 1998. Communities of Practice. Learning, Meaning, and Identity. Cambridge: Cambridge University Press.

Bernhard Wälchli

