[Lingtyp] spectrograms in linguistic description and for language comparison

Spike Gildea spike at uoregon.edu
Wed Dec 14 16:14:04 UTC 2022

It is interesting reading these different points of view while working on a corpus-based study of NPs in Cariban. From my theoretical perspective, it is unquestionable that the judgments of native speakers about sentences produced in elicitation do not have the same status as

From: Lingtyp <lingtyp-bounces at listserv.linguistlist.org> on behalf of Randy J. LaPolla <randy.lapolla at gmail.com>
Date: Wednesday, December 14, 2022 at 7:31 AM
To: Adam Singerman <adamsingerman at gmail.com>
Cc: lingtyp at listserv.linguistlist.org <lingtyp at listserv.linguistlist.org>
Subject: Re: [Lingtyp] spectrograms in linguistic description and for language comparison

Dear Adam,
Sorry to just be getting back to you on this.

We have very different conceptions of language and goals in doing linguistics (your interest in "(control vs raising structures, pied-piping, islands, gaps in inflectional paradigms, etc)" very much reflects this, as these are not things that concern me). This is what my blog post is about: how different the choices can be. Saying I am "wrong" implies there is only one right way to do linguistics. Not to be rude, but as Popper said, "Whenever a theory appears to you as the only possible one, take this as a sign that you have neither understood the theory nor the problem which it was intended to solve." (Karl Popper, Objective Knowledge: An Evolutionary Approach, Clarendon Press, 1972, p. 266)
All of our theories and methodologies are subjective attempts to achieve some goal; they are actually heuristics. As there are different ways of looking at the same phenomenon, and for different purposes, there is no right and wrong, only more or less useful ways of analysing the phenomena relative to some purpose, and these also depend on our assumptions, definition of language, etc., as I discuss in the blog. Y. R. Chao and J. R. Firth both argued that there are different ways to analyse language depending on the language and your purposes, including, for example, not using a phoneme type of representation of the phonology, but some different type. This is true not just of linguistics but also, for example, of physics, where light can be understood as a wave or a particle depending on your purposes. And our understanding of the Universe has gone through many major changes, each one thought to be "right" at the time, but later overthrown by a newer theory.

You assume that we can somehow "fully represent a given language's grammatical possibilities". As language is a complex adaptive system that is constantly changing, and is human behaviour and so not a finite thing, I don't think that it will ever be possible to "fully represent a given language's grammatical possibilities". One problem I see with modern linguistics is not acknowledging the tremendous diversity of usages within a single language, as the search has been for universals and for a single tight system, which even Charles Hockett (1967) said was a "wild goose chase". That is just for one language, never mind trying to do that for all languages, which has led to linguistics missing so much of the diversity between languages.

The beauty of working inductively is that you are only responsible for what is in your data. You don’t have to make broad generalisations about the language that in many cases turn out to be problematic, you just say this is what is and is not in my data. Of course the more data you have the stronger the generalisations you can make. I did not rule out using some stimuli such as the MPI sets, as these set up contexts that the speaker can talk about, but asking people to translate word lists or sentences will not give you useful data. What you will get back are the categories of the working language. But again, this is part of the problem. Too many linguists think that words are translatable and mean the same thing in different languages. This is easily shown to be false. Not only is the prototype of the cognitive category represented by the word different for different cultures (even different speakers), but the extension (the use of the word for different objects or situations) is also different. This is true of every word in the languages. Humboldt knew this, and argued against Aristotle’s view that all people have the same object in mind even if the word is different. Humboldt said no, even if we both look at the same horse we are seeing different things, as our cognitive categories are different.

All the best,

Professor Randy J. LaPolla(罗仁地), PhD FAHA
Center for Language Sciences
Institute for Advanced Studies in Humanities and Social Sciences
Beijing Normal University at Zhuhai
A302, Muduo Building, #18 Jinfeng Road, Zhuhai City, Guangdong, China

ORCID ID: https://orcid.org/0000-0002-6100-6196


On 11 Dec 2022, at 11:00 AM, Adam Singerman <adamsingerman at gmail.com> wrote:

I think Randy is wrong (sorry if this comes across as blunt) and so I
am writing, on a Saturday night no less, to voice a different view.

Working inductively from a corpus is great, but no corpus is ever
going to be large enough to fully represent a given language's
grammatical possibilities. If we limit ourselves to working
inductively from corpora then many basic questions about the languages
we research will go unanswered. From a corpus of natural data we
simply cannot know whether a given pattern is missing because the
corpus is finite (i.e., it's just a statistical accident that the
pattern isn't attested) or whether there's a genuine reason why the
pattern is not showing up (i.e., its non-attestation is principled).

When I am writing up my research on Tuparí I always prioritize
non-elicited data (texts, in-person conversation, WhatsApp chats). But
interpreting and analyzing the non-elicited data requires making
reference to acceptability judgments. The prefix (e)tareman- is a
negative polarity item, and it always co-occurs with (and inside the
scope of) a negator morpheme. But the only way I can make this point
is by showing that speakers invariably reject tokens of (e)tareman-
without a licensing negator. Those rejected examples are by definition
not going to be present in any corpus of naturalistic speech, but they
tell me something crucial about what the structure of Tuparí does and
does not allow. If I limit myself to inductively working from a
corpus, fundamental facts about the prefix (e)tareman- and about
negation in Tuparí more broadly will be missed.

A lot of recent scholarship has made major strides towards improving
the methodology of collecting and interpreting acceptability
judgments. The formal semanticists who work on understudied languages
(here I am thinking of Judith Tonhauser, Lisa Matthewson, Ryan
Bochnak, Amy Rose Deal, Scott AnderBois) are extremely careful about
teasing apart utterances that are rejected because of some
morphosyntactic ill-formedness (i.e., ungrammaticality) versus ones
that are rejected because of semantic or pragmatic oddity. The
important point is that such teasing apart can be done, and the
descriptions and analyses that result from this work are richer than
what would result from a methodology that uses corpus examination or
elicitation only.

One more example from Tuparí: this language has an obligatory
witnessed/non-witnessed evidential distinction, but the deictic
orientation of the distinction (to the speaker or to the addressee) is
determined via clause type. There is a nuanced set of interactions
between the evidential morphology and the clause-typing morphology,
and it would have been impossible for me to figure out the basics of
those interactions without relying primarily on conversational data
and discourse context. But I still needed to get some acceptability
judgments to ensure that the picture I'd arrived at wasn't overly
biased by the limitations of my corpus. Finding speakers who were
willing to work with me on those judgments wasn't always easy; a fair
amount of metalinguistic awareness was needed. But it was worth it!
The generalizations that I was able to publish were much more solid
than if I had worked exclusively from corpus data. And the methodology
I learned from the Tonhauser/Matthewson/etc crowd was fundamental to
this work.

The call to work inductively from corpora would have the practical
effect of making certain topics totally inaccessible for research
(control vs raising structures, pied-piping, islands, gaps in
inflectional paradigms, etc) even though large scale acceptability
tasks have shown that these phenomena are "real," i.e., they're not
just in the minds of linguists who are using introspection. Randy's
point that "no other science allows the scientist to make up his or
her own data, and so this is something linguists should give up" is a
straw man argument now that many experimentalist syntacticians use
large-scale acceptability judgments on platforms like Mechanical Turk
to get at speakers' judgments. I think we do a disservice to our
students and to junior scholars if we tell them that the only real
stuff to be studied will be in the corpora that we assemble. Even the
best corpora are finite, whereas L1 speakers' knowledge of their
language is infinitely productive.

― Adam