[Lingtyp] spectrograms in linguistic description and for language comparison

Thu Dec 15 17:26:51 UTC 2022

Hi Guillaume,

> A pragmatic approach seems to me preferable to an ideological one on these matters.

The approach I am advocating is not ideological in the sense you seem to imply, but practical advice based on my own experience doing fieldwork. Yes, of course if you produce an utterance in the language and the native speakers correct you, or say ‘you can’t say that’, that can be good evidence, but you need to find out what the factor was that made them not accept it. I had the pleasure of working with what I consider the ideal consultant when I started working on the Qugu dialect of Qiang (not the dialect I wrote the grammar on) back in the early 1990s. He was illiterate in Chinese and Qiang, hadn’t left the mountains for 30 years, and could not conceive of language as some abstract object of study: for him language was about meaning, and his cognitive categories did not at all match the standard wordlist I was using at the time, so I could not ask him how to say ‘cloud’ or ‘pheasant’; it had to be real, i.e. a particular kind of cloud or a particular kind of pheasant. I would sometimes try to say things in the language and see how he reacted, and he often would say ‘you can’t say that’, and I would note it down. But once I even used a simple sentence like ‘Khutʂi went out to his field’ and he said ‘you can’t say that’. That puzzled me, so I pushed a bit on that one, and he finally said, ‘Khutʂi doesn’t have a field.’ I then had to go back and check all the other negative evidence and found much of it was of this type, as he was teaching me the traditional Qiang history stories and wanted to make sure I had them right, and would say ‘you can’t say that’ if it was not part of the story.

I also found that if I used a questionnaire-type approach, asking for individual sentences, there would be two problematic outcomes: either the person or I would misunderstand what the other was saying, or, more commonly, I would get back a form that closely matched the working language. It became clear to me I wasn’t getting at the real language, and I have heard other linguists talk about this problem as well. I ended up throwing away all of that elicited data and then worked exclusively on natural texts of different genres, and found all sorts of patterns and morphology that did not show up in the elicited data. I did the same with the Rawang language of northern Burma. So this is something I came by through experience.

All the best,
Randy
——
Professor Randy J. LaPolla（罗仁地), PhD FAHA 
Center for Language Sciences
Institute for Advanced Studies in Humanities and Social Sciences
Beijing Normal University at Zhuhai
A302, Muduo Building, #18 Jinfeng Road, Zhuhai City, Guangdong, China

https://randylapolla.info
ORCID ID: https://orcid.org/0000-0002-6100-6196

邮编：519000
广东省珠海市唐家湾镇金凤路18号木铎楼A302
北京师范大学珠海校区
人文和社会科学高等研究院
语言科学研究中心 

> On 15 Dec 2022, at 2:24 AM, Guillaume Jacques <rgyalrongskad at gmail.com> wrote:
> 
> Dear Randy, Adam and colleagues,
> 
> When doing research on unwritten languages, while I support the notion of working primarily from a corpus (which is what I tried to achieve in my Japhug grammar <https://langsci-press.org/catalog/book/295>), I also think that negative data is crucial, be it from elicitation, or from corrections by native speakers on the linguist's speech when s/he speaks the language. It should be meticulously documented.
> 
> With regard to elicitation, syntax and morphology are quite different. For inflectional morphology a linguist can reasonably be expected to produce an exhaustive inventory of all possible forms, and for that, elicitation and research on potential gaps are necessary, as nobody can manually collect a corpus large enough to attest all forms, and negative information is even more difficult to obtain from a corpus. In addition, grammars that naively collect paradigms without trying to produce the forms and test all possibilities with speakers are sometimes not only incomplete, but can also include inconsistent transcriptions. In my study of the Khaling verb <http://crlao.ehess.fr/docannexe/file/1739/khaling_verb.pdf> I collected data from speakers and wrote a script to generate the paradigms at the same time, which allowed for a systematic and thorough verification of the data.
> 
> A pragmatic approach seems to me preferable to an ideological one on these matters.
> 
> Guillaume
> 
> On Wed, 14 Dec 2022 at 18:47, Randy J. LaPolla <randy.lapolla at gmail.com> wrote:
> PS: One thing I forgot to mention about induction:
> If you know about the history of AI, in the 1970s and 1980s the main paradigm was symbolic AI, which is rule-based. They worked very hard for many years to get the systems to parse even simple sentences, but when it was clear to all that the rule-based approach was a failure, they experimented with a purely inductive approach, and the difference in output convinced them right away this was the way to go. After their experiments, Jeff Dean, the head of Google’s Brain Lab, famously said, “We don’t need grammar”. Google Translate is as good as it is now because of this switch to an inductive method. Of course those of us doing fieldwork will not have the large database Google has or the speed of the machines, but the principle is the same: induction can get you there.
> 
> Randy
>
>> On 14 Dec 2022, at 11:15 PM, Randy J. LaPolla <randy.lapolla at gmail.com> wrote:
>> 
>> Dear Adam,
>> Sorry to just be getting back to you on this.
>> 
>> We have very different conceptions of language and goals in doing linguistics (your interest in “(control vs raising structures, pied-piping, islands, gaps in inflectional paradigms, etc)” very much reflects this, as these are not things that concern me). This is what my blog post is about: how different the choices can be. Saying I am “wrong” implies there is only one right way to do linguistics. Not to be rude, but as Popper said, “Whenever a theory appears to you as the only possible one, take this as a sign that you have neither understood the theory nor the problem which it was intended to solve” (Karl Popper, Objective Knowledge: An Evolutionary Approach, Clarendon Press, 1972, p. 266).
>> All of our theories and methodologies are subjective attempts to achieve some goal; they are actually heuristics. As there are different ways of looking at the same phenomenon and for different purposes, there is no right and wrong, only more or less useful ways of analysing the phenomena relative to some purpose, and these also depend on our assumptions, definition of language, etc., as I discuss in the blog. Y. R. Chao and J. R. Firth both argued that there are different ways to analyse language depending on the language and your purposes, including, for example, not using a phoneme type of representation of the phonology, but some different type. This holds not just for linguistics but also, for example, for physics, where light can be understood as a wave or a particle depending on your purposes. And our understanding of the Universe has gone through many major changes, each one thought to be “right” at the time, but later overthrown by a newer theory.
>> 
>> You assume that we can somehow “fully represent a given language’s grammatical possibilities”. As language is a complex adaptive system that is constantly changing, and is human behaviour and so not a finite thing, I don’t think that it will ever be possible to “fully represent a given language’s grammatical possibilities”. One problem I see with modern linguistics is not acknowledging the tremendous diversity of usages within a single language, as the search has been for universals and for a single tight system, which even Charles Hockett (1967) said was a “wild goose chase”. That is just for one language, never mind trying to do that for all languages, which has led to linguistics missing so much of the diversity between languages.
>> 
>> The beauty of working inductively is that you are only responsible for what is in your data. You don’t have to make broad generalisations about the language that in many cases turn out to be problematic, you just say this is what is and is not in my data. Of course the more data you have the stronger the generalisations you can make. I did not rule out using some stimuli such as the MPI sets, as these set up contexts that the speaker can talk about, but asking people to translate word lists or sentences will not give you useful data. What you will get back are the categories of the working language. But again, this is part of the problem. Too many linguists think that words are translatable and mean the same thing in different languages. This is easily shown to be false. Not only is the prototype of the cognitive category represented by the word different for different cultures (even different speakers), but the extension (the use of the word for different objects or situations) is also different. This is true of every word in the languages. Humboldt knew this, and argued against Aristotle’s view that all people have the same object in mind even if the word is different. Humboldt said no, even if we both look at the same horse we are seeing different things, as our cognitive categories are different.
>> 
>> All the best,
>> Randy
>> 
>>> On 11 Dec 2022, at 11:00 AM, Adam Singerman <adamsingerman at gmail.com> wrote:
>>> 
>>> I think Randy is wrong (sorry if this comes across as blunt) and so I
>>> am writing, on a Saturday night no less, to voice a different view.
>>> 
>>> Working inductively from a corpus is great, but no corpus is ever
>>> going to be large enough to fully represent a given language's
>>> grammatical possibilities. If we limit ourselves to working
>>> inductively from corpora then many basic questions about the languages
>>> we research will go unanswered. From a corpus of natural data we
>>> simply cannot know whether a given pattern is missing because the
>>> corpus is finite (i.e., it's just a statistical accident that the
>>> pattern isn't attested) or whether there's a genuine reason why the
>>> pattern is not showing up (i.e., its non-attestation is principled).
>>> 
>>> When I am writing up my research on Tuparí I always prioritize
>>> non-elicited data (texts, in-person conversation, WhatsApp chats). But
>>> interpreting and analyzing the non-elicited data requires making
>>> reference to acceptability judgments. The prefix (e)tareman- is a
>>> negative polarity item, and it always co-occurs with (and inside the
>>> scope of) a negator morpheme. But the only way I can make this point
>>> is by showing that speakers invariably reject tokens of (e)tareman-
>>> without a licensing negator. Those rejected examples are by definition
>>> not going to be present in any corpus of naturalistic speech, but they
>>> tell me something crucial about what the structure of Tuparí does and
>>> does not allow. If I limit myself to inductively working from a
>>> corpus, fundamental facts about the prefix (e)tareman- and about
>>> negation in Tuparí more broadly will be missed.
>>> 
>>> A lot of recent scholarship has made major strides towards improving
>>> the methodology of collecting and interpreting acceptability
>>> judgments. The formal semanticists who work on understudied languages
>>> (here I am thinking of Judith Tonhauser, Lisa Matthewson, Ryan
>>> Bochnak, Amy Rose Deal, Scott AnderBois) are extremely careful about
>>> teasing apart utterances that are rejected because of some
>>> morphosyntactic ill-formedness (i.e., ungrammaticality) versus ones
>>> that are rejected because of semantic or pragmatic oddity. The
>>> important point is that such teasing apart can be done, and the
>>> descriptions and analyses that result from this work are richer than
>>> what would result from a methodology that uses corpus examination or
>>> elicitation only.
>>> 
>>> One more example from Tuparí: this language has an obligatory
>>> witnessed/non-witnessed evidential distinction, but the deictic
>>> orientation of the distinction (to the speaker or to the addressee) is
>>> determined via clause type. There is a nuanced set of interactions
>>> between the evidential morphology and the clause-typing morphology,
>>> and it would have been impossible for me to figure out the basics of
>>> those interactions without relying primarily on conversational data
>>> and discourse context. But I still needed to get some acceptability
>>> judgments to ensure that the picture I'd arrived at wasn't overly
>>> biased by the limitations of my corpus. Finding speakers who were
>>> willing to work with me on those judgments wasn't always easy; a fair
>>> amount of metalinguistic awareness was needed. But it was worth it!
>>> The generalizations that I was able to publish were much more solid
>>> than if I had worked exclusively from corpus data. And the methodology
>>> I learned from the Tonhauser/Matthewson/etc crowd was fundamental to
>>> this work.
>>> 
>>> The call to work inductively from corpora would have the practical
>>> effect of making certain topics totally inaccessible for research
>>> (control vs raising structures, pied-piping, islands, gaps in
>>> inflectional paradigms, etc) even though large scale acceptability
>>> tasks have shown that these phenomena are "real," i.e., they're not
>>> just in the minds of linguists who are using introspection. Randy's
>>> point that "no other science allows the scientist to make up his or
>>> her own data, and so this is something linguists should give up" is a
>>> straw man argument now that many experimentalist syntacticians use
>>> large-scale acceptability judgments on platforms like Mechanical Turk
>>> to get at speakers' judgments. I think we do a disservice to our
>>> students and to junior scholars if we tell them that the only real
>>> stuff to be studied will be in the corpora that we assemble. Even the
>>> best corpora are finite, whereas L1 speakers' knowledge of their
>>> language is infinitely productive.
>>> 
>>> — Adam
>>> _______________________________________________
>>> Lingtyp mailing list
>>> Lingtyp at listserv.linguistlist.org
>>> https://listserv.linguistlist.org/cgi-bin/mailman/listinfo/lingtyp
>> 
> 
> -- 
> Guillaume Jacques
> 
> Directeur de recherches
> CNRS (CRLAO) - EPHE- INALCO 
> https://scholar.google.fr/citations?user=1XCp2-oAAAAJ&hl=fr
> https://langsci-press.org/catalog/book/295 <http://cnrs.academia.edu/GuillaumeJacques>
> http://panchr.hypotheses.org/