[Lingtyp] spectrograms in linguistic description and for language comparison

Guillaume Jacques rgyalrongskad at gmail.com
Wed Dec 14 18:24:53 UTC 2022


Dear Randy, Adam and colleagues,

When doing research on unwritten languages, while I support the notion of
working primarily from a corpus (which is what I tried to achieve in my Japhug
grammar <https://langsci-press.org/catalog/book/295>), I also think that
negative data is crucial, be it from elicitation or from native speakers'
corrections of the linguist's speech when s/he speaks the language. It
should be meticulously documented.

With regard to elicitation, syntax and morphology are quite different. For
inflectional morphology a linguist can reasonably be expected to produce an
exhaustive inventory of all possible forms, and for that, elicitation and
research on potential gaps are necessary: nobody can manually collect a
corpus large enough to attest all forms, and negative information is even
harder to obtain from a corpus. In addition, grammars that naively collect paradigms
without trying to produce the forms and test all possibilities with
speakers are sometimes not only incomplete, but can also include
inconsistent transcriptions. In my study of the Khaling verb
<http://crlao.ehess.fr/docannexe/file/1739/khaling_verb.pdf> I collected
data from speakers and wrote a script to generate the paradigms at the same
time, which allowed for a systematic and thorough verification of the data.
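
To illustrate the general idea of generating paradigms by script and checking
them against speaker data, here is a minimal sketch. It is not the actual
Khaling script: the stems, suffixes and elicited forms are invented
placeholders, and the names generate_paradigm and check_against_elicited are
hypothetical. A real system would also need stem alternations and
morphophonological rules.

```python
# Hypothetical toy data: these are NOT real Khaling forms.
SUFFIXES = {
    ("1", "sg"): "u",
    ("2", "sg"): "e",
    ("3", "sg"): "a",
    ("1", "pl"): "ki",
    ("2", "pl"): "ni",
    ("3", "pl"): "nu",
}


def generate_paradigm(stem):
    """Return the form predicted by the rules for every person/number cell."""
    return {cell: stem + suffix for cell, suffix in SUFFIXES.items()}


def check_against_elicited(predicted, elicited):
    """Compare predicted forms with forms given by speakers.

    Returns (mismatches, untested):
      mismatches maps a cell to (predicted form, elicited form);
      untested lists the cells that have not been elicited yet.
    """
    mismatches = {}
    untested = []
    for cell, form in predicted.items():
        if cell not in elicited:
            untested.append(cell)
        elif elicited[cell] != form:
            mismatches[cell] = (form, elicited[cell])
    return mismatches, untested


# Invented elicitation notes for one invented stem.
elicited_verbA = {
    ("1", "sg"): "taku",
    ("2", "sg"): "take",
    ("3", "sg"): "taka",
    ("1", "pl"): "taki",  # the rules predict "takki": needs rechecking
}

if __name__ == "__main__":
    predicted = generate_paradigm("tak")
    mismatches, untested = check_against_elicited(predicted, elicited_verbA)
    print("Cells to recheck with speakers:", mismatches)
    print("Cells still to elicit:", untested)
```

Each mismatch points either to a transcription error or to a rule that needs
revising, and the list of untested cells shows where further elicitation
(including tests for possible gaps) is still required.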

A pragmatic approach seems to me preferable to an ideological one on these
matters.

Guillaume




On Wed, 14 Dec 2022 at 18:47, Randy J. LaPolla <randy.lapolla at gmail.com>
wrote:

> PS: One thing I forgot to mention about induction:
> If you know about the history of AI, in the 1970s and 1980s the main
> paradigm was symbolic AI, which is rule-based. They worked very hard for
> many years to get the systems to parse even simple sentences, but when it
> was clear to all that the rule-based approach was a failure, they
> experimented with a purely inductive approach, and the difference in output
> convinced them right away this was the way to go. After their experiments,
> Jeff Dean, the head of Google’s Brain Lab, famously said, “We don’t need
> grammar”. Google Translate is as good as it is now because of this switch
> to an inductive method. Of course those of us doing fieldwork will not have
> the large database Google has or the speed of the machines, but the
> principle is the same: induction can get you there.
>
> Randy
> ——
> Professor Randy J. LaPolla(罗仁地), PhD FAHA
> Center for Language Sciences
> Institute for Advanced Studies in Humanities and Social Sciences
> Beijing Normal University at Zhuhai
> A302, Muduo Building, #18 Jinfeng Road, Zhuhai City, Guangdong, China
>
> https://randylapolla.info
> ORCID ID: https://orcid.org/0000-0002-6100-6196
>
> 邮编:519000
> 广东省珠海市唐家湾镇金凤路18号木铎楼A302
> 北京师范大学珠海校区
> 人文和社会科学高等研究院
> 语言科学研究中心
>
> On 14 Dec 2022, at 11:15 PM, Randy J. LaPolla <randy.lapolla at gmail.com>
> wrote:
>
> Dear Adam,
> Sorry to just be getting back to you on this.
>
> We have very different conceptions of language and goals in doing
> linguistics (your interest in “(control vs raising structures, pied-piping,
> islands, gaps in inflectional paradigms, etc)” very much reflects this, as
> these are not things that concern me). This is what my blog post is about,
> how different the choices can be. Saying I am “wrong” implies there is only
> one right way to do linguistics. Not to be rude, but as Popper said,
> "Whenever a theory appears to you as the only possible one, take this as a
> sign that you have neither understood the theory nor the problem which it
> was intended to solve." -Karl Popper, *Objective Knowledge: An
> Evolutionary Approach* (Clarendon Press, 1972, p. 266)
> All of our theories and methodologies are subjective attempts to achieve
> some goal; they are really heuristics. As there are different ways of
> looking at the same phenomenon, and for different purposes, there is no
> right and wrong, only more or less useful ways of analysing the phenomena
> relative to some purpose; these also depend on our assumptions, definition of language,
> etc., as I discuss in the blog. Y. R. Chao and J. R. Firth both argued that
> there are different ways to analyse language depending on the language and
> your purposes, including for example not using a phoneme type of
> representation of the phonology, but some different type. This is true not
> just of linguistics but also, for example, of physics, where light can be
> understood as a wave or a particle depending on your purposes. And our
> understanding of the Universe has gone through many major changes, each one
> thought to be “right” at the time, but subsequently overthrown by a later theory.
>
> You assume that we can somehow “fully represent a given language’s
> grammatical possibilities”. As language is a complex adaptive system that
> is constantly changing, and is human behaviour and so not a finite thing, I
> don’t think that it will ever be possible to “fully represent a given
> language’s grammatical possibilities”. One problem I see with modern
> linguistics is not acknowledging the tremendous diversity of usages within
> a single language, as the search has been for universals and for a single
> tight system, which even Charles Hockett (1967) said was a “wild goose
> chase”. That is just for one language, never mind trying to do that for all
> languages, which has led to linguistics missing so much of the diversity
> between languages.
>
> The beauty of working inductively is that you are only responsible for
> what is in your data. You don’t have to make broad generalisations about
> the language that in many cases turn out to be problematic, you just say
> this is what is and is not in my data. Of course the more data you have the
> stronger the generalisations you can make. I did not rule out using some
> stimuli such as the MPI sets, as these set up contexts that the speaker can
> talk about, but asking people to translate word lists or sentences will not
> give you useful data. What you will get back are the categories of the
> working language. But again, this is part of the problem. Too many
> linguists think that words are translatable and mean the same thing in
> different languages. This is easily shown to be false. Not only is the
> prototype of the cognitive category represented by the word different for
> different cultures (even different speakers), but the extension (the use of
> the word for different objects or situations) is also different. This is
> true of every word in the languages. Humboldt knew this, and argued against
> Aristotle’s view that all people have the same object in mind even if the
> word is different. Humboldt said no, even if we both look at the same horse
> we are seeing different things, as our cognitive categories are different.
>
> All the best,
> Randy
>
>
> On 11 Dec 2022, at 11:00 AM, Adam Singerman <adamsingerman at gmail.com>
> wrote:
>
> I think Randy is wrong (sorry if this comes across as blunt) and so I
> am writing, on a Saturday night no less, to voice a different view.
>
> Working inductively from a corpus is great, but no corpus is ever
> going to be large enough to fully represent a given language's
> grammatical possibilities. If we limit ourselves to working
> inductively from corpora then many basic questions about the languages
> we research will go unanswered. From a corpus of natural data we
> simply cannot know whether a given pattern is missing because the
> corpus is finite (i.e., it's just a statistical accident that the
> pattern isn't attested) or whether there's a genuine reason why the
> pattern is not showing up (i.e., its non-attestation is principled).
>
> When I am writing up my research on Tuparí I always prioritize
> non-elicited data (texts, in-person conversation, WhatsApp chats). But
> interpreting and analyzing the non-elicited data requires making
> reference to acceptability judgments. The prefix (e)tareman- is a
> negative polarity item, and it always co-occurs with (and inside the
> scope of) a negator morpheme. But the only way I can make this point
> is by showing that speakers invariably reject tokens of (e)tareman-
> without a licensing negator. Those rejected examples are by definition
> not going to be present in any corpus of naturalistic speech, but they
> tell me something crucial about what the structure of Tuparí does and
> does not allow. If I limit myself to inductively working from a
> corpus, fundamental facts about the prefix (e)tareman- and about
> negation in Tuparí more broadly will be missed.
>
> A lot of recent scholarship has made major strides towards improving
> the methodology of collecting and interpreting acceptability
> judgments. The formal semanticists who work on understudied languages
> (here I am thinking of Judith Tonhauser, Lisa Matthewson, Ryan
> Bochnak, Amy Rose Deal, Scott AnderBois) are extremely careful about
> teasing apart utterances that are rejected because of some
> morphosyntactic ill-formedness (i.e., ungrammaticality) versus ones
> that are rejected because of semantic or pragmatic oddity. The
> important point is that such teasing apart can be done, and the
> descriptions and analyses that result from this work are richer than
> what would result from a methodology that relies on corpus examination
> alone or on elicitation alone.
>
> One more example from Tuparí: this language has an obligatory
> witnessed/non-witnessed evidential distinction, but the deictic
> orientation of the distinction (to the speaker or to the addressee) is
> determined via clause type. There is a nuanced set of interactions
> between the evidential morphology and the clause-typing morphology,
> and it would have been impossible for me to figure out the basics of
> those interactions without relying primarily on conversational data
> and discourse context. But I still needed to get some acceptability
> judgments to ensure that the picture I'd arrived at wasn't overly
> biased by the limitations of my corpus. Finding speakers who were
> willing to work with me on those judgments wasn't always easy; a fair
> amount of metalinguistic awareness was needed. But it was worth it!
> The generalizations that I was able to publish were much more solid
> than if I had worked exclusively from corpus data. And the methodology
> I learned from the Tonhauser/Matthewson/etc crowd was fundamental to
> this work.
>
> The call to work inductively from corpora would have the practical
> effect of making certain topics totally inaccessible for research
> (control vs raising structures, pied-piping, islands, gaps in
> inflectional paradigms, etc) even though large scale acceptability
> tasks have shown that these phenomena are "real," i.e., they're not
> just in the minds of linguists who are using introspection. Randy's
> point that "no other science allows the scientist to make up his or
> her own data, and so this is something linguists should give up" is a
> straw man argument now that many experimentalist syntacticians use
> large-scale acceptability judgments on platforms like Mechanical Turk
> to get at speakers' judgments. I think we do a disservice to our
> students and to junior scholars if we tell them that the only real
> stuff to be studied will be in the corpora that we assemble. Even the
> best corpora are finite, whereas L1 speakers' knowledge of their
> language is infinitely productive.
>
> — Adam
> _______________________________________________
> Lingtyp mailing list
> Lingtyp at listserv.linguistlist.org
> https://listserv.linguistlist.org/cgi-bin/mailman/listinfo/lingtyp
>
>


-- 
Guillaume Jacques

Directeur de recherches
CNRS (CRLAO) - EPHE- INALCO
https://scholar.google.fr/citations?user=1XCp2-oAAAAJ&hl=fr
https://langsci-press.org/catalog/book/295
<http://cnrs.academia.edu/GuillaumeJacques>
http://panchr.hypotheses.org/