[Lingtyp] spectrograms in linguistic description and for language comparison

Sun Dec 11 05:07:19 UTC 2022

Dear all -- I agree with Adam and Neil. I’m attaching rough proofs of my chapter in the upcoming Handbook of Cognitive Semantics, which is in production with Brill. The chapter surveys sources of data for *empirical* research in linguistics (with special emphasis on semantic research; but I argue that the sources of evidence we have at our disposal are fundamentally the same across languages). It discusses what we can and cannot get out of corpora and spontaneous observation, attempts a typology of elicitation techniques, and proposes best practices for their implementation.

(The text is also largely identical with Ch5 of my book Semantic Research, which is under contract with CUP and hopefully will see the light of day in 2023 or 2024 at the latest. That book, on which I’ve been laboring for a decade (much of it in collaboration with David Wilkins), is a stab at a textbook-cum-handbook for semantic research as an empirical science.)

Best -- Juergen

Juergen Bohnemeyer (He/Him)
Professor, Department of Linguistics
University at Buffalo

Office: 642 Baldy Hall, UB North Campus
Mailing address: 609 Baldy Hall, Buffalo, NY 14260
Phone: (716) 645 0127
Fax: (716) 645 3825
Email: jb77 at buffalo.edu
Web: http://www.acsu.buffalo.edu/~jb77/

Office hours Tu/Th 3:30-4:30pm in 642 Baldy or via Zoom (Meeting ID 585 520 2411; Passcode Hoorheh)

There’s A Crack In Everything - That’s How The Light Gets In
(Leonard Cohen)
--

From: Lingtyp <lingtyp-bounces at listserv.linguistlist.org> on behalf of Neil Myler <myler at bu.edu>
Date: Saturday, December 10, 2022 at 10:21 PM
To: LINGTYP at listserv.linguistlist.org <LINGTYP at listserv.linguistlist.org>
Subject: Re: [Lingtyp] spectrograms in linguistic description and for language comparison

I agree with everything here, with one addendum: it's a straw man even if you do ignore more formal judgment experiments. The examples are invented, but each data point is a *pairing* of an example and a judgment. Since the judgments aren't invented (except in cases of misconduct), it's wrong to say that the data are.

Neil

On Sat, Dec 10, 2022, 10:05 PM Adam Singerman <adamsingerman at gmail.com> wrote:
I think Randy is wrong (sorry if this comes across as blunt) and so I
am writing, on a Saturday night no less, to voice a different view.

Working inductively from a corpus is great, but no corpus is ever
going to be large enough to fully represent a given language's
grammatical possibilities. If we limit ourselves to working
inductively from corpora then many basic questions about the languages
we research will go unanswered. From a corpus of natural data we
simply cannot know whether a given pattern is missing because the
corpus is finite (i.e., it's just a statistical accident that the
pattern isn't attested) or whether there's a genuine reason why the
pattern is not showing up (i.e., its non-attestation is principled).

When I am writing up my research on Tuparí I always prioritize
non-elicited data (texts, in-person conversation, WhatsApp chats). But
interpreting and analyzing the non-elicited data requires making
reference to acceptability judgments. The prefix (e)tareman- is a
negative polarity item, and it always co-occurs with (and inside the
scope of) a negator morpheme. But the only way I can make this point
is by showing that speakers invariably reject tokens of (e)tareman-
without a licensing negator. Those rejected examples are by definition
not going to be present in any corpus of naturalistic speech, but they
tell me something crucial about what the structure of Tuparí does and
does not allow. If I limit myself to inductively working from a
corpus, fundamental facts about the prefix (e)tareman- and about
negation in Tuparí more broadly will be missed.

A lot of recent scholarship has made major strides towards improving
the methodology of collecting and interpreting acceptability
judgments. The formal semanticists who work on understudied languages
(here I am thinking of Judith Tonhauser, Lisa Matthewson, Ryan
Bochnak, Amy Rose Deal, Scott AnderBois) are extremely careful about
teasing apart utterances that are rejected because of some
morphosyntactic ill-formedness (i.e., ungrammaticality) versus ones
that are rejected because of semantic or pragmatic oddity. The
important point is that such teasing apart can be done, and the
descriptions and analyses that result from this work are richer than
what would result from a methodology that uses corpus examination or
elicitation only.

One more example from Tuparí: this language has an obligatory
witnessed/non-witnessed evidential distinction, but the deictic
orientation of the distinction (to the speaker or to the addressee) is
determined via clause type. There is a nuanced set of interactions
between the evidential morphology and the clause-typing morphology,
and it would have been impossible for me to figure out the basics of
those interactions without relying primarily on conversational data
and discourse context. But I still needed to get some acceptability
judgments to ensure that the picture I'd arrived at wasn't overly
biased by the limitations of my corpus. Finding speakers who were
willing to work with me on those judgments wasn't always easy; a fair
amount of metalinguistic awareness was needed. But it was worth it!
The generalizations that I was able to publish were much more solid
than if I had worked exclusively from corpus data. And the methodology
I learned from the Tonhauser/Matthewson/etc. crowd was fundamental to
this work.

The call to work inductively from corpora would have the practical
effect of making certain topics totally inaccessible for research
(control vs. raising structures, pied-piping, islands, gaps in
inflectional paradigms, etc.) even though large-scale acceptability
tasks have shown that these phenomena are "real," i.e., they're not
just in the minds of linguists who are using introspection. Randy's
point that "no other science allows the scientist to make up his or
her own data, and so this is something linguists should give up" is a
straw man argument now that many experimentalist syntacticians use
large-scale acceptability judgments on platforms like Mechanical Turk
to get at speakers' judgments. I think we do a disservice to our
students and to junior scholars if we tell them that the only real
stuff to be studied will be in the corpora that we assemble. Even the
best corpora are finite, whereas L1 speakers' knowledge of their
language is infinitely productive.

— Adam
_______________________________________________
Lingtyp mailing list
Lingtyp at listserv.linguistlist.org
https://listserv.linguistlist.org/cgi-bin/mailman/listinfo/lingtyp
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Bohnemeyer_in_press_HCS.pdf
Type: application/pdf
Size: 3682772 bytes
Desc: Bohnemeyer_in_press_HCS.pdf
URL: <http://listserv.linguistlist.org/pipermail/lingtyp/attachments/20221211/a5d9739c/attachment-0001.pdf>