[Lingtyp] spectrograms in linguistic description and for language comparison

Martin Haspelmath martin_haspelmath at eva.mpg.de
Sun Dec 11 07:13:04 UTC 2022

Dear all,

Thanks to everyone for this super-interesting discussion! I think that 
we really all agree that wellformedness judgements are crucial to 
grammatical description, and they have always been – but as we move out 
from highly frequent inflectional patterns (Christian Lehmann's apt 
example was English /*goed/ vs. /went/) to less frequent patterns of 
syntax, the judgements get more difficult. It may be that many of the 
judgement data that we find in the literature are problematic, but the 
key role of wellformedness judgements as such is not in question, I 
think. (This is like elsewhere in cognition: We can "distinguish cats 
from dogs from other critters and things with great confidence", as 
Juergen notes.)

In Randy's recent blogpost (https://dlc.hypotheses.org/2825), he says 
that asking for acceptability judgements is really 
– which I think is exactly right (and providing more context makes the 
task easier)! But crucially, many expressions constructed by the 
linguist do not make sense in *any* context (they are grammatically 
ill-formed), and many speakers recognize this immediately. And I agree 
with Neil Myler that it's not the data that are invented, but the 
"experimental stimuli", which is a routine procedure in other fields 
such as psychology or behavioural economics.

I don't think we should ideologize this very basic aspect of language 
description. Adam Singerman mentioned "the Tonhauser/Matthewson/etc 
crowd", and unfortunately, Davis et al. (2014) wrote a paper saying that 
hypothesis-testing is somehow characteristic of "C-linguists" as opposed 
to "D-linguists", which I think was very odd (I think they confused 
descriptive with comparative hypothesis-testing, as I noted in my 
commentary: https://muse.jhu.edu/article/563097). This odd distinction 
between "C-" and "D-" linguists was originally made by Levinson & Evans 
(2010), and Randy LaPolla, too, ideologizes the discussion by putting a 
"structuralist" label on it. I'm not sure that these labels are helpful 
(I think that "we are all structuralists": https://dlc.hypotheses.org/2356).

Maybe there is one difference in interpreting acceptability judgements 
that still needs further discussion, also in usage-based linguistics: Do 
such judgements tell us about the mental grammars of the speakers, or do 
they merely tell us about the social acceptability of linguistic 
expressions? I think that it's the latter (as I said in this blogpost: 
https://dlc.hypotheses.org/2433), whereas many linguists seem to jump 
to conclusions about mental representations very quickly.



Davis, Henry & Gillon, Carrie & Matthewson, Lisa. 2014. How to 
investigate linguistic diversity: Lessons from the Pacific Northwest. 
/Language/ 90(4). e180–e226.
Levinson, Stephen C. & Evans, Nicholas. 2010. Time for a sea-change in 
linguistics: Response to comments on ‘The myth of language universals.’ 
/Lingua/ 120(12). 2733–2758.

Am 11.12.22 um 06:07 schrieb Juergen Bohnemeyer:
> Dear all -- I agree with Adam and Neil. I’m attaching rough proofs of 
> my chapter in the upcoming Handbook of Cognitive Semantics, which is 
> in production with Brill. The chapter surveys sources of data for 
> **empirical** research in linguistics (with special emphasis on 
> semantic research; but I argue that the sources of evidence we have at 
> our disposal are fundamentally the same across languages). It 
> discusses what we can and cannot get out of corpora and spontaneous 
> observation, attempts a typology of elicitation techniques, and 
> proposes best practices for their implementation.
> (The text is also largely identical with Ch5 of my book /Semantic 
> Research/, which is under contract with CUP and hopefully will see the 
> light of day in 2023 or 2024 at the latest. That book, on which I’ve 
> been laboring for a decade (much of it in collaboration with David 
> Wilkins), is a stab at a textbook-cum-handbook for semantic research 
> as an empirical science.)
> Best -- Juergen
> Juergen Bohnemeyer (He/Him)
> Professor, Department of Linguistics
> University at Buffalo
> Office: 642 Baldy Hall, UB North Campus
> Mailing address: 609 Baldy Hall, Buffalo, NY 14260
> Phone: (716) 645 0127
> Fax: (716) 645 3825
> Email: jb77 at buffalo.edu
> Web: http://www.acsu.buffalo.edu/~jb77/
> Office hours Tu/Th 3:30-4:30pm in 642 Baldy or via Zoom (Meeting ID 
> 585 520 2411; Passcode Hoorheh)
> There’s A Crack In Everything - That’s How The Light Gets In
> (Leonard Cohen)
> -- 
> *From: *Lingtyp <lingtyp-bounces at listserv.linguistlist.org> on behalf 
> of Neil Myler <myler at bu.edu>
> *Date: *Saturday, December 10, 2022 at 10:21 PM
> *To: *LINGTYP at listserv.linguistlist.org
> *Subject: *Re: [Lingtyp] spectrograms in linguistic description and 
> for language comparison
> I agree with everything here, with one addendum: it's a strawman even 
> if you do ignore more formal judgment experiments. The examples are 
> invented, but each data point is a *pairing* of an example and a 
> judgment. Since the judgments aren't invented (except in cases of 
> misconduct), it's wrong to say that the data are.
> Neil
> On Sat, Dec 10, 2022, 10:05 PM Adam Singerman 
> <adamsingerman at gmail.com> wrote:
>     I think Randy is wrong (sorry if this comes across as blunt) and so I
>     am writing, on a Saturday night no less, to voice a different view.
>     Working inductively from a corpus is great, but no corpus is ever
>     going to be large enough to fully represent a given language's
>     grammatical possibilities. If we limit ourselves to working
>     inductively from corpora then many basic questions about the languages
>     we research will go unanswered. From a corpus of natural data we
>     simply cannot know whether a given pattern is missing because the
>     corpus is finite (i.e., it's just a statistical accident that the
>     pattern isn't attested) or whether there's a genuine reason why the
>     pattern is not showing up (i.e., its non-attestation is principled).
>     When I am writing up my research on Tuparí I always prioritize
>     non-elicited data (texts, in-person conversation, WhatsApp chats). But
>     interpreting and analyzing the non-elicited data requires making
>     reference to acceptability judgments. The prefix (e)tareman- is a
>     negative polarity item, and it always co-occurs with (and inside the
>     scope of) a negator morpheme. But the only way I can make this point
>     is by showing that speakers invariably reject tokens of (e)tareman-
>     without a licensing negator. Those rejected examples are by definition
>     not going to be present in any corpus of naturalistic speech, but they
>     tell me something crucial about what the structure of Tuparí does and
>     does not allow. If I limit myself to inductively working from a
>     corpus, fundamental facts about the prefix (e)tareman- and about
>     negation in Tuparí more broadly will be missed.
>     A lot of recent scholarship has made major strides towards improving
>     the methodology of collecting and interpreting acceptability
>     judgments. The formal semanticists who work on understudied languages
>     (here I am thinking of Judith Tonhauser, Lisa Matthewson, Ryan
>     Bochnak, Amy Rose Deal, Scott AnderBois) are extremely careful about
>     teasing apart utterances that are rejected because of some
>     morphosyntactic ill-formedness (i.e., ungrammaticality) versus ones
>     that are rejected because of semantic or pragmatic oddity. The
>     important point is that such teasing apart can be done, and the
>     descriptions and analyses that result from this work are richer than
>     what would result from a methodology that uses corpus examination or
>     elicitation only.
>     One more example from Tuparí: this language has an obligatory
>     witnessed/non-witnessed evidential distinction, but the deictic
>     orientation of the distinction (to the speaker or to the addressee) is
>     determined via clause type. There is a nuanced set of interactions
>     between the evidential morphology and the clause-typing morphology,
>     and it would have been impossible for me to figure out the basics of
>     those interactions without relying primarily on conversational data
>     and discourse context. But I still needed to get some acceptability
>     judgments to ensure that the picture I'd arrived at wasn't overly
>     biased by the limitations of my corpus. Finding speakers who were
>     willing to work with me on those judgments wasn't always easy; a fair
>     amount of metalinguistic awareness was needed. But it was worth it!
>     The generalizations that I was able to publish were much more solid
>     than if I had worked exclusively from corpus data. And the methodology
>     I learned from the Tonhauser/Matthewson/etc crowd was fundamental to
>     this work.
>     The call to work inductively from corpora would have the practical
>     effect of making certain topics totally inaccessible for research
>     (control vs raising structures, pied-piping, islands, gaps in
>     inflectional paradigms, etc) even though large scale acceptability
>     tasks have shown that these phenomena are "real," i.e., they're not
>     just in the minds of linguists who are using introspection. Randy's
>     point that "no other science allows the scientist to make up his or
>     her own data, and so this is something linguists should give up" is a
>     straw man argument now that many experimentalist syntacticians use
>     large-scale acceptability judgments on platforms like Mechanical Turk
>     to get at speakers' judgments. I think we do a disservice to our
>     students and to junior scholars if we tell them that the only real
>     stuff to be studied will be in the corpora that we assemble. Even the
>     best corpora are finite, whereas L1 speakers' knowledge of their
>     language is infinitely productive.
>     — Adam
>     _______________________________________________
>     Lingtyp mailing list
>     Lingtyp at listserv.linguistlist.org
>     https://listserv.linguistlist.org/cgi-bin/mailman/listinfo/lingtyp

Martin Haspelmath
Max Planck Institute for Evolutionary Anthropology
Deutscher Platz 6
D-04103 Leipzig