<div dir="auto"><div dir="auto"></div>I agree with everything here, with one addendum: it's a strawman even if you do ignore more formal judgment experiments.  The examples are invented, but each data point is a *pairing* of an example and a judgment. Since the judgments aren't invented (except in cases of misconduct), it's wrong to say that the data are.<div dir="auto">Neil<br><div class="gmail_quote" dir="auto"><div dir="ltr" class="gmail_attr">On Sat, Dec 10, 2022, 10:05 PM Adam Singerman <<a href="mailto:adamsingerman@gmail.com" target="_blank" rel="noreferrer">adamsingerman@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I think Randy is wrong (sorry if this comes across as blunt) and so I<br>

am writing, on a Saturday night no less, to voice a different view.<br>
<br>
Working inductively from a corpus is great, but no corpus is ever going to be large enough to fully represent a given language's grammatical possibilities. If we limit ourselves to working inductively from corpora, then many basic questions about the languages we research will go unanswered. From a corpus of natural data we simply cannot know whether a given pattern is missing because the corpus is finite (i.e., it's just a statistical accident that the pattern isn't attested) or because there's a genuine reason why the pattern is not showing up (i.e., its non-attestation is principled).<br>
<br>
When I am writing up my research on Tuparí, I always prioritize non-elicited data (texts, in-person conversation, WhatsApp chats). But interpreting and analyzing the non-elicited data requires making reference to acceptability judgments. The prefix (e)tareman- is a negative polarity item, and it always co-occurs with, and falls inside the scope of, a negator morpheme. But the only way I can make this point is by showing that speakers invariably reject tokens of (e)tareman- without a licensing negator. Those rejected examples are by definition not going to be present in any corpus of naturalistic speech, but they tell me something crucial about what the structure of Tuparí does and does not allow. If I limit myself to inductively working from a corpus, fundamental facts about the prefix (e)tareman- and about negation in Tuparí more broadly will be missed.<br>
<br>
A lot of recent scholarship has made major strides towards improving the methodology of collecting and interpreting acceptability judgments. The formal semanticists who work on understudied languages (here I am thinking of Judith Tonhauser, Lisa Matthewson, Ryan Bochnak, Amy Rose Deal, Scott AnderBois) are extremely careful about teasing apart utterances that are rejected because of some morphosyntactic ill-formedness (i.e., ungrammaticality) from ones that are rejected because of semantic or pragmatic oddity. The important point is that such teasing apart can be done, and the descriptions and analyses that result from this work are richer than what would result from a methodology that uses corpus examination or elicitation only.<br>
<br>
One more example from Tuparí: this language has an obligatory witnessed/non-witnessed evidential distinction, but the deictic orientation of the distinction (to the speaker or to the addressee) is determined by clause type. There is a nuanced set of interactions between the evidential morphology and the clause-typing morphology, and it would have been impossible for me to figure out the basics of those interactions without relying primarily on conversational data and discourse context. But I still needed to get some acceptability judgments to ensure that the picture I'd arrived at wasn't overly biased by the limitations of my corpus. Finding speakers who were willing to work with me on those judgments wasn't always easy; a fair amount of metalinguistic awareness was needed. But it was worth it! The generalizations that I was able to publish were much more solid than if I had worked exclusively from corpus data. And the methodology I learned from the Tonhauser/Matthewson/etc. crowd was fundamental to this work.<br>
<br>
The call to work inductively from corpora would have the practical effect of making certain topics totally inaccessible to research (control vs. raising structures, pied-piping, islands, gaps in inflectional paradigms, etc.), even though large-scale acceptability tasks have shown that these phenomena are "real," i.e., they're not just in the minds of linguists who are using introspection. Randy's point that "no other science allows the scientist to make up his or her own data, and so this is something linguists should give up" is a straw man argument now that many experimentalist syntacticians run large-scale acceptability tasks on platforms like Mechanical Turk to get at speakers' judgments. I think we do a disservice to our students and to junior scholars if we tell them that the only real stuff to be studied will be in the corpora that we assemble. Even the best corpora are finite, whereas L1 speakers' knowledge of their language is infinitely productive.<br>
<br>
— Adam<br>

_______________________________________________<br>
Lingtyp mailing list<br>
<a href="mailto:Lingtyp@listserv.linguistlist.org" rel="noreferrer noreferrer" target="_blank">Lingtyp@listserv.linguistlist.org</a><br>
<a href="https://listserv.linguistlist.org/cgi-bin/mailman/listinfo/lingtyp" rel="noreferrer noreferrer noreferrer" target="_blank">https://listserv.linguistlist.org/cgi-bin/mailman/listinfo/lingtyp</a><br>
</blockquote></div></div></div>