<div dir="auto">I agree with everything here, with one addendum: it's a strawman even if you do ignore more formal judgment experiments. The examples are invented, but each data point is a *pairing* of an example and a judgment. Since the judgments aren't invented (except in cases of misconduct), it's wrong to say that the data are.<div dir="auto">Neil<br><div class="gmail_quote" dir="auto"><div dir="ltr" class="gmail_attr">On Sat, Dec 10, 2022, 10:05 PM Adam Singerman &lt;<a href="mailto:adamsingerman@gmail.com" target="_blank" rel="noreferrer">adamsingerman@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I think Randy is wrong (sorry if this comes across as blunt) and so I<br>
am writing, on a Saturday night no less, to voice a different view.<br>
<br>
Working inductively from a corpus is great, but no corpus is ever<br>
going to be large enough to fully represent a given language's<br>
grammatical possibilities. If we limit ourselves to working<br>
inductively from corpora then many basic questions about the languages<br>
we research will go unanswered. From a corpus of natural data we<br>
simply cannot know whether a given pattern is missing because the<br>
corpus is finite (i.e., it's just a statistical accident that the<br>
pattern isn't attested) or whether there's a genuine reason why the<br>
pattern is not showing up (i.e., its non-attestation is principled).<br>
<br>
When I am writing up my research on Tuparí I always prioritize<br>
non-elicited data (texts, in-person conversation, WhatsApp chats). But<br>
interpreting and analyzing the non-elicited data requires making<br>
reference to acceptability judgments. The prefix (e)tareman- is a<br>
negative polarity item, and it always co-occurs with (and inside the<br>
scope of) a negator morpheme. But the only way I can make this point<br>
is by showing that speakers invariably reject tokens of (e)tareman-<br>
without a licensing negator. Those rejected examples are by definition<br>
not going to be present in any corpus of naturalistic speech, but they<br>
tell me something crucial about what the structure of Tuparí does and<br>
does not allow. If I limit myself to inductively working from a<br>
corpus, fundamental facts about the prefix (e)tareman- and about<br>
negation in Tuparí more broadly will be missed.<br>
<br>
A lot of recent scholarship has made major strides towards improving<br>
the methodology of collecting and interpreting acceptability<br>
judgments. The formal semanticists who work on understudied languages<br>
(here I am thinking of Judith Tonhauser, Lisa Matthewson, Ryan<br>
Bochnak, Amy Rose Deal, Scott AnderBois) are extremely careful about<br>
teasing apart utterances that are rejected because of some<br>
morphosyntactic ill-formedness (i.e., ungrammaticality) versus ones<br>
that are rejected because of semantic or pragmatic oddity. The<br>
important point is that such teasing apart can be done, and the<br>
descriptions and analyses that result from this work are richer than<br>
what would result from a methodology that uses corpus examination or<br>
elicitation only.<br>
<br>
One more example from Tuparí: this language has an obligatory<br>
witnessed/non-witnessed evidential distinction, but the deictic<br>
orientation of the distinction (to the speaker or to the addressee) is<br>
determined via clause type. There is a nuanced set of interactions<br>
between the evidential morphology and the clause-typing morphology,<br>
and it would have been impossible for me to figure out the basics of<br>
those interactions without relying primarily on conversational data<br>
and discourse context. But I still needed to get some acceptability<br>
judgments to ensure that the picture I'd arrived at wasn't overly<br>
biased by the limitations of my corpus. Finding speakers who were<br>
willing to work with me on those judgments wasn't always easy; a fair<br>
amount of metalinguistic awareness was needed. But it was worth it!<br>
The generalizations that I was able to publish were much more solid<br>
than if I had worked exclusively from corpus data. And the methodology<br>
I learned from the Tonhauser/Matthewson/etc. crowd was fundamental to<br>
this work.<br>
<br>
The call to work inductively from corpora would have the practical<br>
effect of making certain topics totally inaccessible for research<br>
(control vs. raising structures, pied-piping, islands, gaps in<br>
inflectional paradigms, etc.) even though large-scale acceptability<br>
tasks have shown that these phenomena are "real," i.e., they're not<br>
just in the minds of linguists who are using introspection. Randy's<br>
point that "no other science allows the scientist to make up his or<br>
her own data, and so this is something linguists should give up" is a<br>
straw man argument now that many experimentalist syntacticians run<br>
large-scale acceptability studies on platforms like Mechanical Turk<br>
to get at speakers' judgments. I think we do a disservice to our<br>
students and to junior scholars if we tell them that the only real<br>
stuff to be studied will be in the corpora that we assemble. Even the<br>
best corpora are finite, whereas L1 speakers' knowledge of their<br>
language is infinitely productive.<br>
<br>
— Adam<br>
_______________________________________________<br>
Lingtyp mailing list<br>
<a href="mailto:Lingtyp@listserv.linguistlist.org" rel="noreferrer noreferrer" target="_blank">Lingtyp@listserv.linguistlist.org</a><br>
<a href="https://listserv.linguistlist.org/cgi-bin/mailman/listinfo/lingtyp" rel="noreferrer noreferrer noreferrer" target="_blank">https://listserv.linguistlist.org/cgi-bin/mailman/listinfo/lingtyp</a><br>
</blockquote></div></div></div>