analysis: unhappiness
Richard Hudson
dick at ling.ucl.ac.uk
Fri Sep 10 23:40:06 UTC 2010
Dear Ted and Ev,
Yes, I understand your view, but I think it's a psycholinguist's view.
Your goal is to find general processes and principles that apply
uniformly across individuals, so you have to use methods to check for
generality. And (as you know) I admire the way you pursue that goal. But
my goal, as a linguist, is different. I want to explore the structure of
a language so that I can understand how all the bits fit together. Like
you, I'm aiming to model cognition, but my focus is on items and
structures, and I start from the assumption that these can and do vary
across speakers.
However, having said all that I do agree with you that linguists should
all get used to collecting and using quantitative data; and with the
help of Brian MacWhinney's typology we'd know what methods to use when.
And I do agree with your points about bid/bidded: in cases like that,
quantitative data would be at least a very good starting point for a
proper investigation.
Best wishes, Dick
Richard Hudson www.phon.ucl.ac.uk/home/dick/home.htm
On 10/09/2010 19:30, Ted Gibson wrote:
> Dear Dick:
>
> Perhaps we are talking at cross purposes. I don't understand what is
> confusing about what Ev Fedorenko and I are claiming. All we are
> saying is that if you have some testable claim involving a general
> hypothesis about a language, then you need to get quantitative data
> from unbiased sources to evaluate that claim. If you are interested in
> English past tense morphology, then depending on the claims that you
> might want to investigate, there are lots of ways to get relevant
> quantitative evidence. Corpus data will probably be useful. For very
> low frequency words, you can run experiments to test behavior with
> respect to such words.
>
> Your example of the past tense of "bid" is a fine such example. You
> can run an experiment like the one you suggested to find out what
> people think the past tense is. If you then found that 20/50 people
> responded "bidded" and 30/50 responded "bid", that is a lot of useful
> information. As you suggest in your discussion, this result wouldn't
> answer the question of how past tense is stored in each individual.
> This result would be ambiguous among several possible explanations.
> One possibility is that the probability distribution that is being
> discovered reflects different dialects, such that 2/5 of the
> population has one past tense, and 3/5 has another. Another
> possibility is that each person has a similar probability distribution
> in their heads, such that 2/5 of the time I respond one way, and 3/5
> of the time I respond another. Further experiments would be necessary
> to decide between these and other possible theories (e.g., with
> repeated trials from the same person, carefully planned so that the
> participants don't notice that they are being asked multiple times).
> Without the quantitative evidence in the first place, there is no way
> to answer these kinds of questions.
>
> Regarding the past tense of "go", this would be useful as a baseline
> in an experiment involving the less frequent ones. So, yes, it would
> be useful to gather quantitative evidence in such a case also, as
> baselines with respect to the more interesting cases for theories.
>
> The bottom line: if you have a generalization about a language that
> you wish to evaluate (such that you hypothesize that it is true across
> the speakers of the language), then you need quantitative evidence
> from multiple individuals, using an unbiased data collection method,
> to evaluate such a claim. The point about Mechanical Turk is that it
> is really *easy* to do this now, at least for languages like English.
>
> Best wishes,
>
> Ted Gibson & Ev Fedorenko
>
> On Sep 10, 2010, at 1:59 PM, Richard Hudson wrote:
>
>> Dear Ted,
>> Thanks for the very interesting comment, but are you REALLY saying
>> that I shouldn't claim, for example, that the past tense of GO is
>> "went" without first cross-checking with 50 native speakers?
>>
>> Isn't there a danger of missing the point that we all, as native
>> speakers, spend our whole lives scanning other people's linguistic
>> behaviour (language 'out there', E-language) and trying to explain it
>> to ourselves in terms of a language system (language 'in here',
>> I-language)? So every judgement we make is based on thousands or
>> millions of observed exemplars, and reflects a unique experience of
>> E-language filtered through a unique I-language.
>>
>> Given that view of language development, I don't see how quantitative
>> data will help. Let's take a real uncertainty, such as the past tense
>> of BID. If I want to say I did it, do I say "I bidded" or "I bid"? My
>> judgement: I don't know. Ok, you get 50 people to oblige on
>> Mechanical Turk, and 20 of them give "bidded" and 30 "bid". So what?
>> Does that mean that the correct answer is "bidded"? Surely not. How
>> is it better than my judgement? I agree you could record my speech
>> and find how often I use each alternative; but the reason I don't
>> know is precisely because it's a rare word, so in a sense
>> quantitative data are irrelevant even there. What would solve the
>> problem of subjectivity, of course, would be a machine for probing
>> the bit of my mind (or even brain) that holds BID and its details;
>> but I suspect that even that wouldn't move us much further forward
>> than my original "don't know". (Incidentally I write as a fan of
>> quantitative sociolinguistics, so I do accept that quantitative data
>> are relevant to linguistic analysis in some areas, where the
>> I-language phenomenon is frequent enough to produce usable data.)
>>
>> It seems to me that this discussion raises the really fundamental
>> question of what kind of thing we think language is: social or
>> individual. The problem isn't unique to linguistics of course; it's
>> the same throughout the social sciences. But what's special about
>> linguistics is that we deal in very fine details of culture (e.g.
>> details of how a particular word is used or pronounced) so the
>> differences between individuals really matter. I don't see that we're
>> ever going to have anything better than judgements to go on, so what
>> we need is a way to ensure that judgements are accurate reports of
>> individual I-language. A rotten situation for a science, but I don't
>> see how it can get better.
>>
>> Dick
>>
>> Richard Hudson www.phon.ucl.ac.uk/home/dick/home.htm
>>
>> On 10/09/2010 14:03, Ted Gibson wrote:
>>> Dear Dan, Dick:
>>>
>>> I would like to clarify some points that Dan Everett makes, in
>>> response to Dick Hudson.
>>>
>>> Ev Fedorenko and I have written a couple of papers recently (Gibson &
>>> Fedorenko, 2010, in press, see references and links below) on what we
>>> think are weak methodological standards in syntax and semantics
>>> research over the past many years. The issue that we address is the
>>> prevalent method in syntax and semantics research, which involves
>>> obtaining a judgment of the acceptability of a sentence / meaning
>>> pair, typically by just the author of the paper, sometimes with
>>> feedback from colleagues. As we address in our papers, this
>>> methodology does not allow proper testing of scientific hypotheses
>>> because of (a) the small number of experimental participants
>>> (typically one); (b) the small number of experimental stimuli
>>> (typically one); (c) cognitive biases on the part of the researcher
>>> and participants; and (d) the effect of the preceding context (e.g.,
>>> other constructions the researcher may have been recently
>>> considering). (As Dan said, see Schutze, 1996; Cowart, 1997; and
>>> several others cited in Gibson & Fedorenko, in press, for similar
>>> points, though with conclusions not as strong as ours.)
>>>
>>> Three issues need to be separated here: (1) the use of intuitive
>>> judgments as a dependent measure in a language experiment; (2)
>>> potential cognitive biases on the part of experimental subjects and
>>> experimenters in language experiments; and (3) the need for obtaining
>>> quantitative evidence, whatever the dependent measure might be. The
>>> paper that Ev and I wrote addresses the last two issues, but does not
>>> go into depth on the first issue (the use of intuitions as a dependent
>>> measure in language experiments). Regarding this issue, we don't think
>>> that there is anything wrong with gathering intuitive judgments as a
>>> dependent measure, as long as the task is clear to the experimental
>>> participants.
>>>
>>> In the longer paper (Gibson & Fedorenko, in press) we respond to some
>>> arguments that have been given in support of continuing to use the
>>> traditional non-quantitative method in syntax / semantics research.
>>> One recent defense of the traditional method comes from Phillips
>>> (2008), who argues that no harm has come from the non-quantitative
>>> approach in syntax research thus far. Phillips argues that there are
>>> no cases in the literature where an incorrect intuitive judgment has
>>> become the basis for a widely accepted generalization or an important
>>> theoretical claim. He therefore concludes that there is no reason to
>>> adopt more rigorous data collection standards. We challenge Phillips’
>>> conclusion by presenting three cases from the literature where a
>>> faulty intuition has led to incorrect generalizations and mistaken
>>> theorizing, plausibly due to cognitive biases on the part of the
>>> researchers.
>>>
>>> A second argument that is sometimes presented for the continued use of
>>> the traditional non-quantitative method is that it would be too
>>> inefficient to evaluate every syntactic / semantic hypothesis or
>>> phenomenon quantitatively. For example, Culicover & Jackendoff (2010)
>>> make this argument explicitly in their response to Gibson & Fedorenko
>>> (2010): “It would cripple linguistic investigation if it were required
>>> that all judgments of ambiguity and grammaticality be subject to
>>> statistically rigorous experiments on naive subjects, especially when
>>> investigating languages whose speakers are hard to access” (Culicover
>>> & Jackendoff, 2010, p. 234). (Dick Hudson makes a similar point
>>> earlier in the discussion here.) Whereas we agree that in
>>> circumstances where gathering data is difficult, some evidence is
>>> better than no evidence, we do not agree that research would be slowed
>>> with respect to languages where experimental participants are easy to
>>> access, such as English. On the contrary, we think the opposite is
>>> true: the field’s progress is probably slowed by not doing
>>> quantitative research.
>>>
>>> Suppose that a typical syntax / semantics paper that lacks
>>> quantitative evidence includes judgments for 50 or more sentences /
>>> meaning pairs, corresponding to 50 or more empirical claims. Even if
>>> most of the judgments from such a paper are correct or are on the
>>> right track, the problem is in knowing which judgments are correct.
>>> For example, suppose that 90% of the judgments from an arbitrary paper
>>> are correct (which is probably a high estimate). (Colin Phillips and
>>> some of his former students / postdocs have commented to us that, in
>>> their experience, quantitative acceptability judgment studies almost
>>> always validate the claim(s) in the literature. This is not our
>>> experience, however. Most experiments that we have run which attempt
>>> to test some syntactic / semantic hypothesis in the literature end up
>>> providing us with a pattern of data that had not been known before the
>>> experiment (e.g., Breen et al., in press; Fedorenko & Gibson, in
>>> press; Patel et al., 2009; Scontras & Gibson, submitted).) This means
>>> that in a paper with 50 empirical claims 45/50 are correct. But which
>>> 45? There are 2,118,760 ways to choose 45 items from 50. That’s over
>>> two million different theories. By quantitatively evaluating the
>>> empirical claims, we reduce the uncertainty a great deal. To make
>>> progress, it is better to have theoretical claims supported by solid
>>> quantitative evidence, so that even if the interpretation of the data
>>> changes over time as new evidence becomes available – as is often the
>>> case in any field of science – the empirical pattern can be used as a
>>> basis for further theorizing.
>>>
>>> Furthermore, it is no longer expensive to run behavioral experiments,
>>> at least in English and other widely spoken languages. There now
>>> exists a marketplace interface – Amazon.com’s Mechanical Turk – which
>>> can be used for collecting behavioral data over the internet quickly
>>> and inexpensively. The cost of using an interface like this is
>>> minimal, and the time that it takes for the results to be returned is
>>> short. For example, currently on Mechanical Turk, a survey of
>>> approximately 50 items will be answered by 50 or more participants
>>> within a couple of hours, at a cost of approximately $1 per
>>> participant. Thus a survey can be completed within a day, at a cost of
>>> less than $50. (The hard work of designing the experiment, and
>>> constructing controlled materials remains of course.)
>>>
>>> Sorry to be so verbose. But I think that these methodological points
>>> are very important.
>>>
>>> Best wishes,
>>>
>>> Ted Gibson
>>>
>>> Gibson, E. & Fedorenko, E. (In press). The need for quantitative
>>> methods in syntax and semantics research. Language and Cognitive
>>> Processes. http://tedlab.mit.edu/tedlab_website/researchpapers/Gibson
>>> & Fedorenko InPress LCP.pdf
>>>
>>> Gibson, E. & Fedorenko, E. (2010). Weak quantitative standards in
>>> linguistics research. Trends in Cognitive Sciences, 14, 233-234.
>>> http://tedlab.mit.edu/tedlab_website/researchpapers/Gibson & Fedorenko
>>> 2010 TICS.pdf
>>>
>>>
>>>
>>>
>>>> Dick,
>>>>
>>>> You raise an important issue here about methodology. I believe that
>>>> intuitions are a fine way to generate hypotheses and even to test
>>>> them - to a degree. But while it might not have been feasible for
>>>> Huddleston, Pullum, and the other contributors to the Cambridge
>>>> Grammar to conduct experiments on every point of the grammar,
>>>> experiments could have only made the grammar better. The use of
>>>> intuitions, corpora, and standard psycholinguistic experimentation
>>>> (indeed, Standard Social Science Methodology) is vital for taking the
>>>> field forward and for providing the best support for different
>>>> analyses. Ted Gibson and Ev Fedorenko have written a very useful new
>>>> paper on this, showing serious shortcomings with intuitions as the
>>>> sole source of evidence, in their paper: "The need for quantitative
>>>> methods in syntax and semantics research".
>>>>
>>>> Carson Schutze and Wayne Cowart, among others, have also written
>>>> convincingly on this.
>>>>
>>>> It is one reason that a team from Stanford, MIT (Brain and Cognitive
>>>> Science), and researchers from Brazil are beginning a third round of
>>>> experimental work among the Pirahas, since my own work on the syntax
>>>> was, like almost every other field researcher's, based on native
>>>> speaker intuitions and corpora.
>>>>
>>>> The discussion of methodologies reminds me of the initial reactions
>>>> to Greenberg's work on classifying the languages of the Americas. His
>>>> methods were strongly (and justifiably) criticized. However, I always
>>>> thought that his methods were a great way of generating hypotheses,
>>>> so long as they were ultimately put to the test of standard
>>>> historical linguistics methods. And the same seems true for use of
>>>> native-speaker intuitions.
>>>>
>>>> -- Dan
>>>
>>>
>>>
>>>>> We linguists can add a further layer of explanation to the
>>>>> judgements, but some judgements do seem to be more reliable than
>>>>> others. And if we have to wait for psycholinguistic evidence for
>>>>> every detailed analysis we make, our whole discipline will
>>>>> immediately grind to a halt. Like it or not, native speaker
>>>>> judgements are what put us linguists ahead of the rest in handling
>>>>> fine detail. Imagine writing the Cambridge Grammar of the English
>>>>> Language (or the OED) without using native speaker judgements.
>>>>>
>>>>> Best wishes, Dick Hudson
>>>
>>>
>>>
>
>
>
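The count given in Ted Gibson's message of 10/09/2010 14:03 (the number
of ways to choose which 45 of 50 empirical claims are the correct ones)
can be checked with a short computation. What follows is only a minimal
sketch in Python, a language that nobody in the thread uses; the function
name ways_to_pick_correct is made up for illustration. It confirms the
arithmetic and nothing more: the 90% accuracy figure is a supposition in
the message, not a measured value.

```python
import math


def ways_to_pick_correct(total_claims: int, correct_claims: int) -> int:
    """Count the distinct subsets of `correct_claims` items that can be
    drawn from `total_claims` items (the binomial coefficient)."""
    return math.comb(total_claims, correct_claims)


# 50 empirical claims, 45 of them assumed correct (the 90% supposition).
ways = ways_to_pick_correct(50, 45)
print(f"{ways:,}")  # prints 2,118,760
```

Choosing which 45 claims are correct is the same as choosing which 5 are
wrong, so ways_to_pick_correct(50, 5) returns the same value. Calling the
function with a smaller second argument, such as 40, shows how quickly
the number of candidate subsets grows if fewer judgments are assumed to
be correct.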