analysis: unhappiness
    Richard Hudson 
    dick at ling.ucl.ac.uk
       
    Fri Sep 10 23:20:15 UTC 2010
    
    
  
  Dear Brian,
What a helpful message! I think you're right: we need a typology of 
cases, each calling for different methods, ranging from the linguist's 
own judgements for really easy cases to more elaborate quantitative 
methods for harder ones.
The trouble with our discipline is that for any community of N speakers, 
and a language consisting of M 'items' (however you may choose to define 
'community' and 'item'), we have N*M datapoints that, in principle, all 
need to be validated somehow. We might reduce the number by focusing on 
one speaker, but then you can't use data from other speakers as evidence 
for that speaker's language; or we might try to construct a 'typical' 
speaker, but we don't know how to do that; or we might reduce the size 
of the community by trying to find a 'dialect' (but dialects don't 
really exist); or we might ignore most of the linguistic items and focus 
on, say, the modal verbs - but then we miss their links to all the other 
items.
It's different for psycholinguists because they're only interested in 
general processes, for which linguistic items are just evidence, not the 
thing under investigation; but for us linguists, the fine detail is 
everything because we're the people who explore the connections between 
items.
So I look forward to the day when your typology of cases will guide us 
through a range of different methods to the appropriate ones for any 
given item.
Best wishes, Dick
Richard Hudson www.phon.ucl.ac.uk/home/dick/home.htm
On 10/09/2010 23:23, Brian MacWhinney wrote:
> Dick and Ted,
>     I agree with parts of what each of you is saying.  Which means that I also disagree with other parts.   In practice, Gibson and Fedorenko (in press), which I downloaded and scanned, deals with no more than two or three constructions.  They mention the fact that people don't have problems with sentences such as "Susan muttered him the news" despite claims that verbs such as "mutter" cannot take the double object construction.  They also note that the claims from Jackendoff and Culicover about the differences between the two sentences below are not supported by results from Mechanical Turk:
> 1.  Peter was trying to remember who carried what.
> 2.  Peter was trying to remember who carried what when.
> These are interesting facts.  If these sentences are supposed to be different and people judge them to be similarly grammatical, then theories based on the supposed differences should be reexamined.  There are big chunks of syntactic theory resting on shaky judgments about complex sentences of this type.  Getting some of this straight would be a big win, I would say, particularly if linguists would pay attention to the results.
>       But I understand Dick's worry about how far Gibson and Fedorenko are trying to push this.  Neither their email nor their paper sets clear limits on what we should be testing and we certainly don't want to waste time checking out  go-goed-went.  So, Gibson and Fedorenko owe us those clarifications.
>      But, Dick, you then move on to questioning data on bid-bidded.  Here we have a case of true variation in the population.  I would love to know its distribution.  As a "fan of quantitative sociolinguistics" shouldn't you too?
>      My take on this is that constructions are not created equal.  The three types mentioned here are probably just a start on an inventory of evidentiary types.  We need to correctly pair up appropriate methods with each of the types.  And we need to make sure that people pay attention to the results, once they are in.
>
> --Brian MacWhinney
>
> On Sep 10, 2010, at 1:59 PM, Richard Hudson wrote:
>
>> Dear Ted,
>> Thanks for the very interesting comment, but are you REALLY saying that I shouldn't claim, for example, that the past tense of GO is "went" without first cross-checking with 50 native speakers?
>>
>> Isn't there a danger of missing the point that we all, as native speakers, spend our whole lives scanning other people's linguistic behaviour (language 'out there', E-language) and trying to explain it to ourselves in terms of a language system (language 'in here', I-language)? So every judgement we make is based on thousands or millions of observed exemplars, and reflects a unique experience of E-language filtered through a unique I-language.
>>
>> Given that view of language development, I don't see how quantitative data will help. Let's take a real uncertainty, such as the past tense of BID. If I want to say I did it, do I say "I bidded" or "I bid"? My judgement: I don't know. Ok, you get 50 people to oblige on Mechanical Turk, and 20 of them give "bidded" and 30 "bid". So what? Does that mean that the correct answer is "bidded"? Surely not. How is it better than my judgement? I agree you could record my speech and find how often I use each alternative; but the reason I don't know is precisely because it's a rare word, so in a sense quantitative data are irrelevant even there. What would solve the problem of subjectivity, of course, would be a machine for probing the bit of my mind (or even brain) that holds BID and its details; but I suspect that even that wouldn't move us much further forward than my original "don't know". (Incidentally I write as a fan of quantitative sociolinguistics, so I do accept that quantitative data are relevant to linguistic analysis in some areas, where the I-language phenomenon is frequent enough to produce usable data.)
>>
>> It seems to me that this discussion raises the really fundamental question of what kind of thing we think language is: social or individual. The problem isn't unique to linguistics of course; it's the same throughout the social sciences. But what's special about linguistics is that we deal in very fine details of culture (e.g. details of how a particular word is used or pronounced) so the differences between individuals really matter. I don't see that we're ever going to have anything better than judgements to go on, so what we need is a way to ensure that judgements are accurate reports of individual I-language. A rotten situation for a science, but I don't see how it can get better.
>>
>> Dick
>>
>> Richard Hudson www.phon.ucl.ac.uk/home/dick/home.htm
>>
>> On 10/09/2010 14:03, Ted Gibson wrote:
>>> Dear Dan, Dick:
>>>
>>> I would like to clarify some points that Dan Everett makes, in
>>> response to Dick Hudson.
>>>
>>> Ev Fedorenko and I have written a couple of papers recently (Gibson &
>>> Fedorenko, 2010, in press, see references and links below) on what we
>>> think are weak methodological standards in syntax and semantics
>>> research over the past many years. The issue that we address is the
>>> prevalent method in syntax and semantics research, which involves
>>> obtaining a judgment of the acceptability of a sentence / meaning
>>> pair, typically by just the author of the paper, sometimes with
>>> feedback from colleagues. As we address in our papers, this
>>> methodology does not allow proper testing of scientific hypotheses
>>> because of (a) the small number of experimental participants
>>> (typically one); (b) the small number of experimental stimuli
>>> (typically one); (c) cognitive biases on the part of the researcher
>>> and participants; and (d) the effect of the preceding context (e.g.,
>>> other constructions the researcher may have been recently
>>> considering). (As Dan said, see Schutze, 1996; Cowart, 1997; and
>>> several others cited in Gibson & Fedorenko, in press; for similar
>>> points, but with not as strong a conclusion as ours).
>>>
>>> Three issues need to be separated here: (1) the use of intuitive
>>> judgments as a dependent measure in a language experiment; (2)
>>> potential cognitive biases on the part of experimental subjects and
>>> experimenters in language experiments; and (3) the need for obtaining
>>> quantitative evidence, whatever the dependent measure might be. The
>>> paper that Ev and I wrote addresses the last two issues, but does not
>>> go into depth on the first issue (the use of intuitions as a dependent
>>> measure in language experiments). Regarding this issue, we don't think
>>> that there is anything wrong with gathering intuitive judgments as a
>>> dependent measure, as long as the task is clear to the experimental
>>> participants.
>>>
>>> In the longer paper (Gibson & Fedorenko, in press) we respond to some
>>> arguments that have been given in support of continuing to use the
>>> traditional non-quantitative method in syntax / semantics research.
>>> One recent defense of the traditional method comes from Phillips
>>> (2008), who argues that no harm has come from the non-quantitative
>>> approach in syntax research thus far. Phillips argues that there are
>>> no cases in the literature where an incorrect intuitive judgment has
>>> become the basis for a widely accepted generalization or an important
>>> theoretical claim. He therefore concludes that there is no reason to
>>> adopt more rigorous data collection standards. We challenge Phillips’
>>> conclusion by presenting three cases from the literature where a
>>> faulty intuition has led to incorrect generalizations and mistaken
>>> theorizing, plausibly due to cognitive biases on the part of the
>>> researchers.
>>>
>>> A second argument that is sometimes presented for the continued use of
>>> the traditional non-quantitative method is that it would be too
>>> inefficient to evaluate every syntactic / semantic hypothesis or
>>> phenomenon quantitatively. For example, Culicover & Jackendoff (2010)
>>> make this argument explicitly in their response to Gibson & Fedorenko
>>> (2010): “It would cripple linguistic investigation if it were required
>>> that all judgments of ambiguity and grammaticality be subject to
>>> statistically rigorous experiments on naive subjects, especially when
>>> investigating languages whose speakers are hard to access” (Culicover
>>> & Jackendoff, 2010, p. 234). (Dick Hudson makes a similar point
>>> earlier in the discussion here.) Whereas we agree that in
>>> circumstances where gathering data is difficult, some evidence is
>>> better than no evidence, we do not agree that research would be slowed
>>> with respect to languages where experimental participants are easy to
>>> access, such as English. On the contrary, we think that the opposite is
>>> true: the field’s progress is probably slowed by not doing
>>> quantitative research.
>>> Suppose that a typical syntax / semantics paper that lacks
>>> quantitative evidence includes judgments for 50 or more sentences /
>>> meaning pairs, corresponding to 50 or more empirical claims. Even if
>>> most of the judgments from such a paper are correct or are on the
>>> right track, the problem is in knowing which judgments are correct.
>>> For example, suppose that 90% of the judgments from an arbitrary paper
>>> are correct (which is probably a high estimate). (Colin Phillips and
>>> some of his former students / postdocs have commented to us that, in
>>> their experience, quantitative acceptability judgment studies almost
>>> always validate the claim(s) in the literature. This is not our
>>> experience, however. Most experiments that we have run which attempt
>>> to test some syntactic / semantic hypothesis in the literature end up
>>> providing us with a pattern of data that had not been known before the
>>> experiment (e.g., Breen et al., in press; Fedorenko & Gibson, in
>>> press; Patel et al., 2009; Scontras & Gibson, submitted).) This means
>>> that in a paper with 50 empirical claims 45/50 are correct. But which
>>> 45? There are 2,118,760 ways to choose 45 items from 50. That’s over
>>> two million different theories. By quantitatively evaluating the
>>> empirical claims, we reduce the uncertainty a great deal. To make
>>> progress, it is better to have theoretical claims supported by solid
>>> quantitative evidence, so that even if the interpretation of the data
>>> changes over time as new evidence becomes available – as is often the
>>> case in any field of science – the empirical pattern can be used as a
>>> basis for further theorizing.
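
The count quoted above can be checked with a few lines of Python. This is only a sketch of the arithmetic, not anything taken from Gibson & Fedorenko's papers; the function name ways_to_choose is made up for illustration, and the figures (50 claims, 90% of them correct) are the ones assumed in the paragraph.

```python
import math


def ways_to_choose(total_claims: int, correct_claims: int) -> int:
    """Count the distinct subsets of `correct_claims` items drawn from `total_claims`."""
    return math.comb(total_claims, correct_claims)


# Figures assumed in the paragraph above: 50 empirical claims, 90% correct.
total = 50
correct = 45

# Choosing which 45 claims are correct is the same as choosing which 5 are wrong.
print(ways_to_choose(total, correct))
print(ways_to_choose(total, total - correct))
```

Both calls give the same number, since picking the 45 correct claims is equivalent to picking the 5 incorrect ones.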
>>>
>>> Furthermore, it is no longer expensive to run behavioral experiments,
>>> at least in English and other widely spoken languages. There now
>>> exists a marketplace interface – Amazon.com’s Mechanical Turk – which
>>> can be used for collecting behavioral data over the internet quickly
>>> and inexpensively. The cost of using an interface like this is
>>> minimal, and the time that it takes for the results to be returned is
>>> short. For example, currently on Mechanical Turk, a survey of
>>> approximately 50 items will be answered by 50 or more participants
>>> within a couple of hours, at a cost of approximately $1 per
>>> participant. Thus a survey can be completed within a day, at a cost of
>>> less than $50. (The hard work of designing the experiment, and
>>> constructing controlled materials remains of course.)
>>>
>>> Sorry to be so verbose. But I think that these methodological points
>>> are very important.
>>>
>>> Best wishes,
>>>
>>> Ted Gibson
>>>
>>> Gibson, E. & Fedorenko, E. (In press). The need for quantitative
>>> methods in syntax and semantics research. Language and Cognitive
>>> Processes. http://tedlab.mit.edu/tedlab_website/researchpapers/Gibson
>>> &  Fedorenko InPress LCP.pdf
>>>
>>> Gibson, E. & Fedorenko, E. (2010). Weak quantitative standards in
>>> linguistics research. Trends in Cognitive Sciences, 14, 233-234.
>>> http://tedlab.mit.edu/tedlab_website/researchpapers/Gibson&  Fedorenko
>>> 2010 TICS.pdf
>>>
>>>
>>>
>>>
>>>> Dick,
>>>>
>>>> You raise an important issue here about methodology. I believe that
>>>> intuitions are a fine way to generate hypotheses and even to test
>>>> them - to a degree. But while it might not have been feasible for
>>>> Huddleston, Pullum, and the other contributors to the Cambridge
>>>> Grammar to conduct experiments on every point of the grammar,
>>>> experiments could have only made the grammar better. The use of
>>>> intuitions, corpora, and standard psycholinguistic experimentation
>>>> (indeed, Standard Social Science Methodology) is vital for taking the
>>>> field forward and for providing the best support for different
>>>> analyses. Ted Gibson and Ev Fedorenko have written a very useful new
>>>> paper on this, showing serious shortcomings with intuitions as the
>>>> sole source of evidence, in their paper: "The need for quantitative
>>>> methods in syntax and semantics research".
>>>>
>>>> Carson Schutze and Wayne Cowart, among others, have also written
>>>> convincingly on this.
>>>>
>>>> It is one reason that a team from Stanford, MIT (Brain and Cognitive
>>>> Science), and researchers from Brazil are beginning a third round of
>>>> experimental work among the Pirahas, since my own work on the syntax
>>>> was, like almost every other field researcher's, based on native
>>>> speaker intuitions and corpora.
>>>>
>>>> The discussion of methodologies reminds me of the initial reactions
>>>> to Greenberg's work on classifying the languages of the Americas. His
>>>> methods were strongly (and justifiably) criticized. However, I always
>>>> thought that his methods were a great way of generating hypotheses,
>>>> so long as they were ultimately put to the test of standard
>>>> historical linguistics methods. And the same seems true for use of
>>>> native-speaker intuitions.
>>>>
>>>> -- Dan
>>>
>>>
>>>>> We linguists can add a further layer of explanation to the
>>>>> judgements, but some judgements do seem to be more reliable than
>>>>> others. And if we have to wait for psycholinguistic evidence for
>>>>> every detailed analysis we make, our whole discipline will
>>>>> immediately grind to a halt. Like it or not, native speaker
>>>>> judgements are what put us linguists ahead of the rest in handling
>>>>> fine detail. Imagine writing the Cambridge Grammar of the English
>>>>> Language (or the OED) without using native speaker judgements.
>>>>>
>>>>> Best wishes, Dick Hudson
>>>
>>>
>
>
    
    