analysis: unhappiness

Ted Gibson egibson at MIT.EDU
Sat Sep 11 15:45:27 UTC 2010


Dear Brian, Dick, Dan et al:

Thanks for the discussion.  Here are a few responses:

1. Brian:

"But I understand Dick's worry about how far Gibson and Fedorenko are  
trying to push this.  Neither their email nor their paper sets clear  
limits on what we should be testing and we certainly don't want to  
waste time checking out  go-goed-went.  So, Gibson and Fedorenko owe  
us those clarifications."

The answer that we give to this question in Gibson & Fedorenko (in  
press) is as follows (the final paragraph in the paper):

"Finally, a question that is often put to us is whether it is  
necessary to evaluate every empirical claim quantitatively.  A major  
problem with the fields of syntax and semantics is that many papers  
include no quantitative evidence in support of their research  
hypotheses.  Because conducting experiments is now so easy to do with  
the advent of Amazon.com’s Mechanical Turk, we recommend gathering  
quantitative evidence for all empirical claims.  However, it would  
clearly be a vast improvement to the field for all research papers to  
include at least some quantitative evidence evaluating their research  
hypotheses."

Another possible answer to this question is: the more important some  
observation is, the better your evidence should be.  If the  
observation is a key reason for some important theoretical claim, then  
there should be solid quantitative  data supporting that observation.

In practice, once linguists start gathering quantitative data, they  
will realize (a) how easy it is to do; and (b) how beneficial the  
methods are, with the consequence that they will probably do most or  
all of their work quantitatively in the future.


2. Dick (by the way, thank you for the kind responses, and your  
positive tone):

"Your [the psycholinguists'] goal is to find general processes and  
principles that apply uniformly across individuals, so you have to use  
methods to check for generality."

in contrast to "my focus is on items and structures, and I start from  
the assumption that these can and do vary across speakers."

Many cognitive psychologists / cognitive scientists (all the ones I  
know at MIT for example) are interested in both cognitive  
generalizations across people and ways in which people differ  
cognitively.  In fact, some methods (e.g., the individual differences  
approach where co-variation of various behaviors / characteristics is  
examined across individuals) have been specifically developed to study  
differences among individuals.  Both kinds of data are important for  
understanding human cognition, including language.  This applies to  
language research directly: generalizations across people are  
important, but so are individual differences.  In either case,  
quantitative data are necessary to evaluate research questions and  
test hypotheses.

On a related note, it is a mistake to characterize researchers with a  
background in "psychology" or cognitive science as being interested in  
"processing", and researchers with a background in "linguistics" as  
being interested in "knowledge" or "representation / structure".  Both  
psychologists and linguists should be interested in *both*  
representation and processing (and learning, for that matter).  We  
wrote a little about this confusion in Gibson & Fedorenko (in press),  
which we include at the end of the message.

This leads to something that Dan said:

3. Dan says: "linguistics is not simply a subdiscipline of psychology"

Both linguistics and psychology are big fields.  We assume Dan is  
referring to cognitive psychology / cognitive science here.  (Of  
course, there are sub-fields of psychology  - e.g., personality  
psychology or abnormal psychology - which are somewhat distinct from  
linguistics, but those sub-fields are also distinct from cognitive  
psychology.)  It is true that historically linguistics is not treated  
as a subfield of cognitive psychology / cognitive science.  However,  
key research questions in linguistics (i.e., the form of the knowledge  
structures and algorithms underlying human language) are indeed a  
subset of those investigated by cognitive psychologists / cognitive  
scientists.  We think that the biggest factor separating linguistics  
from psychology is the methods used to explore the research questions,  
rather than the research questions themselves.   Consequently, we  
would like to continue to see tighter connections among the fields of  
psychology / cognitive science, linguistics, as well as other fields  
like anthropology and computer science.

Thanks to all for the interesting discussions.

Ted & Ev

Here is the passage from Gibson & Fedorenko (in press) mentioned  
above:

We have encountered a claim that the reason for different kinds of  
methods being used across the different fields of language study  
(i.e., in linguistics vs. psycho-/neuro-linguistics) is that the  
research questions are different across these fields, and some methods  
may be better suited to ask some questions than others.  Although the  
latter is likely true, the premise – that the research questions are  
different across the fields – is false.  The typical claim is that  
researchers in the field of linguistics are investigating linguistic  
representations, and researchers in the fields of  
psycho-/neuro-linguistics are investigating the computations that take place as  
language is understood or produced.  However, many researchers in the  
fields of psycho-/neuro-linguistics are also interested in the nature  
of the linguistic representations (at all levels; e.g., phonological  
representations, lexical representations, syntactic representations,  
etc.) [1].  By the same token, many researchers in the field of  
linguistics are interested in the computations that take place in the  
course of online comprehension or production.  However, inferences –  
drawn from any dependent measure – about either the linguistic  
representations or computations are always indirect.  And these  
inferences are no more indirect in reading times or event-related  
potentials, etc., than in acceptability judgments: across all  
dependent measures we take some observable (e.g., a participant’s  
rating on an acceptability judgment task or the time it took a  
participant to read a sentence) and we try to infer something about  
the underlying cognitive representations / processes.  More generally,  
methods in cognitive science are often used to jointly learn about  
representations and computations, because inferences about  
representations can inform questions about the computations, and vice  
versa.  For example, certain data structures can make a computation  
more or less difficult to perform, and certain representations may  
require assumptions about the algorithms being used.

In our opinion then, the distinction between the fields of linguistics  
and psycho-/neuro-linguistics is purely along the lines of the kind of  
data that are used as evidence for or against theoretical hypotheses:  
typically non-quantitative data in linguistics vs. typically  
quantitative data in psycho-/neuro-linguistics.  Given the superficial  
nature of this distinction, we think that there should be one field of  
language study where a wide range of dependent measures is used to  
investigate linguistic representations and computations.


[1] In fact, some methods in cognitive science and cognitive  
neuroscience were specifically developed to get at representational  
questions (e.g., lexical / syntactic priming methods, neural  
adaptation or multi-voxel pattern analyses in functional MRI).


On Sep 10, 2010, at 9:05 PM, Daniel Everett wrote:

> I think that Brian and Dick make excellent points. There are very  
> good grammars written that could be improved by psycholinguistic  
> experimentation and more quantitative approaches. But large sections  
> of those grammars aren't going to change one bit (go-went) with  
> quantitative tests and such tests would be completely  
> counterproductive given the shortness of life and the vastness of  
> the field linguist's tasks.
>
> Part of the problem is that linguistics is not simply a  
> subdiscipline of psychology. Linguistics has its own objectives and  
> those only occasionally overlap with psychology. The same for methods.
>
> On another note, I don't buy the 'in my head' 'out of my head'  
> distinction either (that Matt seems to be urging upon us). We study  
> different things and have different reasons for being satisfied with  
> the results we achieve.
>
> I believe that  we linguists are often complacent and fail to apply  
> better methods. But of course that applies to all disciplines.
>
> In the meantime, checking corpora, collecting data as a result of  
> careful interviews with native speakers, and the other aspects of  
> the field linguist's task are vital parts of the linguist's task and  
> much of this won't be improved by quantitative methods as we  
> currently understand them. Maybe sometime.
>
> Dan
>
> P.S. In my original reference to Ted and Ev's paper, I said that  
> they showed the danger of using intuitions. What I meant was the  
> danger of using intuitions as standardly used by linguists. They  
> convinced me that there is a lot to learn from quantitative methods.
>
> On 10 Sep 2010, at 19:40, Richard Hudson wrote:
>
>> Dear Ted and Ev,
>> Yes, I understand your view, but I think it's a psycholinguist's  
>> view. Your goal is to find general processes and principles that  
>> apply uniformly across individuals, so you have to use methods to  
>> check for generality. And (as you know) I admire the way you pursue  
>> that goal. But my goal, as a linguist, is different. I want to  
>> explore the structure of a language so that I can understand how  
>> all the bits fit together. Like you, I'm aiming to model cognition,  
>> but my focus is on items and structures, and I start from the  
>> assumption that these can and do vary across speakers.
>>
>> However, having said all that I do agree with you that linguists  
>> should all get used to collecting and using quantitative data; and  
>> with the help of Brian MacWhinney's typology we'd know what methods  
>> to use when. And I do agree with your points about bid/bidded: in  
>> cases like that, quantitative data would be at least a very good  
>> starting point for a proper investigation.
>>
>> Best wishes, Dick
>>
>> Richard Hudson www.phon.ucl.ac.uk/home/dick/home.htm
>>
>> On 10/09/2010 19:30, Ted Gibson wrote:
>>> Dear Dick:
>>>
>>> Perhaps we are talking at cross purposes. I don't understand what  
>>> is confusing about what Ev Fedorenko and I are claiming. All we  
>>> are saying is that if you have some testable claim involving a  
>>> general hypothesis about a language, then you need to get  
>>> quantitative data from unbiased sources to evaluate that claim. If  
>>> you are interested in English past tense morphology, then  
>>> depending on the claims that you might want to investigate, there  
>>> are lots of ways to get relevant quantitative evidence. Corpus  
>>> data will probably be useful. For very low frequency words, you  
>>> can run experiments to test behavior with respect to such words.
>>>
>>> Your example of the past tense of "bid" is a fine one.  
>>> You can run an experiment like the one you suggested to find out  
>>> what people think the past tense is. If you then found that 20/50  
>>> people responded "bidded" and 30/50 responded "bid", that is a lot  
>>> of useful information. As you suggest in your discussion, this  
>>> result wouldn't answer the question of how past tense is stored in  
>>> each individual. This result would be ambiguous among several  
>>> possible explanations. One possibility is that the probability  
>>> distribution that is being discovered reflects different dialects,  
>>> such that 2/5 of the population has one past tense, and 3/5 has  
>>> another. Another possibility is that each person has a similar  
>>> probability distribution in their heads, such that 2/5 of the time  
>>> I respond one way, and 3/5 of the time I respond another. Further  
>>> experiments would be necessary to decide between these and other  
>>> possible theories (e.g., with repeated trials from the same  
>>> person, carefully planned so that the participants don't notice  
>>> that they are being asked multiple times). Without the  
>>> quantitative evidence in the first place, there is no way to  
>>> answer these kinds of questions.
>>>
>>> Regarding the past tense of "go", this would be useful as a  
>>> baseline in an experiment involving the less frequent ones. So,  
>>> yes, it would be useful to gather quantitative evidence in such a  
>>> case also, as baselines with respect to the more interesting cases  
>>> for theories.
>>>
>>> The bottom line: if you have a generalization about a language  
>>> that you wish to evaluate (such that you hypothesize that it is  
>>> true across the speakers of the language), then you need  
>>> quantitative evidence from multiple individuals, using an unbiased  
>>> data collection method, to evaluate such a claim. The point about  
>>> Mechanical Turk is that it is really *easy* to do this now, at  
>>> least for languages like English.
>>>
>>> Best wishes,
>>>
>>> Ted Gibson & Ev Fedorenko
>>>
>>> On Sep 10, 2010, at 1:59 PM, Richard Hudson wrote:
>>>
>>>> Dear Ted,
>>>> Thanks for the very interesting comment, but are you REALLY  
>>>> saying that I shouldn't claim, for example, that the past tense  
>>>> of GO is "went" without first cross-checking with 50 native  
>>>> speakers?
>>>>
>>>> Isn't there a danger of missing the point that we all, as native  
>>>> speakers, spend our whole lives scanning other people's  
>>>> linguistic behaviour (language 'out there', E-language) and  
>>>> trying to explain it to ourselves in terms of a language system  
>>>> (language 'in here', I-language)? So every judgement we make is  
>>>> based on thousands or millions of observed exemplars, and  
>>>> reflects a unique experience of E-language filtered through a  
>>>> unique I-language.
>>>>
>>>> Given that view of language development, I don't see how  
>>>> quantitative data will help. Let's take a real uncertainty, such  
>>>> as the past tense of BID. If I want to say I did it, do I say "I  
>>>> bidded" or "I bid"? My judgement: I don't know. Ok, you get 50  
>>>> people to oblige on Mechanical Turk, and 20 of them give "bidded"  
>>>> and 30 "bid". So what? Does that mean that the correct answer is  
>>>> "bidded"? Surely not. How is it better than my judgement? I agree  
>>>> you could record my speech and find how often I use each  
>>>> alternative; but the reason I don't know is precisely because  
>>>> it's a rare word, so in a sense quantitative data are irrelevant  
>>>> even there. What would solve the problem of subjectivity, of  
>>>> course, would be a machine for probing the bit of my mind (or  
>>>> even brain) that holds BID and its details; but I suspect that  
>>>> even that wouldn't move us much further forward than my original  
>>>> "don't know". (Incidentally I write as a fan of quantitative  
>>>> sociolinguistics, so I do accept that quantitative data are  
>>>> relevant to linguistic analysis in some areas, where the  
>>>> I-language phenomenon is frequent enough to produce usable data.)
>>>>
>>>> It seems to me that this discussion raises the really fundamental  
>>>> question of what kind of thing we think language is: social or  
>>>> individual. The problem isn't unique to linguistics of course;  
>>>> it's the same throughout the social sciences. But what's special  
>>>> about linguistics is that we deal in very fine details of culture  
>>>> (e.g. details of how a particular word is used or pronounced) so  
>>>> the differences between individuals really matter. I don't see  
>>>> that we're ever going to have anything better than judgements to  
>>>> go on, so what we need is a way to ensure that judgements are  
>>>> accurate reports of individual I-language. A rotten situation for  
>>>> a science, but I don't see how it can get better.
>>>>
>>>> Dick
>>>>
>>>> Richard Hudson www.phon.ucl.ac.uk/home/dick/home.htm
>>>>
>>>> On 10/09/2010 14:03, Ted Gibson wrote:
>>>>> Dear Dan, Dick:
>>>>>
>>>>> I would like to clarify some points that Dan Everett makes, in
>>>>> response to Dick Hudson.
>>>>>
>>>>> Ev Fedorenko and I have written a couple of papers recently  
>>>>> (Gibson &
>>>>> Fedorenko, 2010, in press, see references and links below) on  
>>>>> what we
>>>>> think are weak methodological standards in syntax and semantics
>>>>> research over the past many years. The issue that we address is  
>>>>> the
>>>>> prevalent method in syntax and semantics research, which involves
>>>>> obtaining a judgment of the acceptability of a sentence / meaning
>>>>> pair, typically by just the author of the paper, sometimes with
>>>>> feedback from colleagues. As we address in our papers, this
>>>>> methodology does not allow proper testing of scientific hypotheses
>>>>> because of (a) the small number of experimental participants
>>>>> (typically one); (b) the small number of experimental stimuli
>>>>> (typically one); (c) cognitive biases on the part of the  
>>>>> researcher
>>>>> and participants; and (d) the effect of the preceding context  
>>>>> (e.g.,
>>>>> other constructions the researcher may have been recently
>>>>> considering). (As Dan said, see Schutze, 1996; Cowart, 1997; and
>>>>> several others cited in Gibson & Fedorenko, in press; for similar
>>>>> points, but with conclusions not as strong as ours.)
>>>>>
>>>>> Three issues need to be separated here: (1) the use of intuitive
>>>>> judgments as a dependent measure in a language experiment; (2)
>>>>> potential cognitive biases on the part of experimental subjects  
>>>>> and
>>>>> experimenters in language experiments; and (3) the need for  
>>>>> obtaining
>>>>> quantitative evidence, whatever the dependent measure might be.  
>>>>> The
>>>>> paper that Ev and I wrote addresses the last two issues, but  
>>>>> does not
>>>>> go into depth on the first issue (the use of intuitions as a  
>>>>> dependent
>>>>> measure in language experiments). Regarding this issue, we don't  
>>>>> think
>>>>> that there is anything wrong with gathering intuitive judgments  
>>>>> as a
>>>>> dependent measure, as long as the task is clear to the  
>>>>> experimental
>>>>> participants.
>>>>>
>>>>> In the longer paper (Gibson & Fedorenko, in press) we respond to  
>>>>> some
>>>>> arguments that have been given in support of continuing to use the
>>>>> traditional non-quantitative method in syntax / semantics  
>>>>> research.
>>>>> One recent defense of the traditional method comes from Phillips
>>>>> (2008), who argues that no harm has come from the non-quantitative
>>>>> approach in syntax research thus far. Phillips argues that there  
>>>>> are
>>>>> no cases in the literature where an incorrect intuitive judgment  
>>>>> has
>>>>> become the basis for a widely accepted generalization or an  
>>>>> important
>>>>> theoretical claim. He therefore concludes that there is no  
>>>>> reason to
>>>>> adopt more rigorous data collection standards. We challenge  
>>>>> Phillips’
>>>>> conclusion by presenting three cases from the literature where a
>>>>> faulty intuition has led to incorrect generalizations and mistaken
>>>>> theorizing, plausibly due to cognitive biases on the part of the
>>>>> researchers.
>>>>>
>>>>> A second argument that is sometimes presented for the continued  
>>>>> use of
>>>>> the traditional non-quantitative method is that it would be too
>>>>> inefficient to evaluate every syntactic / semantic hypothesis or
>>>>> phenomenon quantitatively. For example, Culicover & Jackendoff  
>>>>> (2010)
>>>>> make this argument explicitly in their response to Gibson &  
>>>>> Fedorenko
>>>>> (2010): “It would cripple linguistic investigation if it were  
>>>>> required
>>>>> that all judgments of ambiguity and grammaticality be subject to
>>>>> statistically rigorous experiments on naive subjects, especially  
>>>>> when
>>>>> investigating languages whose speakers are hard to  
>>>>> access” (Culicover
>>>>> & Jackendoff, 2010, p. 234). (Dick Hudson makes a similar point
>>>>> earlier in the discussion here.) Whereas we agree that in
>>>>> circumstances where gathering data is difficult, some evidence is
>>>>> better than no evidence, we do not agree that research would be  
>>>>> slowed
>>>>> with respect to languages where experimental participants are  
>>>>> easy to
>>>>> access, such as English. In contrast, we think that the opposite  
>>>>> is
>>>>> true: the field’s progress is probably slowed by not doing
>>>>> quantitative research.
>>>>> Suppose that a typical syntax / semantics paper that lacks
>>>>> quantitative evidence includes judgments for 50 or more  
>>>>> sentences /
>>>>> meaning pairs, corresponding to 50 or more empirical claims.  
>>>>> Even if
>>>>> most of the judgments from such a paper are correct or are on the
>>>>> right track, the problem is in knowing which judgments are  
>>>>> correct.
>>>>> For example, suppose that 90% of the judgments from an arbitrary  
>>>>> paper
>>>>> are correct (which is probably a high estimate). (Colin Phillips  
>>>>> and
>>>>> some of his former students / postdocs have commented to us  
>>>>> that, in
>>>>> their experience, quantitative acceptability judgment studies  
>>>>> almost
>>>>> always validate the claim(s) in the literature. This is not our
>>>>> experience, however. Most experiments that we have run which  
>>>>> attempt
>>>>> to test some syntactic / semantic hypothesis in the literature  
>>>>> end up
>>>>> providing us with a pattern of data that had not been known  
>>>>> before the
>>>>> experiment (e.g., Breen et al., in press; Fedorenko & Gibson, in
>>>>> press; Patel et al., 2009; Scontras & Gibson, submitted).) This  
>>>>> means
>>>>> that in a paper with 50 empirical claims 45/50 are correct. But  
>>>>> which
>>>>> 45? There are 2,118,760 ways to choose 45 items from 50. That’s  
>>>>> over
>>>>> two million different theories. By quantitatively evaluating the
>>>>> empirical claims, we reduce the uncertainty a great deal. To make
>>>>> progress, it is better to have theoretical claims supported by  
>>>>> solid
>>>>> quantitative evidence, so that even if the interpretation of the  
>>>>> data
>>>>> changes over time as new evidence becomes available – as is  
>>>>> often the
>>>>> case in any field of science – the empirical pattern can be used  
>>>>> as a
>>>>> basis for further theorizing.
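
[Editorial aside: the combinatorial count in the paragraph above — the number of ways to pick which 45 of the 50 judgments are the correct ones — can be verified with a few lines of Python. This is a minimal sketch; the variable names are illustrative, not from the message.]

```python
# Verify the claim: the number of ways to choose which 45 of 50
# judgments are correct is C(50, 45), which equals C(50, 5) by symmetry.
from math import comb

n_claims = 50   # empirical claims in a hypothetical syntax/semantics paper
n_correct = 45  # assuming a 90% accuracy rate

ways = comb(n_claims, n_correct)
print(ways)  # 2118760 -- "over two million different theories"
```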
>>>>>
>>>>> Furthermore, it is no longer expensive to run behavioral  
>>>>> experiments,
>>>>> at least in English and other widely spoken languages. There now
>>>>> exists a marketplace interface – Amazon.com’s Mechanical Turk –  
>>>>> which
>>>>> can be used for collecting behavioral data over the internet  
>>>>> quickly
>>>>> and inexpensively. The cost of using an interface like this is
>>>>> minimal, and the time that it takes for the results to be  
>>>>> returned is
>>>>> short. For example, currently on Mechanical Turk, a survey of
>>>>> approximately 50 items will be answered by 50 or more participants
>>>>> within a couple of hours, at a cost of approximately $1 per
>>>>> participant. Thus a survey can be completed within a day, at a  
>>>>> cost of
>>>>> less than $50. (The hard work of designing the experiment and  
>>>>> constructing controlled materials remains, of course.)
>>>>>
>>>>> Sorry to be so verbose. But I think that these methodological  
>>>>> points
>>>>> are very important.
>>>>>
>>>>> Best wishes,
>>>>>
>>>>> Ted Gibson
>>>>>
>>>>> Gibson, E. & Fedorenko, E. (In press). The need for quantitative
>>>>> methods in syntax and semantics research. Language and Cognitive
>>>>> Processes. http://tedlab.mit.edu/tedlab_website/researchpapers/Gibson
>>>>> & Fedorenko InPress LCP.pdf
>>>>>
>>>>> Gibson, E. & Fedorenko, E. (2010). Weak quantitative standards in
>>>>> linguistics research. Trends in Cognitive Science, 14, 233-234.
>>>>> http://tedlab.mit.edu/tedlab_website/researchpapers/Gibson &  
>>>>> Fedorenko
>>>>> 2010 TICS.pdf
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> Dick,
>>>>>>
>>>>>> You raise an important issue here about methodology. I believe  
>>>>>> that
>>>>>> intuitions are a fine way to generate hypotheses and even to test
>>>>>> them - to a degree. But while it might not have been feasible for
>>>>>> Huddleston, Pullum, and the other contributors to the Cambridge
>>>>>> Grammar to conduct experiments on every point of the grammar,
>>>>>> experiments could have only made the grammar better. The use of
>>>>>> intuitions, corpora, and standard psycholinguistic  
>>>>>> experimentation
>>>>>> (indeed, Standard Social Science Methodology) is vital for  
>>>>>> taking the
>>>>>> field forward and for providing the best support for different
>>>>>> analyses. Ted Gibson and Ev Fedorenko have written a very  
>>>>>> useful new
>>>>>> paper on this, showing serious shortcomings with intuitions as  
>>>>>> the
>>>>>> sole source of evidence, in their paper: "The need for  
>>>>>> quantitative
>>>>>> methods in syntax and semantics research".
>>>>>>
>>>>>> Carson Schutze and Wayne Cowart, among others, have also written
>>>>>> convincingly on this.
>>>>>>
>>>>>> It is one reason that a team from Stanford, MIT (Brain and  
>>>>>> Cognitive
>>>>>> Science), and researchers from Brazil are beginning a third  
>>>>>> round of
>>>>>> experimental work among the Pirahas, since my own work on the  
>>>>>> syntax
>>>>>> was, like almost every other field researcher's, based on native
>>>>>> speaker intuitions and corpora.
>>>>>>
>>>>>> The discussion of methodologies reminds me of the initial  
>>>>>> reactions
>>>>>> to Greenberg's work on classifying the languages of the  
>>>>>> Americas. His
>>>>>> methods were strongly (and justifiably) criticized. However, I  
>>>>>> always
>>>>>> thought that his methods were a great way of generating  
>>>>>> hypotheses,
>>>>>> so long as they were ultimately put to the test of standard
>>>>>> historical linguistics methods. And the same seems true for use  
>>>>>> of
>>>>>> native-speaker intuitions.
>>>>>>
>>>>>> -- Dan
>>>>>
>>>>>
>>>>>
>>>>>>> We linguists can add a further layer of explanation to the
>>>>>>> judgements, but some judgements do seem to be more reliable than
>>>>>>> others. And if we have to wait for psycholinguistic evidence for
>>>>>>> every detailed analysis we make, our whole discipline will
>>>>>>> immediately grind to a halt. Like it or not, native speaker
>>>>>>> judgements are what put us linguists ahead of the rest in  
>>>>>>> handling
>>>>>>> fine detail. Imagine writing the Cambridge Grammar of the  
>>>>>>> English
>>>>>>> Language (or the OED) without using native speaker judgements.
>>>>>>>
>>>>>>> Best wishes, Dick Hudson
>>>>>
>>>>>
>>>>>
>>>
>>>
>>>
>>
>



More information about the Funknet mailing list