Corpora

David B. Kronenfeld kfeld at CITRUS.UCR.EDU
Thu Dec 16 20:48:08 UTC 1999


Out of curiosity, how does the corpora vs. invented speech distinction
relate to the old distinction between the way people normally talk and the
way they talk when you ask them to slow down and be careful (or ask them
what they meant to say, or some such), particularly regarding how grammatical
relations are approached?

And "amen" on the Z. Harris comment and what follows it.
                        David Kronenfeld

At 10:19 AM 12/16/99 -0800, you wrote:
<stuff deleted>
>More to the point might be a discussion of what corpus research involves,
>what kinds of corpora there are, how they can best be exploited, etc.
>Brian and others are providing an important service in this regard with
>the Talkbank project.
>
>As I see it, we're talking about an alternative to the popular kinds of
>data-gathering that have involved inventing sentences in isolation and
>asking people whether they "get" them, or measuring the reaction times of
>college sophomores who see them written on computer screens.  The
>alternative is to examine how people actually use language, a process that
>necessarily involves confronting more than single sentences.  In that
>sense it's part of what has come to be called the study of "discourse",
>which of course can be conducted in many different ways.  It's worth
>noting that some people have been examining such data for a long time:
>one thinks, for example, of many language acquisition studies, of the
>analysis of conversation (from various points of view), and of the
>recording and analysis of "texts" collected by those who have been
>studying lesser known languages.  This last kind of corpus work has been
>going on for well over a hundred years.
>
>It's worth noting, too, that this distinction between constructed language
>and what I like to call "real" language ("natural language" has been
>coopted with a different meaning) is orthogonal to the
>formalist-functionalist dichotomy, at least in the sense that while many
>functionalists do work with corpora, many do not.
>
>It might be worth discussing the problems that arise from the supposedly
>accidental nature of corpora, and the lack of the control and
>replicability that are so dear to the hearts of psychologists.  One might
>actually find some significance, for example, in the fact that people
>rarely use a construction one might think easy to invent.  And of course
>the problem tends to diminish with very large corpora.
>
>But very large corpora may introduce a problem of their own.  Some of you
>may remember Zellig Harris's book Methods in Structural Linguistics, where
>he suggested we could get around the vexing problem of meaning by
>examining the distribution of linguistic forms in very large corpora.
>Machines to do that weren't available at the time (1950), but now they
>are, and it looks to me as if some people are doing what Harris had in
>mind, though so far as I know they don't refer to him.  It makes me
>uncomfortable because I think it's more rewarding in the long run to
>confront semantics head-on, not trying to avoid it with big corpora and
>machines.
>
>Just one last reservation.  Corpora make it easy to count things and come
>up with interesting findings regarding the frequency of this or that.  But
>knowing exactly what you're counting may not be such a simple matter, and
>it's easy to come up with "operational definitions" that turn out in the
>end to be spurious.  What I'm trying to say is that there's much of
>importance to learn from examining real language, but it shouldn't seduce
>us into thinking we can just crank out analyses mechanically.
>Understanding the nature of language is always going to require the
>intervention of perceptive human minds.
>
>Wally Chafe
>
David B. Kronenfeld             Phone   Office  909/787-4340
Department of Anthropology              Message 909/787-5524
University of California                Fax     909/787-5409
Riverside, CA 92521             email   kfeld at citrus.ucr.edu

http://www.ucr.edu/CHSS/depts/anthro/home.htm
http://pweb.netcom.com/~fanti/david.html


