Corpora: overuse and underuse of learner English

Robert Bley-Vroman vroman at hawaii.edu
Tue Dec 11 23:44:21 UTC 2001


At 8:28 AM -1000 12/11/01, xiaotian guo wrote:

>It is unavoidable to touch on overuse and
>underuse in the study of corpora comparison. But to what extent does the
>difference of a certain figure reach when we can say overuse or underuse
>occurs (I am poor in statistics)?

The obvious simple thing is to develop some measurement of rate of use.
Normally, this would be a proportion (e.g. 20% of the verbs are present
tense in the native-speaker corpus, whereas 40% are present tense in the
learner corpus). A simple statistic you could calculate is a confidence
interval for each proportion (easy to do by hand, even for someone who is
poor in statistics). Report the proportion and the confidence interval. If
the confidence intervals for the two proportions overlap, it wouldn't be
wise to claim overuse or underuse. (You could do much fancier things
statistically, but I'd advocate this as a start: it has an obvious
intuitive interpretation and it's easy to calculate.) Whether you really
think that the overuse is "a lot more" or "more to an important extent"
depends on one's judgement and interpretation, and on the relationship of
this finding to your research hypotheses and theoretical rationale.
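For anyone who would rather let a computer do the arithmetic, here is a minimal sketch of the proportion-plus-confidence-interval approach. It assumes the common normal-approximation (Wald) interval, p +/- z * sqrt(p(1 - p)/n); that particular formula is my choice for illustration, and it is rough when counts are small or the proportion is near 0 or 1. The function names and the counts (200 and 400 present-tense verbs out of 1,000 verbs, matching the 20% vs. 40% example) are hypothetical.

```python
import math

def proportion_ci(hits, total, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion.

    hits  -- tokens showing the feature (e.g. present-tense verbs)
    total -- all tokens in the category (e.g. all verbs)
    z     -- critical value; 1.96 gives an approximate 95% interval
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = hits / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    # Clip to [0, 1], since the approximation can overshoot near the ends.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

def intervals_overlap(a, b):
    """True if two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical counts, for illustration only.
native = proportion_ci(200, 1000)   # 20% present tense
learner = proportion_ci(400, 1000)  # 40% present tense

for label, (p, lo, hi) in [("native", native), ("learner", learner)]:
    print(f"{label}: {p:.1%} (95% CI {lo:.1%} to {hi:.1%})")

if intervals_overlap(native[1:], learner[1:]):
    print("Intervals overlap: don't claim overuse or underuse.")
else:
    print("Intervals do not overlap: a real difference is plausible.")
```

Whether a non-overlapping difference is large enough to matter is still the judgement call described above; the code only tells you whether the difference is likely to be more than sampling noise.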

The way to avoid the "so-what syndrome" is to have a clear theoretical
rationale for your research hypotheses. In fact, even the question of
which statistical techniques are appropriate is hard to answer at more
than a very basic level without a theoretically grounded research question.

For example, it has been proposed (e.g. by J. Schachter 1974) that
native speakers of Chinese will underuse relative clauses in English. Her
study, which tended to confirm her predictions, was based on her concept of
"a priori contrastive analysis": that is, it relied on a linguistic
comparison of relative clause formation in Chinese and English, plus a
theory of interlanguage identifiability and some concept of the conditions
that would give rise to underuse. In contrast, it might be that Chinese
learners of English underproduce relative clauses in English because the
rate of relative clause use in Chinese itself is lower than the rate of
relative clause use in English. To test this idea, you'd need corpus
comparisons of relative clause use in native English and native Chinese,
as well as in the English of Chinese learners.

Robert Bley-Vroman

--
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Robert Bley-Vroman
Department Chair              MA Program in ESL and
Second Language Studies       PhD Program in Second Language Acquisition
University of Hawai'i         Graduate Faculty of Linguistics
1890 East-West Road           Associate Director for Technology
Honolulu HI 96822             National Foreign Language Resource Center
(808)956-2800; fax: (808)956-2802
mailto:vroman at hawaii.edu      http://www.sls.hawaii.edu/bley-vroman/


