Corpora: overuse and underuse of learner English

Patrick Gillard pgillard at cambridge.org
Wed Dec 12 11:06:34 UTC 2001


At 01:44 PM 12/11/01 -1000, Robert Bley-Vroman wrote:
>At 8:28 AM -1000 12/11/01, xiaotian guo wrote:
>
>>It is unavoidable to touch on overuse and
>>underuse when comparing corpora. But how large does the
>>difference in a given figure have to be before we can say that overuse or
>>underuse occurs (I am poor in statistics)?
>
>The obvious simple thing is to develop some measurement of rate-of-use.
>Normally, this would be a proportion (e.g. 20% of the verbs are present
>tense in the native-speaker corpus, whereas 40% are present tense in the
>learner corpus). A simple statistic you could calculate would be a confidence
>interval for the proportion (easy to do by hand even for someone who is
>poor in statistics). Report the proportion and the confidence interval. If
>the confidence intervals for the two proportions overlap, it wouldn't be
>wise to claim overuse or underuse.
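The confidence-interval check described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original exchange: it assumes the normal-approximation (Wald) interval, which is easy to compute by hand but unreliable for small samples or for proportions near 0 or 1. The verb counts (200 of 1,000 and 400 of 1,000) are hypothetical, chosen only to match the 20% and 40% figures in the example, and the helper names `proportion_ci` and `intervals_overlap` are likewise hypothetical.

```python
import math
from typing import Tuple

def proportion_ci(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate confidence interval for a proportion (normal/Wald approximation).

    z = 1.96 gives a nominal 95% interval. The approximation is poor for
    small n or for proportions close to 0 or 1.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n and n > 0")
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    # Clamp to [0, 1], since a proportion cannot fall outside that range.
    return max(0.0, p - half_width), min(1.0, p + half_width)

def intervals_overlap(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

# HYPOTHETICAL counts: present-tense verbs out of all verbs in each corpus.
native = proportion_ci(200, 1000)   # 20% in the native-speaker corpus
learner = proportion_ci(400, 1000)  # 40% in the learner corpus

print(f"native:  {native[0]:.3f} to {native[1]:.3f}")
print(f"learner: {learner[0]:.3f} to {learner[1]:.3f}")
print("overlap" if intervals_overlap(native, learner) else "no overlap")
```

With these made-up counts the intervals are roughly 0.175 to 0.225 and 0.370 to 0.430, so they do not overlap. Treat non-overlap as a conservative rule of thumb: two 95% intervals can overlap slightly even when a direct test of the difference between the proportions would be significant.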

Can I add a further caution? If you attempt to draw conclusions from
comparing a native-speaker corpus with a non-native-speaker corpus, it is
important to make sure that you are comparing like with like.

When learners are given writing tasks, they are sometimes asked to produce
types of text that don't occur very frequently in native-speaker English.
For example, if a student is asked to describe their daily routine, they
will produce a lot of present simple structures, but you are not very likely
to find a text like that in native-speaker written English. In fact,
if you use a native-speaker corpus that has a large amount of newspaper
data in it, you may find that the simple past is *over-represented* in your
corpus compared to native-speaker English of other types, because
newspapers are mostly concerned with what happened *yesterday*.
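To see how the make-up of the reference corpus can shift the baseline, here is a small Python sketch. Everything in it is hypothetical: the genre labels, the counts and the helper `past_tense_rate` are invented for illustration and do not come from any real corpus. It compares the simple-past rate over a whole, newspaper-heavy native-speaker corpus with the rate when the newspaper subcorpus is left out.

```python
from typing import Dict, Iterable, Optional, Tuple

# HYPOTHETICAL counts per native-speaker subcorpus, invented for illustration:
# genre -> (simple-past verb tokens, all verb tokens)
NATIVE_SUBCORPORA: Dict[str, Tuple[int, int]] = {
    "newspaper": (6000, 10000),
    "conversation": (1500, 5000),
    "instructions": (300, 3000),
}

def past_tense_rate(
    counts: Dict[str, Tuple[int, int]],
    genres: Optional[Iterable[str]] = None,
) -> float:
    """Pooled simple-past proportion over the selected genres (all genres if None)."""
    selected = counts if genres is None else {g: counts[g] for g in genres}
    past = sum(p for p, _ in selected.values())
    total = sum(t for _, t in selected.values())
    if total == 0:
        raise ValueError("no verb tokens in the selected genres")
    return past / total

pooled = past_tense_rate(NATIVE_SUBCORPORA)
without_news = past_tense_rate(NATIVE_SUBCORPORA, ["conversation", "instructions"])
print(f"whole corpus:       {pooled:.3f}")        # 7800 / 18000
print(f"without newspapers: {without_news:.3f}")  # 1800 / 8000
```

With these invented numbers the baseline drops from about 0.433 to 0.225 once the newspaper texts are excluded, so a learner rate of, say, 0.3 would look like underuse of the simple past against the first baseline but not against the second. In a real study the subcorpora kept would be those closest to the learners' task type.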

By the way, I do think that rate-of-use studies are very useful for
analysing learner English. You just have to be careful to go into them
with your eyes open.


Patrick Gillard
Senior Commissioning Editor
ELT Dictionaries
Cambridge University Press

pgillard at cambridge.org

http://www.cambridge.org/elt

Direct line: +44 (0)1223 325596

Cambridge Learner's Dictionary (published February 2001)
http://www.cambridge.org/elt/cld


