Corpora: overuse and underuse of learner English; International English

P. Kaszubski przemka at amu.edu.pl
Sat Dec 15 22:19:47 UTC 2001


On 13 Dec 2001, at 11:18, Tadeusz Piotrowski wrote:

> By accident, I am a Polish user of English (now I am writing
> self-consciously, thinking about my own cluster of errors...), and by
> accident I know an interesting PhD dissertation that compares selected
> aspects of native-speaker English to those of a non-native variety,
> comparing like with like: Przemyslaw Kaszubski Selected aspects of
> lexicon, phraseology and style in the writing of Polish advanced
> learners of English, a contrastive, corpus-based approach. Poznan 2000
> (przemka at elex.amu.edu.pl). He hoped to publish it.

Many thanks are due to my reviewer, Prof. Piotrowski, for
publicising my modest effort on this forum. The original version of
the PhD thesis can be downloaded from:

http://main.amu.edu.pl/~przemka/rsearch.html

The files are in PDF format, and some of them are password-protected.
Researchers wishing to consult these files should e-mail me first.

My own opinion on whether applied linguists should measure
interlanguage corpora against control native-speaker data parallels
what Simon G. J. Smith wrote: "in general, surely, the native speaker
variety of a language is in some sense the correct one, and thereby
automatically has a different status from that of other varieties.
Otherwise what yardstick, in the descriptive tradition, do we have for
judging what is well-formed and what is not?" Regardless of the
changing status of English as a lingua franca (cf. Kachru's
'expanding circle'), EFL learners often do demand to be told what is
'correct'. While there may often be no single satisfying answer to
such demands, what teachers and others can do is attempt to give a
multi-part, probabilistic answer: for example, that in genre X an
educated American would probably write Y, while his British
counterpart might also consider Z, and so on. In cross-corpus
analysis involving learner corpora (which ultimately serves to
illuminate language instruction), it is absolutely vital to be aware
of what sort of corpus or corpora we offer as control data, and to be
sensitive to sociological and textual variables, *especially* when we
cannot provide an exact match for the experimental data. The ability
to draw suitably tentative conclusions from such analyses (whether it
is overuse, underuse or misuse that we have detected) is likewise a
precious faculty.

Besides making certain statistical procedures possible, one way of
ensuring that our findings about a given IL are more or less accurate
is to compare as many corpora of as similar a textual nature (the
same genre, at the very least) as possible, covering such
dichotomies/continua as: native vs. non-native, adult vs. adolescent
vs. child performance (for both the L1 and the L2 concerned),
advanced vs. intermediate vs. beginner learners, interlanguage 1
(IL1) vs. IL2 vs. IL3 (i.e. learners from different mother-tongue
backgrounds), etc. It is my belief that only statistical comparisons
of such a complex kind can produce fairly reliable and pedagogically
USEFUL data that we can confidently pass on to our students.
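The post does not name a particular statistic for detecting overuse and underuse. Purely as an illustration, one common keyness measure in corpus linguistics is Dunning's log-likelihood (G2); the minimal Python sketch below assumes that test, an assumed cut-off of 3.84 (chi-square, 1 d.f., p < 0.05), and hypothetical function names (`log_likelihood`, `classify`, `compare_with_controls`). It checks one word's learner-corpus frequency against several control corpora, in the spirit of the multi-corpus comparison described above.

```python
import math

# Assumed cut-off: chi-square critical value for 1 degree of freedom at p < 0.05.
# Stricter values (e.g. 6.63 for p < 0.01) are also common.
CRITICAL_G2 = 3.84


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Dunning's G2 for a word occurring a times in corpus 1 (c tokens)
    and b times in corpus 2 (d tokens)."""
    total = a + b
    expected1 = c * total / (c + d)
    expected2 = d * total / (c + d)
    g2 = 0.0
    # By convention 0 * ln(0) is treated as 0, so zero counts contribute nothing.
    if a > 0:
        g2 += a * math.log(a / expected1)
    if b > 0:
        g2 += b * math.log(b / expected2)
    return 2.0 * g2


def classify(a: int, b: int, c: int, d: int, critical: float = CRITICAL_G2) -> str:
    """Label corpus 1's use of a word relative to corpus 2 (hypothetical helper)."""
    if log_likelihood(a, b, c, d) < critical:
        return "no significant difference"
    # The direction of a significant difference comes from the relative frequencies.
    return "overuse" if a / c > b / d else "underuse"


def compare_with_controls(freq: int, size: int,
                          controls: dict[str, tuple[int, int]]) -> dict[str, str]:
    """Classify one learner-corpus frequency against each control corpus.

    controls maps a (hypothetical) corpus label to (frequency, corpus size).
    """
    return {name: classify(freq, f, size, n) for name, (f, n) in controls.items()}


if __name__ == "__main__":
    # Toy counts, invented purely for illustration.
    controls = {
        "native adult": (10, 10_000),
        "native adolescent": (45, 10_000),
    }
    print(compare_with_controls(50, 10_000, controls))
```

With these invented counts the learner frequency registers as overuse against the first control corpus but shows no significant difference against the second, which illustrates how the verdict can depend on which control data are chosen.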

Would I want to teach my students International English? Sure, but I
would also like to know how it relates to British/American standards
and naturalness of expression (for genre A, B, etc.). Barbara
Seidlhofer's project is an important step towards capturing this
relation.




=======================================
Dr Przemyslaw Kaszubski
t: +48 61 8293515
e: przemka at amu.edu.pl
w: http://elex.amu.edu.pl/ifa/staff/kaszubski.html

MY (ENGLISH) LEARNER CORPORA PAGE:
http://main.amu.edu.pl/~przemka

School of English
Adam Mickiewicz University
Al. Niepodleglosci 4
61-874 Poznan
t: +48 61 8293506
f: +48 61 8523103
w: http://elex.amu.edu.pl/ifa
=======================================


