[Corpora-List] Corpora of Learner English and Learner German

Eric Atwell eric at comp.leeds.ac.uk
Wed May 16 08:03:22 UTC 2007


Barbara,

When looking for suitable corpora, try ELDA http://www.elda.org/
- a search of the catalogue for "learner English Corpus" finds:

ISLE Speech Corpus

Approx. 20 minutes of speech (per speaker) from 23 German and 23 Italian
intermediate learners of English. Each speaker recorded sentences from
several blocks of differing types (reading simple sentences, using
minimal pairs, answering multiple-choice questions). The prompts
were of varying perplexities.
About 2/3 of the data for each speaker was annotated by one of a team of
linguists. The files were first corrected at the word level, and an
automatic speech recognizer was then used to produce phone-level annotations.
The annotator then re-annotated each sentence to mark phone and stress
errors (e.g., substitutions, insertions, or deletions). Corpus details:
46 speakers (23 German and 23 Italian); 11484 utterances; 1.92 gigabytes
of WAV files (4 CDs); 17 hours, 54 minutes, and 44 seconds of speech data.
For more details, see:

Menzel, W.; Atwell, E.; Bonaventura, P.; Herron, D.; Howarth, P.; Morton, R.;
Souter, C. The ISLE Corpus of non-native spoken English. In Proceedings of
LREC 2000, vol. 2, pp. 957-964. European Language Resources Association,
2000. http://www.comp.leeds.ac.uk/eric/menzel00lrec.pdf

Atwell, E.; Howarth, P.; Souter, C. The ISLE corpus: Italian and German
spoken learner's English. ICAME Journal, vol. 27, pp. 5-18,
2003. http://www.comp.leeds.ac.uk/eric/atwell03icamej.pdf


I hope this helps...


Eric Atwell,

Senior Lecturer, Language research group leader, School of Computing 
Faculty of Engineering, UNIVERSITY OF LEEDS, Leeds LS2 9JT, England
TEL: 0113-3435430  FAX: 0113-3435468  WWW/email: google Eric Atwell


On Wed, 16 May 2007, Barbara Schiftner wrote:

>
> Dear all,
>
>
> I am a student at the Department of English at the University of Vienna. In
> my diploma thesis, I am investigating the development of learner corpus
> research, focusing in particular on corpora of learner English and learner
> German.
>
> An integral part of my paper will be an analysis of the status quo, which 
> should incorporate a representative sample of available corpora of learner 
> English and learner German. Therefore, I would be grateful for any up-to-date 
> information about the corpora listed below, or suggestions for other learner
> corpora that should not be left out of my discussion.
>
>
> Thank you for your help!
>
>
>
> Best regards,
>
> Barbara Schiftner
>
>
>
> This is a list of the learner corpora I have found out about so far:
>
>
>
> Corpora of Learner English
>
>
> CLC (Cambridge Learner Corpus)
>
> CLEC (Chinese Learner English Corpus)
>
> HKUST (Hong Kong University of Science and Technology)
>
> ICLE (International Corpus of Learner English)
>
> JEFLL (Japanese EFL Learner Corpus)
>
> JPU (Janus Pannonius University Corpus)
>
> LLC (Longman Learners' Corpus)
>
> MELD (Montclair Electronic Language Database)
>
> Polish Learner English Corpus
>
> SILS (School of International Liberal Studies at Waseda University)
>
> TeleNex Student Corpus
>
> USE (Uppsala Student English Project)
>
>
> Corpora of Learner German
>
>
> FALKO (Fehlerannotiertes Lernerkorpus des Deutschen als Fremdsprache
> [error-annotated learner corpus of German as a foreign language], HU Berlin)
>
> LeKo (Lernerkorpus [learner corpus], HU Berlin)
>
> Telecorp (Pennsylvania)
>
> Corpus collected by Ursula Weinberger (Lancaster)
>
>
>
> (My main focus is on written texts, but remarks about corpora of spoken 
> learner language are also welcome.)
>
>
> ______________________________
> Barbara Schiftner
>
> Fachdidaktisches Zentrum
> Institut fuer Anglistik und Amerikanistik
> Universitaet Wien
> Spitalgasse 2-4, Hof 8
> A-1090 Wien
> Austria
>
> phone: +43-1-4277-424-53
> e-mail: barbara.schiftner at univie.ac.at
>
>


