[Corpora-List] L2 Learner Corpora

Mieke van der Velden mieke at club-internet.fr
Sun Jan 25 16:48:29 UTC 2009


Dear All,

A few weeks ago I asked for information on freely accessible learner corpora of English texts written by learners with different L1 backgrounds.
My thanks to the following for their replies:

- Ylva Berglund Prytz
- Carmela Chateau
- Anna Feldman
- Šarolta Godnic Vicic
- Xiaotian Guo
- Chau Meng Huat
- John Milton
- Stephen Reder
- Jiajin Xu

Most of the suggestions I received pointed to online concordancers, and some to downloadable data. Some people also offered to share (part of) their personally collected corpora. My sincere thanks for their generosity.



Below are the links to the different sites that were suggested (comments are from the websites):
 
Chinese Learner English Corpus (free online concordancer)
http://www.clal.org.cn/corpus/EngSearchEngine.aspx
A corpus of written English produced by Chinese students at different levels.
 
Cobb's Lextutor (free online concordancer)
http://www.lextutor.ca/concordancers/concord_e.html
Online concordancer for a number of learner corpora.

EVA Corpus (free online concordancer)
http://kh.hd.uib.no/eva/ 
English L2 corpus of written and spoken texts produced by Norwegian pupils.

The JPU Corpus (free online concordancer, free downloadable data)
http://joeandco.blogspot.com/ 
221 essays and research papers written in English by Hungarian students.
Writing subcorpus at http://joeandco.blogspot.com/2008/06/writing-subcorpus.html
The Russian retraining subcorpus: http://joeandco.blogspot.com/2008/06/russian-retraining-subcorpus.html
The language practice subcorpus: http://joeandco.blogspot.com/2008/06/language-practice-subcorpus.html
Online concordancer available at http://www.lextutor.ca/concordancers/concord_e.html

Learner Business Letters Corpus (free online concordancer)
http://www.someya-net.com/concordancer/
209,461 word tokens in 1,464 letters written by Japanese business people

Learner Corpora at the Language Bank (free online concordancer)
http://langbank.engl.polyu.edu.hk/indexl.html
The Learner Corpus of Essays and Reports contains samples of English writing produced by second language learners of English as part of their coursework requirements for 'English for Academic Purposes' at Hong Kong PolyU. The essays and project reports cover a range of topics from Science, IT and New Media to Nursing, Business and Economics, and the Social Sciences. This diversity reflects the background of the student authors, who major in different Higher Diploma and Degree subjects including Manufacturing Engineering, Nursing, Hotel and Tourism Management, Social Work, Interior Design and Fashion Design.
PolyU Language Bank Concordancer available at http://langbank.engl.polyu.edu.hk/engine.aspx?Submit=Search&lang=1&corpus=16

The Multimedia Adult English Learner Corpus (freely available for research purposes, request for access)
http://www.labschool.pdx.edu/maelc_access.html
The Multimedia Adult English Learner Corpus (MAELC) is a database of video of classroom interaction and associated written materials collected as part of the Lab School research project since 2001. At this time, the corpus includes materials from four years of adult ESL classes, from beginning to upper-intermediate proficiency: more than 3,600 hours of classroom interaction recorded by six cameras and multiple microphones.

The Montclair Electronic Language Learners' Database (free downloadable data, free online concordancer)
http://www.chss.montclair.edu/linguistics/MELD/
The Montclair Electronic Language Learners' Database project collects, stores, and annotates text written by all levels of second language (L2) learners. The database is publicly available for research in L2 acquisition, L2 writing assessment, and L2 writing pedagogy. The database currently contains 44,477 words of annotated text and another 53,826 words of available but as yet unannotated text.

PICLE Corpus (free online concordancer)
http://ifa.amu.edu.pl/~ifaconc/main.php
PICLE corpus of advanced-level argumentative essays by Polish EFL learners (365 essays, about 330k words).
More information about the IFA Concordancer at http://ifa.amu.edu.pl/~ifaconc/

Singapore Corpus of Research in Education (online query, free registration) 
http://score.crpp.nie.edu.sg/score/index.htm
The corpus consists of teaching materials, classroom interactions, and student artifacts. The current release data are mainly transcripts of classroom audio/video recordings taped in more than 350 primary and secondary schools in Singapore.

The Uppsala Student English corpus (free downloadable data)
http://www.engelska.uu.se/use.html
The corpus consists of 1,489 essays written by 440 Swedish university students of English at three different levels, the majority in their first term of full-time studies. The total number of words is 1,221,265, which means an average essay length of 820 words. A typical first-term essay is somewhat shorter, averaging 777 words.
Downloadable data at http://www.ota.ox.ac.uk/headers/2457.xml
The essays cover set topics of different types. They were written outside class against a deadline of two to three weeks, with length limits imposed (usually 700-800 words) and a suitable text structure suggested. First-term students were admitted for both the spring (January 20 - June 6) and autumn (September 1 - January 19) terms.



Best regards,

Mieke
Université Nancy2

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora

