[Corpora-List] L2 Learner Corpora

Xu Jiajin ustcxujj at gmail.com
Mon Dec 29 06:19:35 UTC 2008


It's quite easy to get a copy of the following learner corpora from a
bookstore in China.



1. CLEC (2003, Chinese Learners' English Corpus, one million words of
written English in five sub-corpora of 200 thousand words each: high school
students; non-English-major college students (CET-4); non-English-major
college students (CET-6); English-major college students, first and second
year; and English-major college students, third and fourth year. The corpus
has been richly error-tagged.)



2. SWECCL 1.0 (2005, Spoken and Written English Corpus of Chinese College
Learners 1.0, one million words for the written sub-corpus, and one million
for the spoken sub-corpus. The corpus has been POS-tagged. The spoken
sub-corpus, SECCL1.0, is accompanied by three CDs of sound files.)



3. SWECCL 2.0 (2008, Spoken and Written English Corpus of Chinese College
Learners 2.0. It uses the same sampling frame as version 1.0 but contains
completely different learner data. The corpus is not annotated. The spoken
sub-corpus, SECCL2.0, is accompanied by two DVDs of sound files.)



4. COLSEC (2005, College Learner Spoken English Corpus, 600-700 thousand
words. The corpus has been tagged for pronunciation errors.)



5. PACCL (2008, Parallel Corpus of Chinese EFL Learners, 2.1 million words.
This is a learner translation corpus.)



6. CEM Corpus (2008, Corpus for English Majors, 1 million words in the
current published version. The projected corpus size is five million words.)



What I've listed above are the publicly available ones, and I am aware that
some others are "under construction".



The price ranges from 27 to 70 RMB per corpus, approximately 3-7 GBP (or
6-14 USD).



These corpora have been the empirical foundation for hundreds of journal
articles and theses in China.



Jiajin Xu

Ph.D.

National Research Centre for Foreign Language Education

Beijing Foreign Studies University

xujiajin at bfsu.edu.cn