<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<p class="MsoNormal"><span lang="EN-GB">It's quite easy to get a copy of the
following learner corpora from a bookstore in China. </span></p>
<p class="MsoNormal"><span lang="EN-GB"> </span></p>
<p class="MsoNormal"><span lang="EN-GB">1. CLEC (2003, Chinese Learners' English
Corpus, one million words of written English, consisting of five sub-corpora of
200 thousand words each: high school students; non-English major college
students (CET-4); non-English major college students (CET-6); English major
college students, first and second year; and English major college students,
third and fourth year. This corpus has been richly error-tagged.)</span></p>
<p class="MsoNormal"><span lang="EN-GB"> </span></p>
<p class="MsoNormal"><span lang="EN-GB">2. SWECCL 1.0 (2005, Spoken and Written
English Corpus of Chinese College Learners 1.0, one million words for the
written sub-corpus, and one million for the spoken sub-corpus. The corpus has
been POS-tagged. The spoken sub-corpus, SECCL1.0, is accompanied by three CDs
of sound files.)</span></p>
<p class="MsoNormal"><span lang="EN-GB"> </span></p>
<p class="MsoNormal"><span lang="EN-GB">3. SWECCL 2.0 (2008, Spoken and Written
English Corpus of Chinese College Learners 2.0. The same sampling frame has
been used, but the corpus contains completely different learner data. The corpus is not
annotated. The spoken sub-corpus, SECCL2.0, is accompanied by two DVDs of sound
files.)</span></p>
<p class="MsoNormal"><span lang="EN-GB"> </span></p>
<p class="MsoNormal"><span lang="EN-GB">4. COLSEC (2005, College Learner Spoken
English Corpus, 600-700 thousand words. The corpus has been tagged for
pronunciation errors.)</span></p>
<p class="MsoNormal"><span lang="EN-GB"> </span></p>
<p class="MsoNormal"><span lang="EN-GB">5. PACCL (2008, Parallel Corpus of Chinese EFL
Learners, 2.1 million words. This is a learner translation corpus.)</span></p>
<p class="MsoNormal"><span lang="EN-GB"> </span></p>
<p class="MsoNormal"><span lang="EN-GB">6. CEM Corpus (2008, Corpus for English
majors, 1 million words in the current published version. The projected corpus
size is five million words.)</span></p>
<p class="MsoNormal"><span lang="EN-GB"> </span></p>
<p class="MsoNormal"><span lang="EN-GB">The corpora listed above are the publicly
available ones; I am aware that some others are "under construction".</span></p>
<p class="MsoNormal"><span lang="EN-GB"> </span></p>
<p class="MsoNormal"><span lang="EN-GB">The corpora cost between 27 and 70
RMB each, approximately 3-7 GBP (or 6-14 USD) per corpus.</span></p>
<p class="MsoNormal"><span lang="EN-GB"> </span></p>
<p class="MsoNormal"><span lang="EN-GB">These corpora have been the empirical
foundation for hundreds of journal articles and theses in China.</span></p>
<p class="MsoNormal"><span lang="EN-GB"> </span></p>
<p class="MsoNormal"><span lang="EN-GB">Jiajin Xu</span></p>
<p class="MsoNormal"><span lang="EN-GB">Ph.D.</span></p>
<p class="MsoNormal"><span lang="EN-GB">National Research Centre for Foreign
Language Education</span></p>
<p class="MsoNormal"><span lang="EN-GB">Beijing Foreign Studies University</span></p>
<p class="MsoNormal"><span lang="EN-GB"><a href="mailto:xujiajin@bfsu.edu.cn">xujiajin@bfsu.edu.cn</a></span></p>