[Corpora-List] Looking for large individual L2 learner corpora

Piotr Pęzik pezik at uni.lodz.pl
Tue Jun 11 18:17:12 UTC 2013


Dear Morten,

The spoken component of the PELCRA Learner English Corpus currently
contains about 130,000 words of transcribed interviews with Polish speakers of
English. It is time-aligned and can be downloaded together with the original
(mostly high-quality) recordings and additional metadata, which makes it
possible to extract samples for individual speakers. It also comes with some
phonetic annotation (word mispronunciations).

To get a feel for what the corpus contains, you can go to this page:

http://pelcra.pl/plec/

and type in a query such as 'test'. All the concordances with a little
speaker icon next to them come from the spoken subcorpus and can be
played back in the browser.

The entire spoken subcorpus is downloadable for academic use:

http://metashare.ia.uni.lodz.pl/repository/browse/pelcra-spoken-learner-english-corpus/5efb34ba662611e2958a525400d76147ab57c052d50f4b5387545a959f6db0f8/

Best,

Piotr Pęzik



On Mon, Jun 10, 2013 at 10:45 PM, Morten Christiansen
<christiansen at cornell.edu> wrote:
Hi Corpora-List users,

I'm looking for large corpora with spoken language by individual L2
learners. I'm aware of resources such as the LINDSEI corpus, which
contains interviews with many L2 learners but where there's only about
1-2K words per individual learner. The largest corpus that I've found so
far for an individual learner, the Andrea corpus (part of the European
Science Foundation Second Language Databank), contained some 5.6K words
after processing.

I would welcome pointers to individual L2 learner corpora that contain
more than 6K words. The above corpora relate to L2 speakers of English,
but I'd be interested in other languages as well.

The corpora will be used to compare the building blocks of L1 and L2
learning using computational modeling.

Best,
        Morten Christiansen
------
Morten H. Christiansen, PhD
Professor, Department of Psychology, Cornell University, Ithaca, NY 14853
Co-Director, Cornell Cognitive Science Program
External Professor, Santa Fe Institute
Office: 228 Uris Hall   ||   Phone: +1 (607) 255-3834 (dept)   ||   Fax:
+1 (607) 255-8433
Email:  christiansen at cornell.edu
Web: http://www.psych.cornell.edu/people/Faculty/mhc27.html
Cornell Cognitive Neuroscience Lab: http://cnl.psych.cornell.edu




_______________________________________________
UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora





