[Corpora-List] Summary: responses to request about creating a learners' corpus
Victoria Muehleisen
vicky at waseda.jp
Fri Jan 28 12:37:16 UTC 2005
Hello Everyone,
I received many, many useful replies to my request for help on making a learners' corpus. A few people sent very specific and helpful answers to my questions about tagging and text processing. I'm not sure if they are of interest to everyone, so I am only summarizing the information more generally useful to people who are thinking about making a learners' corpus.
First of all, many people pointed me to Sylviane Granger's books and articles, the International Corpus of Learner English (ICLE), and the related website at the Centre for English Corpus Linguistics (CECL).
The website is a big one, and these direct URLs may be most useful.
- The Learner Corpus Bibliography (very extensive!) <http://www.fltr.ucl.ac.be/fltr/germ/etan/cecl/learner%20corpus%20bibliography.html>
- Description of the ICLE project, including useful guidelines for collecting a sub-corpus and learner profile information: <http://www.fltr.ucl.ac.be/fltr/germ/etan/cecl/Cecl-Projects/Icle/icle.htm>
- The first two chapters of Prof. Granger's book "Learner English on Computer" focus on the design of the corpus and tools for analysis; both are very helpful. Granger S. (ed.) (1998b) Learner English on Computer. London & New York: Addison Wesley Longman (228 pp.)
Next, I heard from people involved in the creation of a few other learners' corpora.
Ylva Berglund told me about the Uppsala Student English Corpus (USE).
- The USE web site has a fairly detailed description of how they created the corpus: <http://www.engelska.uu.se/use.html>
For me right now, it was most useful to read about the file system, encoding, and collection procedures. The sample consent form and questionnaire for background data will also be useful.
Prof. Berglund also included links to articles about USE and/or using data from it:
- ICAME Journal article: <http://nora.hd.uib.no/icame/ij24/use.pdf>
- Article on statistical genre analysis of the corpus: <http://www.english.bham.ac.uk/staff/omason/publications/cl2001/berglund-mason.html>
- Articles analysing USE corpus data:
<http://www.tu-chemnitz.de/phil/InternetGrammar/publications/hahn.pdf>
<http://www.svenska.gu.se/~svelb/kurs/komvel2/BorinPrytz.pdf>
- Prof. Berglund also recommended this summary of a large number of learner corpora written by Norma Pravec: <http://nora.hd.uib.no/icame/ij26/pravec.pdf>.
- Eric Atwell told me about the MSc Thesis of Latifa Al-sulaiti, "Designing and Developing a Corpus of Contemporary Arabic", available on-line at <http://www.comp.leeds.ac.uk/research/pubs/theses/Latifa_MSc.pdf>. Her web page is here <http://www.comp.leeds.ac.uk/latifa>, and has information about corpus design and Arabic learner's corpora.
- Timothy Baldwin told me about a corpus of spoken learner English which has been published in Japan (sorry, Japanese web page only):
<http://home.alc.co.jp/db/owa/sp_item_detail?p_sec_cd=31&p_item_cd=7004108>
- Finally, someone sent me the URL of the Montclair Electronic Language Database, a learners' corpus which is on-line and available for use. It will be very useful for me to show to staff at my school who don't really "get" what a corpus is like: <http://www.chss.montclair.edu/linguistics/MELD/>
{I seem to have deleted the e-mail about this, so I can't give credit to the person who told me about this--sorry!}
I'd like to thank everyone for the responses. I feel like I now have a lot of the information I need to get the project off the ground. Now, I guess it's time to start reading!
*********************************
Victoria Muehleisen
School of International Liberal Studies Waseda University
Nishi-Waseda 1-6-1
Shinjuku-ku, Tokyo 169-8050
E-mail: <vicky at waseda.jp>
Home page: <www.f.waseda.jp/vicky>