[Corpora-List] Request for advice on creating a learners' corpus

William Gregory Sakas sakas at hunter.cuny.edu
Wed Jan 26 18:34:49 UTC 2005


Hi Victoria,

You might also want to get in touch with Martin Chodorow
who has done some work with English corpora of essays
written by Japanese English-language learners.

martin.chodorow at hunter.cuny.edu

Best,
-- Wm

William Gregory Sakas, Ph.D.
Computer Science and Linguistics
Hunter College and the Graduate Center
City University of New York
 
Voice:  (212) 772.5211
Fax:    (212) 772.5219
Email:  sakas at hunter.cuny.edu
Web:    http://www.hunter.cuny.edu/cs/Faculty/Sakas/
 
 

-----Original Message-----
From: owner-corpora at lists.uib.no [mailto:owner-corpora at lists.uib.no] On Behalf Of Eric Atwell
Sent: Wednesday, January 26, 2005 11:50 AM
To: Victoria Muehleisen
Cc: CORPORA at UIB.NO; Latifa Al-Sulaiti
Subject: Re: [Corpora-List] Request for advice on creating a learners' corpus

Victoria,

Latifa Al-Sulaiti was in a similar position about a year and a half ago.
She planned to collect a million-word Corpus of Contemporary Arabic
(native-speaker texts rather than learner texts), but even so she faced
similar technical issues: her background was in linguistics and
language teaching rather than computing, and she didn't start with prior
knowledge of seeking permissions, corpus structure and management,
XML file format, what markup information to add to file headers, etc.
Her initial version of the corpus is now complete and online;
see http://www.comp.leeds.ac.uk/latifa

Her methods and solutions to the problems along the way are documented
in her MSc Thesis, also online:

Latifa Al-Sulaiti (MSc thesis). Designing and Developing a Corpus of
Contemporary Arabic.
Abstract: http://www.comp.leeds.ac.uk/cgi-bin/sis/ext/rs_pub.cgi?cmd=displayabstract&sid=200081109
Full text: /research/pubs/theses/Latifa_MSc.pdf

We are also writing a paper for IJCL; we could let you have a draft if
you're interested...

I'm sure Latifa would be happy to discuss issues further - do get in
touch direct.

Good luck with your project!

Eric Atwell, School of Computing, Leeds University

On Thu, 27 Jan 2005, Victoria Muehleisen wrote:

> Hello Everyone,
>
> I teach English at a university in Japan, and we recently received some
> grant money to set up a learners' corpus of students' essays written
> in English.
>
> Although we have some ideas of how we can begin doing research once we
> have the corpus, we don't know anything about actually setting it up.
> What are the best formats for storing the essays?  For marking up the
> data?  What kind of information will be most useful to add to the
> files? (For example, we know that we'll want to identify the level of
> the class the essay was written for--there are basic, intermediate, and
> advanced level writing courses--and we'll also want to code for the
> native language of the writer--not all the students are Japanese--but
> are there other kinds of variables we should keep track of?)
>
> We would appreciate references to books/articles/web sites on setting
> up a learners' corpus, especially ones that don't assume too much
> technical computer knowledge.  We'll have people available to help us
> with the technical side, but we need to tell them what we want to do.
>
> In addition to references, if there is anyone who has created a
> learners' corpus and could warn us about any mistakes to avoid, that
> would also be very helpful.  And at the next stage, we'll need to start
> thinking about issues of student privacy/permission, so any references
> on those issues (in particular, ways that other corpus-creators have
> done it) would be very useful.
>
> Thanking you in advance,
>
> *********************************
> Victoria Muehleisen
>
> School of International Liberal Studies Waseda University
> Nishi-Waseda 1-6-1
> Shinjuku-ku, Tokyo 169-8050
>
> E-mail: <vicky at waseda.jp>
> Home page: <www.f.waseda.jp/vicky>
>
>
>

-- 
Eric Atwell, Senior Lecturer, Computer Vision and Language research group,
School of Computing, University of Leeds, LEEDS LS2 9JT, England
TEL: +44-113-2335430  FAX: +44-113-2335468  http://www.comp.leeds.ac.uk/eric


