[Corpora-List] corpora of grammatical errors

Diana Inkpen diana at site.uottawa.ca
Mon Apr 16 15:20:59 UTC 2012


Hi Anabela,

 

There is also a Japanese learner English corpus, the NICT JLE corpus (Izumi,
E., Uchimoto, K., Isahara, H.: SST speech corpus of Japanese learners'
English and automatic detection of learners' errors. ICAME Journal 28 (2004)
31-48).

 

My former PhD student used it for testing in his paper (he obtained it by
emailing the authors, I guess), in addition to an artificially generated
test set:

Aminul Islam and Diana Inkpen, "Correcting Different Types of Errors in
Texts", in Proceedings of the 24th Canadian Conference on Artificial
Intelligence (AI 2011), St. John's, NL, Canada, May 2011, pp. 192-203. PDF:
<http://www.site.uottawa.ca/~diana/publications/aminul_CAI2011-1.pdf>


 

  Diana

 

From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of
Anabela Barreiro
Sent: April-16-12 6:34 AM
To: Krishnamurthy, Ramesh
Cc: corpora at uib.no
Subject: Re: [Corpora-List] corpora of grammatical errors

 

Dear Corpora-List Members, 

I would like to thank all who have sent me individual e-mails with
suggestions, including pointers to where to find corpora for languages
other than English and the Romance languages.

In reply to Ramesh,

I would say that they all contain sentences with grammatical errors. I am
interested in corpora where all sentences have errors in particular aspects
of the grammar (prepositions, verb tenses, negation, coordination, etc.),
with some pre-selection and pre-categorization of the ungrammaticality of
the sentences. In the past, system developers used what were called "test
suites", mostly fabricated by linguists for the specific purpose of testing
a particular system. I am interested in sentences that come from "real"
usage of language by non-native speakers, but also by native speakers with
writing difficulties or who write texts where language and style are not
optimized or could be improved. When supporting the editing of a text,
existing grammar checkers are not sophisticated enough to identify all the
grammar problems (false negatives), and they often flag perfectly correct
sentences as problems (false positives). In addition to correction, there is
also the potential for providing better solutions for writing (including
adding more categories to the typology). For example, I can convert support
verb constructions with "weak" verbs into semantically "strong" verbs, which
gives the text a more professional style, eliminates unnecessary words,
helps texts be translated more efficiently by humans and machines, etc.
 
From my request on this list, I found out that there is an ongoing shared
task concerned with the automated correction of errors in text by Robert
Dale and Adam Kilgarriff:
http://clt.mq.edu.au/research/projects/hoo/

This is an especially interesting task because it groups errors into
linguistic categories. HOO already includes preposition and determiner
errors in exam scripts authored by learners of English as a Second Language,
but the organizers' goal is to enlarge the typology of linguistic errors.
That's all I wished for :)
 
Thank you all,
 
Anabela

----------------------------------------------------------------------------
---------------------

Think GREEN - Act GREEN!

Anabela M. Barreiro
Personal webpage:
https://www.l2f.inesc-id.pt/wiki/index.php/Anabela_Barreiro

LinkedIn:  <http://www.linkedin.com/pub/3/219/A43>
http://www.linkedin.com/in/anabelabarreiro


----------------------------------------------------------------------------
---------------------

  _____  

From: r.krishnamurthy at aston.ac.uk
To: barreiro_anabela at hotmail.com
CC: corpora at uib.no
Subject: corpora of grammatical errors
Date: Sun, 15 Apr 2012 12:42:20 +0000

Hi Anabela

 

#1 Do ALL the currently available public corpora not ‘contain sentences with
grammatical errors’?

Very few (if any) texts will be 100% grammatically ‘correct’ (whichever
model of grammar you use)?

So BNC, COCA, etc should be OK for you?

But the specific ‘errors’ your system identifies will of course depend on
your choice of model.

 

#2 If you want a corpus with a high proportion of ‘errors’, would any
available LANGUAGE LEARNER, NON-NATIVE-SPEAKER, NON-STANDARD, or VARIETAL
corpus be sufficient for your purposes?

These corpora should be easy to find via Google, by specifying one of those
attributes.

 

Hope this helps

Ramesh

 

Ramesh Krishnamurthy

Visiting Academic Fellow, School of Languages and Social Sciences, Aston
University, Birmingham B4 7ET


Director, ACORN (Aston Corpus Network project): http://acorn.aston.ac.uk/ 

Corpus Analyst:

(a) GeWiss (Volkswagen Foundation) project:
http://www1.aston.ac.uk/lss/research/research-projects/gewiss-spoken-academic-discourse/

(b) Discourse of Climate Change:
http://www1.aston.ac.uk/lss/research/research-projects/discourse-of-climate-change-project/

(c) Feminism: http://acorn.aston.ac.uk/projects.html

(d) COMENEGO (Corpus Multilingüe de Economía y Negocios) - Multilingual
Corpus of Business and Economics: http://dti.ua.es/comenego

(e) European Phraseology Project:
http://labidiomas3.ua.es/phraseology/login/login.php

----------------------------------------------------------------------------
---------------------------------------------

 

Date: Sat, 14 Apr 2012 10:24:50 +0000

From: Anabela Barreiro <barreiro_anabela at hotmail.com>

Subject: [Corpora-List] corpora of grammatical errors

To: "corpora at uib.no" <corpora at uib.no>

 

 

Dear Corpora List Members,

I am looking for public corpora containing sentences with grammatical
errors.

I plan to use the corpora as input to grammar checking and correction
routines.

The corpora can be in English or Romance languages. I would appreciate any
indication of where I can find those corpora. Thank you!

 
