<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="Generator" content="Microsoft Word 12 (filtered medium)">
<!--[if !mso]><style>v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style><![endif]--><style><!--
/* Font Definitions */
@font-face
{font-family:Wingdings;
panose-1:5 0 0 0 0 0 0 0 0 0;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
{font-family:Consolas;
panose-1:2 11 6 9 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p
{mso-style-priority:99;
mso-margin-top-alt:auto;
margin-right:0cm;
mso-margin-bottom-alt:auto;
margin-left:0cm;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
p.MsoAcetate, li.MsoAcetate, div.MsoAcetate
{mso-style-priority:99;
mso-style-link:"Balloon Text Char";
margin:0cm;
margin-bottom:.0001pt;
font-size:8.0pt;
font-family:"Tahoma","sans-serif";}
p.ecxmsonormal, li.ecxmsonormal, div.ecxmsonormal
{mso-style-name:ecxmsonormal;
mso-margin-top-alt:auto;
margin-right:0cm;
mso-margin-bottom-alt:auto;
margin-left:0cm;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
p.ecxmsoplaintext, li.ecxmsoplaintext, div.ecxmsoplaintext
{mso-style-name:ecxmsoplaintext;
mso-margin-top-alt:auto;
margin-right:0cm;
mso-margin-bottom-alt:auto;
margin-left:0cm;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
p.ecxmsoacetate, li.ecxmsoacetate, div.ecxmsoacetate
{mso-style-name:ecxmsoacetate;
mso-margin-top-alt:auto;
margin-right:0cm;
mso-margin-bottom-alt:auto;
margin-left:0cm;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
span.ecxmsohyperlink
{mso-style-name:ecxmsohyperlink;}
span.ecxmsohyperlinkfollowed
{mso-style-name:ecxmsohyperlinkfollowed;}
span.ecxemailstyle17
{mso-style-name:ecxemailstyle17;}
span.ecxplaintextchar
{mso-style-name:ecxplaintextchar;}
span.ecxballoontextchar
{mso-style-name:ecxballoontextchar;}
p.ecxmsonormal1, li.ecxmsonormal1, div.ecxmsonormal1
{mso-style-name:ecxmsonormal1;
mso-margin-top-alt:auto;
margin-right:0cm;
margin-bottom:0cm;
margin-left:0cm;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri","sans-serif";}
span.ecxmsohyperlink1
{mso-style-name:ecxmsohyperlink1;
color:blue;
text-decoration:underline;}
span.ecxmsohyperlinkfollowed1
{mso-style-name:ecxmsohyperlinkfollowed1;
color:purple;
text-decoration:underline;}
p.ecxmsoplaintext1, li.ecxmsoplaintext1, div.ecxmsoplaintext1
{mso-style-name:ecxmsoplaintext1;
mso-margin-top-alt:auto;
margin-right:0cm;
margin-bottom:0cm;
margin-left:0cm;
margin-bottom:.0001pt;
font-size:10.5pt;
font-family:Consolas;}
p.ecxmsoacetate1, li.ecxmsoacetate1, div.ecxmsoacetate1
{mso-style-name:ecxmsoacetate1;
mso-margin-top-alt:auto;
margin-right:0cm;
margin-bottom:0cm;
margin-left:0cm;
margin-bottom:.0001pt;
font-size:8.0pt;
font-family:"Tahoma","sans-serif";}
span.ecxemailstyle171
{mso-style-name:ecxemailstyle171;
font-family:"Calibri","sans-serif";
color:windowtext;}
span.ecxplaintextchar1
{mso-style-name:ecxplaintextchar1;
font-family:Consolas;}
span.ecxballoontextchar1
{mso-style-name:ecxballoontextchar1;
font-family:"Tahoma","sans-serif";}
span.EmailStyle35
{mso-style-type:personal-reply;
font-family:"Calibri","sans-serif";
color:#1F497D;}
span.BalloonTextChar
{mso-style-name:"Balloon Text Char";
mso-style-priority:99;
mso-style-link:"Balloon Text";
font-family:"Tahoma","sans-serif";}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-GB" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Dear Anabela<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">I’m glad you found what you wanted! &#9786;<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">As often happens on Corpora-List, initial postings tend to underspecify the datasets being sought…<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">so the first replies usually ask for clarification.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Forgive me if I remain somewhat sceptical about many of the principles and techniques<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">used in detecting, categorising and correcting ‘errors’. From my experience of encountering<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">and thinking about language ‘errors’ (my own as well as other people’s):<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">#1 the ‘locus of manifestation’ of a marked usage is often not the ‘trigger point’?<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">#2 the ‘grammatical manifestation’ may have semantic, phraseological, pragmatic, or discourse-level origins?<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">#3 the ‘categorisation’ and ‘correction’ of the ‘error’ may therefore not be appropriate?<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">#4 for pedagogy, correction is a weak tool; ‘learning’ is discouraged, ‘autonomy’ is reduced?<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">#5 ‘style’ is an even more subtle area; automatic substitution of ‘weak’ verbs with ‘strong’<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">verbs would create ‘instruction-leaflet’ style rather than professional style? ‘Unnecessary words’ is
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">a highly subjective category - it would be interesting to take some exemplars of ‘good style’<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">(as evaluated by a reasonably large consensus), and see how many words in them are regarded<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">as ‘unnecessary’ by your system? A degree of redundancy is involved even in ‘professional’ style?
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Best wishes<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Ramesh<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Ramesh Krishnamurthy<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Visiting Academic Fellow, School of Languages and Social Sciences, Aston University, Birmingham B4 7ET<o:p></o:p></span></p>
</div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">PS if you are interested in alternative approaches, please see<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><a href="http://acorn.aston.ac.uk/acorn_publication.html">http://acorn.aston.ac.uk/acorn_publication.html</a> > section D >
<o:p></o:p></span></p>
<p class="MsoNormal">10. 08/02/11: <a href="http://acorn.aston.ac.uk/RK-publications/RK-EASG-Seminar080211-plus-demo.pdf">
Using Corpora for Autonomous Correction and Improvement of Academic Writing</a><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">as one starting point… &#9786;<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<div>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b><span lang="EN-US" style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">From:</span></b><span lang="EN-US" style="font-size:10.0pt;font-family:"Tahoma","sans-serif""> Anabela Barreiro [mailto:barreiro_anabela@hotmail.com]
<br>
<b>Sent:</b> 16 April 2012 11:34<br>
<b>To:</b> Krishnamurthy, Ramesh<br>
<b>Cc:</b> corpora@uib.no<br>
<b>Subject:</b> RE: corpora of grammatical errors<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div>
<p class="MsoNormal">Dear Corpora-List Members, <o:p></o:p></p>
<p style="margin-bottom:12.0pt">I would like to thank everyone who sent me individual e-mails with suggestions, including pointers to corpora for languages other than English and the Romance languages.<br>
<br>
In reply to Ramesh,<br>
<br>
I would agree that they all contain sentences with grammatical errors. However, I am interested in corpora where every sentence contains errors in particular aspects of grammar (prepositions, verb tenses, negation, coordination, etc.), with some pre-selection
and pre-categorization of the type of ungrammaticality. In the past, system developers used so-called "test suites", mostly constructed by linguists for the specific purpose of testing a particular system. I am interested in sentences that come
from "real" language use by non-native speakers, and also by native speakers who have writing difficulties or whose texts could be improved in language and style. When supporting the editing of a text, existing grammar checkers are not sophisticated
enough to identify all grammar problems (false negatives), and they often flag perfectly correct sentences as problematic (false positives). In addition to correction, there is also the potential for suggesting better writing solutions (which would add more
categories to the typology)... For example, I can convert support verb constructions with "weak" verbs into semantically "strong" verbs, which gives the text a more professional style, eliminates unnecessary words, helps texts to be translated more efficiently
by humans and machines, etc.<br>
<br>
<span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">Through my request on this list, I learned that there is an ongoing shared task on the automated correction of errors in text, run by Robert Dale and Adam Kilgarriff:
</span><br>
<span style="font-size:10.0pt;font-family:"Tahoma","sans-serif""><a href="http://clt.mq.edu.au/research/projects/hoo/" target="_blank"><span style="color:#0068CF">http://clt.mq.edu.au/research/projects/hoo/</span></a></span><br>
<span style="font-size:10.0pt;font-family:"Tahoma","sans-serif""><br>
This is an especially interesting task because it groups errors into linguistic categories. HOO already covers preposition and determiner errors in exam scripts written by learners of English as a Second Language, and the organizers aim to enlarge the typology
of linguistic errors. That's all I wished for :)<br>
<br>
Thank you all,<br>
<br>
Anabela</span><o:p></o:p></p>
<div>
<div>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">-------------------------------------------------------------------------------------------------<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><i><span style="font-size:7.5pt;font-family:"Tahoma","sans-serif"">
</span></i><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif";color:black">Anabela M. Barreiro</span><span style="font-size:4.0pt;font-family:"Tahoma","sans-serif""><br>
</span><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif";color:black">Personal webpage:
</span><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif""><a href="https://www.l2f.inesc-id.pt/wiki/index.php/Anabela_Barreiro" target="_blank"><span style="color:#0068CF">https://www.l2f.inesc-id.pt/wiki/index.php/Anabela_Barreiro</span></a><o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif";color:black">LinkedIn:
</span><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif""><a href="http://www.linkedin.com/pub/3/219/A43" target="_blank"><span style="color:#0068CF">http://www.linkedin.com/in/anabelabarreiro<br>
</span></a><o:p></o:p></span></p>
</div>
<div style="margin-bottom:14.0pt">
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">-------------------------------------------------------------------------------------------------<o:p></o:p></span></p>
</div>
</div>
<div>
<div class="MsoNormal" align="center" style="text-align:center">
<hr size="2" width="100%" align="center" id="stopSpelling">
</div>
<p class="MsoNormal" style="margin-bottom:12.0pt">From: <a href="mailto:r.krishnamurthy@aston.ac.uk">
r.krishnamurthy@aston.ac.uk</a><br>
To: <a href="mailto:barreiro_anabela@hotmail.com">barreiro_anabela@hotmail.com</a><br>
CC: <a href="mailto:corpora@uib.no">corpora@uib.no</a><br>
Subject: corpora of grammatical errors<br>
Date: Sun, 15 Apr 2012 12:42:20 +0000<o:p></o:p></p>
<div>
<p class="ecxmsonormal">Hi Anabela<o:p></o:p></p>
<p class="ecxmsonormal"> <o:p></o:p></p>
<p class="ecxmsonormal">#1 Don’t ALL the currently available public corpora ‘contain sentences with grammatical errors’?<o:p></o:p></p>
<p class="ecxmsonormal">Very few (if any) texts will be 100% grammatically ‘correct’ (whichever model of grammar you use)?<o:p></o:p></p>
<p class="ecxmsonormal">So BNC, COCA, etc. should be OK for you?<o:p></o:p></p>
<p class="ecxmsonormal">But the specific ‘errors’ your system identifies will of course depend on your choice of model.<o:p></o:p></p>
<p class="ecxmsonormal"> <o:p></o:p></p>
<p class="ecxmsonormal">#2 If you want a corpus with a high proportion of ‘errors’, would any available LANGUAGE LEARNER,
<o:p></o:p></p>
<p class="ecxmsonormal">NON-NATIVE-SPEAKER, NON-STANDARD, or VARIETAL corpus be sufficient for your purposes? These<o:p></o:p></p>
<p class="ecxmsonormal">corpora should be easy to find via Google, by specifying one of those attributes?<o:p></o:p></p>
<p class="ecxmsonormal"> <o:p></o:p></p>
<p class="ecxmsonormal">Hope this helps<o:p></o:p></p>
<p class="ecxmsonormal">Ramesh<o:p></o:p></p>
<p class="ecxmsonormal"> <o:p></o:p></p>
<p class="ecxmsonormal">Ramesh Krishnamurthy<o:p></o:p></p>
<p class="ecxmsonormal">Visiting Academic Fellow, School of Languages and Social Sciences, Aston University, Birmingham B4 7ET<o:p></o:p></p>
<p class="ecxmsonormal"><br>
Director, ACORN (Aston Corpus Network project): <a href="http://acorn.aston.ac.uk/" target="_blank">
http://acorn.aston.ac.uk/</a> <o:p></o:p></p>
<p class="ecxmsonormal">Corpus Analyst:<o:p></o:p></p>
<p class="ecxmsonormal">(a) GeWiss (Volkswagen Foundation) project: <a href="http://www1.aston.ac.uk/lss/research/research-projects/gewiss-spoken-academic-discourse/" target="_blank">
http://www1.aston.ac.uk/lss/research/research-projects/gewiss-spoken-academic-discourse/</a><o:p></o:p></p>
<p class="ecxmsonormal">(b) Discourse of Climate Change: <a href="http://www1.aston.ac.uk/lss/research/research-projects/discourse-of-climate-change-project/" target="_blank">
http://www1.aston.ac.uk/lss/research/research-projects/discourse-of-climate-change-project/</a><o:p></o:p></p>
<p class="ecxmsonormal">(c) Feminism: <a href="http://acorn.aston.ac.uk/projects.html" target="_blank">
http://acorn.aston.ac.uk/projects.html</a><o:p></o:p></p>
<p class="ecxmsonormal">(d) COMENEGO (Corpus Multilingüe de Economía y Negocios) - Multilingual Corpus of Business and Economics:
<a href="http://dti.ua.es/comenego" target="_blank">http://dti.ua.es/comenego</a><o:p></o:p></p>
<p class="ecxmsonormal">(e) European Phraseology Project: <a href="http://labidiomas3.ua.es/phraseology/login/login.php" target="_blank">
http://labidiomas3.ua.es/phraseology/login/login.php</a><o:p></o:p></p>
<p class="ecxmsonormal">-------------------------------------------------------------------------------------------------------------------------<o:p></o:p></p>
<p class="ecxmsonormal"> <o:p></o:p></p>
<p class="ecxmsoplaintext">Date: Sat, 14 Apr 2012 10:24:50 +0000<o:p></o:p></p>
<p class="ecxmsoplaintext">From: Anabela Barreiro <<a href="mailto:barreiro_anabela@hotmail.com">barreiro_anabela@hotmail.com</a>><o:p></o:p></p>
<p class="ecxmsoplaintext">Subject: [Corpora-List] corpora of grammatical errors<o:p></o:p></p>
<p class="ecxmsoplaintext">To: "<a href="mailto:corpora@uib.no">corpora@uib.no</a>" <<a href="mailto:corpora@uib.no">corpora@uib.no</a>><o:p></o:p></p>
<p class="ecxmsoplaintext"> <o:p></o:p></p>
<p class="ecxmsoplaintext"> <o:p></o:p></p>
<p class="ecxmsoplaintext">Dear Corpora List Members,<o:p></o:p></p>
<p class="ecxmsoplaintext">I am looking for public corpora containing sentences with grammatical errors.<o:p></o:p></p>
<p class="ecxmsoplaintext">I plan to use the corpora as input to grammar checking and correction routines.<o:p></o:p></p>
<p class="ecxmsoplaintext">The corpora can be in English or Romance languages. I would appreciate any indication of where I can find such corpora. Thank you!<o:p></o:p></p>
<p class="ecxmsoplaintext"> <o:p></o:p></p>
</div>
</div>
</div>
</div>
</div>
</body>
</html>