<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">Hi Nicola,<br><br>Roger Mitton has done a lot of work in the area of computer spell checking. <a href="http://www.dcs.bbk.ac.uk/~roger/">http://www.dcs.bbk.ac.uk/~roger/</a> His university also makes available a corpus of misspelled words that may be useful to you. One of his students also did a thesis on real-word errors. This is English only though.<br><br>One of my favorite essays on spelling correction is: <a href="http://norvig.com/spell-correct.html">http://norvig.com/spell-correct.html</a><br><br>For spelling correction, I've created an open source system called After the Deadline. It includes a call that will generate statistics about the writing quality of a text you give to it [see: <a href="http://www.afterthedeadline.com/api.slp">http://www.afterthedeadline.com/api.slp</a>]. It also does some real-word error detection, but this is based on trigrams and fixed confusion sets. 
You can look at it at <a href="http://open.afterthedeadline.com/">http://open.afterthedeadline.com</a>. Again, this system is English only at this time.<br><br>I also make available a package of English bootstrap data that includes text from public domain books and Wikipedia infused with spelling and grammar errors taken from Wikipedia's list of commonly misspelled words.<br><br>For noisy texts, I recommend googling for a "Learner Corpus".<br><br>Best of luck.<br><br>-- Raphael<div><br><div><div>On Dec 3, 2009, at 9:41 AM, Nicola Bertoldi wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div>I am going to do some investigation to improve machine translation<br>when it is applied to texts corrupted by misspellings of any sort (non-word, real-word errors).<br><br>In this preliminary phase I am collecting information about the spelling correction task<br>and other applications and tasks which involves spelling correction.<br><br>In particular, I am interested in<br>- surveys about the task<br>- statistics about the most common misspellings in texts of different languages and different genres<br>- public available software for spelling correction<br>- available corpora of noisy texts<br>- any further resources which is possibly useful for my topic<br><br><br><br>Thanks!<br><br>Nicola<br><br>_______________________________________________<br>Corpora mailing list<br><a href="mailto:Corpora@uib.no">Corpora@uib.no</a><br>http://mailman.uib.no/listinfo/corpora<br></div></blockquote></div><br></div></body></html>