[Corpora-List] Spellchecker evaluation corpus

Eric Atwell csc6ea at leeds.ac.uk
Sat Apr 9 11:03:18 UTC 2011


Hi Stefan,

Jennifer Pedler's PhD developed a spelling-error detection tool,
evaluated on a corpus of real spelling errors; see

Jennifer Pedler, 2007.
Computer Correction of Real-word Spelling Errors in Dyslexic Text.
PhD thesis, Birkbeck, University of London.
http://www.dcs.bbk.ac.uk/research/recentphds/pedler.pdf

and her more recent work has extended this, see:

Jennifer Pedler and Roger Mitton. 2010. A Large List of Confusion Sets
for Spellchecking Assessed Against a Corpus of Real-word Errors.
Proc. LREC'10. http://www.lrec-conf.org/proceedings/lrec2010/summaries/122.html

"... We describe the creation of a realistically sized list of confusion
sets, then the assembling of a corpus of real-word errors ..."

I propose this could be an agreed "Gold Standard" evaluation test-set
for spelling errors.

Eric


Eric Atwell, Senior Lecturer, Language research group,
  I-AIBS Institute for Artificial Intelligence and Biological Systems
  School of Computing, Faculty of Engineering, UNIVERSITY OF LEEDS
  Leeds LS2 9JT, England.        TEL: 0113-3435430  FAX: 0113-3435468
  WWW: http://www.comp.leeds.ac.uk/arabic
       http://www.comp.leeds.ac.uk/nlp




On Sat, 9 Apr 2011, Stefan Bordag wrote:

> Hi everyone,
>
> It seems like for every conceivable NLP task there is some agreed-upon
> evaluation data set. Or at least one that is used in at least several
> papers. Now, for some strange reason I seem to be utterly unable to find
> any such test set for the spell checking task!
>
> Am I doing something wrong or is there no such data set? I know I can
> make synthetic tests by systematically inserting, swapping, etc. letters
> in my own test data, but this would give me results which I cannot
> compare to any other results. Hence my question: is there some accepted
> evaluation forum which I am missing? Whenever I include "spell check" in
> any form in search queries, I get lots of tutorials on how to write a
> spellchecker and almost nothing else...
>
> Best regards,
> Stefan Bordag
>
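
For readers unfamiliar with the synthetic tests mentioned in the quoted
message, here is a minimal sketch of what "systematically inserting,
swapping etc. letters" could look like. It is only an illustration, not
a method taken from either message or from the cited papers: the function
names, the lowercase ASCII alphabet and the fixed random seed are all
assumptions. It generates single-edit misspellings and pairs each one
with its intended word.

```python
import random

# Assumption: lowercase ASCII letters only; a real test set would need
# the alphabet of the language under test.
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def single_edit_variants(word):
    """Return all strings one deletion, adjacent swap, substitution or
    insertion away from `word` (the word itself is excluded)."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    swaps = {
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    }
    substitutions = {
        left + c + right[1:] for left, right in splits if right for c in ALPHABET
    }
    inserts = {left + c + right for left, right in splits for c in ALPHABET}
    variants = deletes | swaps | substitutions | inserts
    variants.discard(word)  # e.g. swapping the two letters of "aa" gives "aa"
    return variants


def make_test_pairs(words, seed=0):
    """Build (misspelling, intended_word) pairs, one per input word.

    The seed is fixed so that the same word list always yields the same
    test set (hypothetical design choice, for repeatability)."""
    rng = random.Random(seed)
    pairs = []
    for word in words:
        candidates = sorted(single_edit_variants(word))
        if candidates:
            pairs.append((rng.choice(candidates), word))
    return pairs


if __name__ == "__main__":
    for misspelling, intended in make_test_pairs(["spelling", "corpus", "error"]):
        print(misspelling, "->", intended)
```

Note that many of the generated strings will be non-words, and some will
happen to be other valid words. A test set built this way is therefore not
a substitute for a corpus of naturally occurring errors, and results on it
remain hard to compare across papers, which is the concern raised in the
quoted message.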
