[Corpora-List] Spellchecker evaluation corpus
Eric Atwell
csc6ea at leeds.ac.uk
Sat Apr 9 11:03:18 UTC 2011
Hi Stefan,
In her PhD, Jennifer Pedler developed a spelling-error detection tool
and evaluated it on a corpus of real spelling errors; see
Jennifer Pedler, 2007.
Computer Correction of Real-word Spelling Errors in Dyslexic Text.
PhD thesis, Birkbeck, University of London.
http://www.dcs.bbk.ac.uk/research/recentphds/pedler.pdf
and her more recent work has extended this, see:
Jennifer Pedler and Roger Mitton. 2010. A Large List of Confusion Sets
for Spellchecking Assessed Against a Corpus of Real-word Errors.
Proc LREC'10. http://www.lrec-conf.org/proceedings/lrec2010/summaries/122.html
"... We describe the creation of a realistically sized list of confusion
sets, then the assembling of a corpus of real-word errors ..."
I propose this could be an agreed "Gold Standard" evaluation test-set
for spelling errors.
Eric
Eric Atwell, Senior Lecturer, Language research group,
I-AIBS Institute for Artificial Intelligence and Biological Systems
School of Computing, Faculty of Engineering, UNIVERSITY OF LEEDS
Leeds LS2 9JT, England. TEL: 0113-3435430 FAX: 0113-3435468
WWW: http://www.comp.leeds.ac.uk/arabic
http://www.comp.leeds.ac.uk/nlp
On Sat, 9 Apr 2011, Stefan Bordag wrote:
> Hi everyone,
>
> It seems like for every conceivable NLP task there is some agreed-upon
> evaluation data set. Or at least one that is used in at least several
> papers. Now, for some strange reason I seem to be utterly unable to find
> any such test set for the spell checking task!
>
> Am I doing something wrong or is there no such data set? I know I can
> make synthetic tests by systematically inserting, swapping, etc. letters
> in my own test data, but this would give me results which I cannot
> compare to any other results. Hence, is there some accepted evaluation
> forum which I am missing? Whenever I include "spell check" in any form
> in search queries I get lots of tutorials on how to write a spellchecker
> and almost nothing else...
>
> Best regards,
> Stefan Bordag
>
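
For reference, the synthetic test Stefan mentions (inserting, swapping
etc. letters in your own test data) could look roughly like the sketch
below. It is only an illustration under my own assumptions: the names
corrupt and make_test_pairs are hypothetical, the alphabet is plain
lowercase ASCII, and each word gets exactly one single-character edit.
As Stefan says, results on such data are not comparable across papers,
which is why a shared corpus of real errors is preferable.

```python
import random
import string


def corrupt(word, rng):
    """Return `word` with one random single-character edit applied.

    Edits are insertion, substitution, deletion, or transposition of
    adjacent letters. The result always differs from a non-empty input.
    """
    if not word:
        return word
    ops = ["insert", "substitute"]
    if len(word) >= 2:
        ops.append("delete")
    # Only transpose where the two letters differ, so the word changes.
    swap_positions = [i for i in range(len(word) - 1) if word[i] != word[i + 1]]
    if swap_positions:
        ops.append("transpose")

    op = rng.choice(ops)
    if op == "insert":
        i = rng.randrange(len(word) + 1)
        return word[:i] + rng.choice(string.ascii_lowercase) + word[i:]
    if op == "substitute":
        i = rng.randrange(len(word))
        others = [c for c in string.ascii_lowercase if c != word[i]]
        return word[:i] + rng.choice(others) + word[i + 1:]
    if op == "delete":
        i = rng.randrange(len(word))
        return word[:i] + word[i + 1:]
    i = rng.choice(swap_positions)  # transpose
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]


def make_test_pairs(words, seed=0):
    """Build (misspelling, intended word) pairs, reproducibly for a given seed."""
    rng = random.Random(seed)
    return [(corrupt(w, rng), w) for w in words]


if __name__ == "__main__":
    for wrong, right in make_test_pairs(["corpus", "spelling", "error"]):
        print(wrong, "->", right)
```

Fixing the seed makes the generated test set reproducible, so at least
your own runs stay comparable with each other.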