[Corpora-List] Spellchecker evaluation corpus
A.P.J. van den Bosch
Antal.vdnBosch at uvt.nl
Thu Apr 14 08:40:51 UTC 2011
Dear Stefan,
All good points, but when you say
> - several collections of misspelled words along with a defined context size of differing languages to evaluate spelling error detectors and correctors
what do you mean by a defined context size? What seems to be missing from your list is what I think should be the ultimate evaluation setting: _full_ texts with _all_ errors annotated.
Error-list evaluations cannot measure the false alarm rate or precision of your spelling error detector: how often does it think it has found an error that isn't one? Put another way, an algorithm with great recall or accuracy on an error list may actually be an overenthusiastic system that also flags many normal words as errors.
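To make this concrete, here is a minimal Python sketch. It rests on assumptions of mine, not on any existing tool: the text is already tokenized, the annotation is a set of token positions, and both detectors and all numbers are invented toy data. The function name detection_scores is hypothetical.

```python
def detection_scores(n_tokens, gold_errors, flagged):
    """Precision, recall and false alarm rate of an error detector.

    n_tokens:    number of tokens in the fully annotated text
    gold_errors: set of token positions annotated as errors
    flagged:     set of token positions the detector flagged
    """
    true_pos = len(gold_errors & flagged)
    false_pos = len(flagged - gold_errors)
    correct_tokens = n_tokens - len(gold_errors)
    precision = true_pos / len(flagged) if flagged else 1.0
    recall = true_pos / len(gold_errors) if gold_errors else 1.0
    # Share of correct tokens that were wrongly flagged.
    false_alarm_rate = false_pos / correct_tokens if correct_tokens else 0.0
    return precision, recall, false_alarm_rate


# Hypothetical toy text: 100 tokens, errors at positions 3, 40 and 77.
gold = {3, 40, 77}

# Over-eager detector: flags every single token.
eager = set(range(100))
# More careful detector: finds two of the three errors, one false alarm.
careful = {3, 40, 58}

eager_scores = detection_scores(100, gold, eager)
careful_scores = detection_scores(100, gold, careful)
print("eager:  ", eager_scores)
print("careful:", careful_scores)
```

On an error list only positions 3, 40 and 77 would be looked at, so the over-eager detector would score perfect recall and appear to be the better system. Only the full text exposes that it flags every correct token as well.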
For fully automatic correction and corpus cleanup this is vital: does your method do more harm than good? But interactive spellcheckers could also do with higher precision; the spellchecker is one of the most widely used pieces of language technology worldwide, yet its low precision has not made it particularly loved.
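The "more harm than good" question can be sketched in the same spirit. Again this is only an illustration under my own assumptions: the three token lists are invented, they are assumed to be aligned one to one (the corrector neither splits nor merges tokens), and correction_effect is a hypothetical helper name.

```python
def correction_effect(original, corrected, gold):
    """Count tokens a corrector repaired and tokens it damaged.

    All three arguments are token lists of equal length.
    """
    repaired = damaged = 0
    for orig, corr, ref in zip(original, corrected, gold):
        if orig != ref and corr == ref:
            repaired += 1   # an error was fixed
        elif orig == ref and corr != ref:
            damaged += 1    # a correct token was broken
    return repaired, damaged


# Invented example: one real error ("thier"), two bad "corrections".
gold = "their house is near the old mill".split()
original = "thier house is near the old mill".split()
corrected = "their horse is near the old mile".split()

repaired, damaged = correction_effect(original, corrected, gold)
print(repaired, damaged)  # prints: 1 2
```

Here the corrector fixes one error but breaks two correct words, so the cleaned-up text is worse than the original. An error list alone would only have recorded the one successful fix.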
Antal
--
Antal van den Bosch Antal.vdnBosch at uvt.nl http://ilk.uvt.nl/~antalb/
ILK / Tilburg center for Cognition and Communication, Tilburg University