[Corpora-List] Spellchecker evaluation corpus

Trevor Jenkins trevor.jenkins at suneidesis.com
Thu Apr 14 11:34:37 UTC 2011


On Thu, 14 Apr 2011, Stefan Bordag <sbordag at informatik.uni-leipzig.de> wrote:

> I imagine, however, that it wouldn't be conceptually difficult to set up
> a test that covers most or all of these needs you mentioned. A proper
> evaluation setup for spellchecking in general would consist of:
> - ... misspelled words along with a defined context
> - ... source of error ...
> - ... string pairs (wrong to correct) ...
> - ... spell checkers that need training data ...
> - ... resource usage ...
> - ... different languages ..

You have omitted at least one other issue, namely the longitudinal
changes to spelling *conventions*. These are easily demonstrated in
English by considering the spellings in Shakespeare, the King James
Version of the Bible, and the novels of Jane Austen. These works contain
words whose then-accepted spellings are not used today. Undoubtedly one
can find similar historical literature with differing spellings for
other languages. There is also a forward-looking version: contemporary
spelling conventions may not be considered correct at some future date.
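
As a rough illustration of how an evaluation corpus might carry this time
dimension, here is a minimal Python sketch. Everything in it is an
assumption made for illustration: the `SpellingCase` record, the `period`
labels, the `accuracy` helper and the toy checker are hypothetical names,
and the three entries are hand-picked examples ("shew" is the older form
of "show" found in Austen), not data from any real test set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpellingCase:
    token: str      # spelling as it appears in the text
    period: str     # hypothetical label for the convention in force
    accepted: bool  # whether that convention accepts the token


# Illustrative entries only; not drawn from any real evaluation corpus.
CASES = [
    SpellingCase("shew", "1810s", True),
    SpellingCase("shew", "2010s", False),
    SpellingCase("show", "2010s", True),
]


def accuracy(checker, cases):
    """Fraction of cases where checker(token, period) matches the label."""
    if not cases:
        return 0.0
    hits = sum(1 for c in cases if checker(c.token, c.period) == c.accepted)
    return hits / len(cases)


def modern_only_checker(token, period):
    # Toy checker that ignores the period and knows only present-day forms.
    return token in {"show"}


if __name__ == "__main__":
    print(accuracy(modern_only_checker, CASES))
```

A checker that ignores the period wrongly rejects "shew" in the 1810s case
while getting the two 2010s cases right, which is the kind of failure a
purely contemporary test set would never reveal.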

Regards, Trevor

<>< Re: deemed!

