[Corpora-List] Machine Translation and Spelling Correction

Linas Vepstas linasvepstas at gmail.com
Thu Dec 3 19:04:11 UTC 2009


2009/12/3 Marcin Miłkowski <list-address at wp.pl>:
>
> For spell-checking, you can use ispell (a bit outdated), aspell (modern), or
> hunspell (good for complex compounding languages).

Naive use of spell-checkers can quickly lead to garbage
and/or a combinatorial explosion of candidate corrections.
To paraphrase J. Sinclair -- it is wrong to consider
spelling without also considering the lexical and syntactic
context in which the spelling error is made.

Speaking from experience, I've found that running text
through a spell-checker before doing any other processing
mostly just damages the text. The best strategy seems to
be to leave the misspelled word in place -- and add it to
your NLP or machine-translation lexicon, which will
"understand" enough of the syntactic/lexical environment to
"do the right thing" with the misspelled word.

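To make the contrast concrete, here is a minimal toy sketch, not
any real system. The lexicon, the tag names, and both functions
(naive_correct, tag_in_place) are made up for illustration, and
the "syntax" is just a guess from the left neighbour's tag. It
only shows the difference between correcting in isolation and
keeping the unknown word in place while the context decides how
to treat it.

```python
import difflib

# Hypothetical toy lexicon mapping word -> part-of-speech tag.
LEXICON = {
    "the": "DET", "a": "DET",
    "dog": "NOUN", "cat": "NOUN", "lines": "NOUN",
    "barks": "VERB", "sleeps": "VERB",
}


def naive_correct(tokens, lexicon):
    """Replace each unknown token with its closest lexicon entry, ignoring context."""
    corrected = []
    for tok in tokens:
        if tok in lexicon:
            corrected.append(tok)
        else:
            # cutoff=0.0 forces a replacement, as an over-eager checker would.
            best = difflib.get_close_matches(tok, list(lexicon), n=1, cutoff=0.0)
            corrected.append(best[0] if best else tok)
    return corrected


def tag_in_place(tokens, lexicon):
    """Keep unknown tokens as written; guess a tag from the left neighbour
    and record the token in the lexicon."""
    tagged = []
    for tok in tokens:
        tag = lexicon.get(tok)
        if tag is None:
            prev = tagged[-1][1] if tagged else None
            if prev == "DET":
                tag = "NOUN"      # assumed rule: determiner is followed by a noun
            elif prev == "NOUN":
                tag = "VERB"      # assumed rule: noun is followed by a verb
            else:
                tag = "UNK"
            lexicon[tok] = tag    # the unknown spelling joins the lexicon as-is
        tagged.append((tok, tag))
    return tagged


if __name__ == "__main__":
    sentence = ["the", "dgo", "barks"]
    print(naive_correct(sentence, LEXICON))
    print(tag_in_place(sentence, dict(LEXICON)))
```

With naive_correct, every out-of-lexicon token, including a proper
name, is overwritten by some lexicon word whether or not it fits.
With tag_in_place, the original spelling survives and later stages
can still see it.
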
--linas


