[Corpora-List] RE: automatic error correction

John Colby colby at soe.ucsc.edu
Sun Dec 28 21:51:04 UTC 2003


Jason,

I agree my explanation is very broad.  Basically, I am developing
techniques for grammatical error correction using insertion, deletion,
and substitution.  This is what you would call the robust parsing
approach, although I plan to combine both analytical and statistical
techniques.  For example, the grammar and an equivalent PDA built from
it give one the ability to answer questions like "When a parser in
some parse state is faced with error symbol X, what symbols can I add
to or delete from the input, or substitute for X (and perhaps some
trailing symbols), so that the parser can continue?"  It is also
possible to determine analytically which corrections guarantee that
the parser will run to completion.  In a broad statistical extension
of this robust parsing approach, one could choose the corrections
which are most likely to lead to the best parse.  This technique need
not be entirely local, since it is possible to determine the language
of sequences of error corrections to apply at different places in any
input generated by the PDA.  Moreover, if one complements
left-to-right parsing with right-to-left parsing, it is possible to
locate and correct errors which occur before the parser encounters a
symbol it cannot process.
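
To make the local repair step concrete, here is a toy sketch in
Python; the slot-pattern "grammar" below is just a stand-in I invented
for a real PDA's viable-prefix test, and everything in it is
illustrative:

    # Try single-token insertions, deletions, and substitutions at an
    # error point, keeping the repairs that let the parser continue.
    PATTERN = [("the",), ("cat", "dog"), ("sat",), ("on",), ("the",), ("mat",)]
    VOCAB = sorted({w for slot in PATTERN for w in slot})

    def viable_prefix(tokens):
        # Stand-in for "can the parser continue from this parse state?"
        if len(tokens) > len(PATTERN):
            return False
        return all(t in slot for t, slot in zip(tokens, PATTERN))

    def repairs(tokens, i):
        # Enumerate single-token repairs at error position i.
        found = []
        deleted = tokens[:i] + tokens[i + 1:]
        if viable_prefix(deleted):
            found.append(("delete", deleted))
        for w in VOCAB:
            subbed = tokens[:i] + [w] + tokens[i + 1:]
            if w != tokens[i] and viable_prefix(subbed):
                found.append(("substitute " + w, subbed))
            inserted = tokens[:i] + [w] + tokens[i:]
            if viable_prefix(inserted):
                found.append(("insert " + w, inserted))
        return found

    print(repairs("the cat sit on the mat".split(), 2))
    # -> [('substitute sat', ['the', 'cat', 'sat', 'on', 'the', 'mat'])]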

That is the approach I choose to use.  I didn't restrict my question
to just ASR, OCR, or discourse analysis because I believe all of these
applications can be approached from a grammatical perspective.  And I
didn't restrict my question to a particular way of approaching any of
these applications because I want to look at all of the solutions
which others have developed for these problems.

Thank you for your message, which gives me some more information about
how one could approach error correction in each of these domains.
After reading your message I took a look at Jurafsky and Martin's book
and Jelinek's book.  I didn't realize J & M had any material on error
correction for ASR and for OCR.

Thanks again,
John

> John,
>
> Your explanation is still very broad.  You ought to tell me
> concretely what situation you find yourself in -- what are you
> actually trying to *do*?  I doubt you're trying to build a universal
> error correction device that is capable of dealing with ASR *and*
> OCR *and* discourse ...!
>
> A noisy-channel approach says: I've observed erroneous signal Y.
> What was the true signal X?  Let's consider every possible X, and
> pick the one that maximizes
>    p(X) * p(Y | X)
> where p(X) is the a priori probability that the truth was X, and
> p(Y | X) is the probability that you would have observed Y if the
> truth was X.  The art here is in defining these two probability
> distributions.  p(Y | X), for example, is a model of how errors are
> typically introduced.  It would be defined as a function of Y, X,
> and some parameters.  The form of the function (stipulated by an
> expert) describes what *kinds* of errors are introduced, and the
> parameters (typically learned from data) quantify how often the
> different errors are introduced, so that one can compute a numerical
> probability p(Y | X).
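
If I understand the decision rule, a toy sketch would look like this;
the candidate set, prior, and channel numbers are all mine, made up
for illustration:

    # Toy noisy-channel decoder: pick the X maximizing p(X) * p(Y|X).
    PRIOR = {"they're here": 0.89, "their here": 0.10, "there here": 0.01}

    HOMOPHONES = {"there": {"they're", "their"},
                  "they're": {"there", "their"},
                  "their": {"there", "they're"}}

    def channel(y, x):
        # p(Y | X): a crude per-word error model.  A word surfaces
        # intact with prob 0.9, or as one of its two homophones with
        # prob 0.05 each (invented numbers).
        p = 1.0
        for yw, xw in zip(y.split(), x.split()):
            if yw == xw:
                p *= 0.9
            elif yw in HOMOPHONES.get(xw, ()):
                p *= 0.05
            else:
                return 0.0
        return p

    def decode(y):
        return max(PRIOR, key=lambda x: PRIOR[x] * channel(y, x))

    print(decode("there here"))   # -> "they're here" under these numbers
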
>
> You could regard an ASR system as a noisy channel.  X is the true
> text, and Y is the text produced by the ASR system.  So you would
> need to model the kinds of errors that are introduced by the system.
>
> However, you should realize that ASR and OCR systems are themselves
> built as noisy channel models.  For example, in ASR, X is the text,
> and Y is the speech.  Given Y, the ASR system wants to find the most
> probable X.  You would have to work pretty hard to build a noisy
> channel model that can correct errors beyond what the ASR system
> already does -- unless the ASR system is making assumptions that are
> not warranted in your situation (e.g., it thinks you're speaking
> English in a quiet room, whereas in fact you're reading computer
> code in a windy echo chamber).
>
> You can find out more about noisy channel models in any good
> natural language processing or speech processing textbook.
> A couple of textbooks are recommended at my course homepage:
>    http://cs.jhu.edu/~jason/465
>
> For grammatical errors, you may want to build what is called a
> robust parser, which is capable of parsing sentences that are "not
> quite grammatical," correcting them in the process.  But this is
> really an instance of the noisy channel model.  First you build an
> ordinary context-free parser, which defines p(X), and then you
> compose it with an error model p(Y | X), which is typically a
> finite-state transducer.  (Finite-state devices are in general very
> useful in building noisy channel models.)
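
As I read the composition idea, a toy version might score candidate
corrections with a small CYK recognizer standing in for p(X) and a
crude one-substitution channel for p(Y | X); the grammar, lexicon, and
numbers here are all invented:

    GRAMMAR = {"S": [("NP", "VP")], "NP": [("Det", "N")], "VP": [("V", "NP")]}
    LEXICON = {"the": "Det", "a": "Det", "dog": "N", "bone": "N", "ate": "V"}

    def parses(words):
        # Minimal CYK recognizer: do the words form an S under GRAMMAR?
        n = len(words)
        chart = [[set() for _ in range(n + 1)] for _ in range(n)]
        for i, w in enumerate(words):
            if w in LEXICON:
                chart[i][i + 1].add(LEXICON[w])
        for j in range(2, n + 1):
            for i in range(j - 2, -1, -1):
                for k in range(i + 1, j):
                    for lhs, rules in GRAMMAR.items():
                        for b, c in rules:
                            if b in chart[i][k] and c in chart[k][j]:
                                chart[i][j].add(lhs)
        return n > 0 and "S" in chart[0][n]

    def corrections(observed):
        # Channel: p = 0.9 for no change, 0.1/|V| per single-word
        # substitution (invented numbers).
        vocab = sorted(LEXICON)
        scored = []
        if parses(observed):
            scored.append((0.9, " ".join(observed)))
        for i in range(len(observed)):
            for w in vocab:
                cand = observed[:i] + [w] + observed[i + 1:]
                if w != observed[i] and parses(cand):
                    scored.append((0.1 / len(vocab), " ".join(cand)))
        return sorted(scored, reverse=True)

    print(corrections("the dog ate a bnoe".split()))
    # -> "the dog ate a bone" and "the dog ate a dog" tie at 0.02 here;
    #    a channel sensitive to spelling distance would prefer "bone".
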
>
> There are other ways to correct localized errors.  You can build a
> machine learning classifier that knows how to correct a particular
> kind of error, e.g., it knows when to change "there" to "they're."
> The classifier considers a bunch of contextual features, such as
> surrounding words.  There are many techniques for this, since
> machine learning is a large field.  Google for "context-sensitive
> spelling correction" to get a few of the basic ones (early projects
> by Yarowsky, Golding, and Roth).  You can apply the individual
> classifiers in parallel or in sequence, perhaps using a technique
> like Transformation-Based Learning to choose the best order.
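
For the confusion-set idea, here is a toy naive Bayes sketch over
surrounding-word features; the training examples are invented:

    import math
    from collections import Counter, defaultdict

    TRAIN = [
        ("they're", ["going", "home"]),   # "... they're going home"
        ("they're", ["coming", "over"]),
        ("there", ["over", "is"]),        # "... over there is ..."
        ("there", ["put", "it"]),         # "... put it there"
    ]

    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for label, context in TRAIN:
        label_counts[label] += 1
        word_counts[label].update(context)

    def classify(context, alpha=1.0):
        # Naive Bayes over context words with add-alpha smoothing.
        vocab = {w for c in word_counts.values() for w in c}
        best, best_score = None, float("-inf")
        for label in label_counts:
            score = math.log(label_counts[label] / sum(label_counts.values()))
            total = sum(word_counts[label].values())
            for w in context:
                p = (word_counts[label][w] + alpha) / (total + alpha * len(vocab))
                score += math.log(p)
            if score > best_score:
                best, best_score = label, score
        return best

    print(classify(["going", "over"]))    # -> "they're" on this toy data
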
>
> Hope this helps lay out some of the territory, but the best
> technique to use will vary from situation to situation.  You have to
> look at the data to see (1) what kinds of errors show up in real
> examples and (2) what else in the example tells you that they're
> errors and how to correct them.
>
> -jason
>
>> The kinds of errors I am talking about are ones in which the input
>> does not match what a predictive processor expects.  In ASR, the
>> system might not recognize the next phoneme or word.  An OCR system
>> might produce output which isn't correct from the view of what the
>> legal spelling of the next word to be processed is, or even what
>> the next expected word should be.  In textual processing, the kinds
>> of errors would be mainly ones in which the next word of the input
>> is spelled incorrectly or doesn't match any of the words the text
>> processor expects to see.  Alternatively, a discourse processing
>> system might find that an expected cue or type of discourse
>> component is missing.
>>
>> In a way, the errors can all be cast as grammatical ones, but I
>> don't choose to do so because I am interested in all of the
>> different methods which have been used with predictive processors
>> to automatically correct errors.
>>
>> -John
>>
>> > Is there some particular kind of error you are interested in
>> > correcting?  Might help me decide what reading to recommend.
>> >
>> > -j
>> >
>> >
>> >> Jason,
>> >>
>> >> I am not familiar with the noisy channel model.  If you could
>> >> point me in the direction of references describing methods which
>> >> have been used for automatic error correction, especially of the
>> >> types you listed, it would help me very much.
>> >>
>> >> Best,
>> >> John
>> >>
>> >>
>> >> >> What systems and articles are you aware of with regard to
>> >> >> automatic error correction for use with speech and textual
>> >> >> processing and OCR interfaces?
>> >> >
>> >> > Standard approach is to use a noisy channel model.  I'd
>> >> > implement it using a toolkit for weighted finite-state
>> >> > transducers, such as AT&T's FSM toolkit.
>> >> >
>> >> > For context-sensitive spelling correction, another alternative
>> >> > is to use a word-sense disambiguation system.
>> >> >
>> >> > If these terms are unfamiliar and aren't explained by other
>> >> > respondents, write back and I can try to point you in the
>> >> > right direction.
>> >> >
>> >> > Good luck,
>> >> > Jason Eisner
>> >> >
>> >>
>> >
>>
>>
>>
>


