Corpora: approximations (bounds) for edit distance

Gregory Aist aist+ at andrew.cmu.edu
Fri Nov 30 18:59:15 UTC 2001


Excerpts from mail: 30-Nov-101 Re: Corpora: approximations.. by Bruce L.
Lambert at uic.edu
> Date: Fri, 30 Nov 2001 12:42:54 -0600
> To: CORPORA at HD.UIB.NO
> From: "Bruce L. Lambert, Ph.D." <lambertb at uic.edu>
> Subject: Re: Corpora: approximations (bounds) for edit distance
>
> Maybe I'm missing something, but the upper bound on edit distance between
> two strings is always the length of the longer string, and the lower bound
> is always zero (when the strings are identical).
>
> -bruce
>

I suspect CR is looking for
least upper bounds and greatest lower bounds...

>
> At 06:43 PM 11/29/01 +0000, Computer Researcher wrote:
> >Hi,
> >
> >Does anyone know good approximations (lower and/or upper bounds) to edit
> >distance? (by using some statistical numbers that can be found by
> >preprocessing of the strings)
> >
> >In the preprocess time we can transform the strings to a bunch of numbers
> >(e.g., multi-dimensional vectors); and then use these vectors to
> >approximate the edit distance between strings.
> >
> >I found a paper by Hadlock, F. (1988), proposing a "lower bound" by using
> >frequencies of the letters in the string. Assuming that the alphabet is
> >same for all strings, all frequency vectors will have same number of
> >dimensions. And he defines a distance metric over these vectors, so that
> >this distance (in the vector space) is a lower-bound to the actual edit
> >distance.
> >
> >Do you know any other method that can achieve a similar goal?
> >
> >Thanks for your attention,
> >
> >CR

Gregory Aist, Ph.D. (Carnegie Mellon, 2000.)
aist at cs.cmu.edu



More information about the Corpora mailing list