Corpora: approximations (bounds) for edit distance

Bruce L. Lambert, Ph.D. lambertb at uic.edu
Fri Nov 30 18:42:54 UTC 2001


Maybe I'm missing something, but the upper bound on edit distance between
two strings is always the length of the longer string, and the lower bound
is always zero (when the strings are identical).

-bruce


At 06:43 PM 11/29/01 +0000, Computer Researcher wrote:
>Hi,
>
>Does anyone know good approximations (lower and/or upper bounds) to edit
>distance? (by using some statistical numbers that can be found by
>preprocessing of the strings)
>
>In the preprocess time we can transform the strings to a bunch of numbers
>(e.g., multi-dimensional vectors); and then use these vectors to
>approximate the edit distance between strings.
>
>I found a paper by Hadlock, F. (1988), proposing a "lower bound" by using
>frequencies of the letters in the string. Assuming that the alphabet is
>same for all strings, all frequency vectors will have same number of
>dimensions. And he defines a distance metric over these vectors, so that
>this distance (in the vector space) is a lower-bound to the actual edit
>distance.
>
>Do you know any other method that can achieve a similar goal?
>
>Thanks for your attention,
>
>CR
>
>_________________________________________________________________
>Get your FREE download of MSN Explorer at http://explorer.msn.com/intl.asp
>



More information about the Corpora mailing list