Corpora: approximations (bounds) for edit distance

Computer Researcher compresearch at hotmail.com
Thu Nov 29 18:43:25 UTC 2001


Hi,

Does anyone know good approximations (lower and/or upper bounds) to edit
distance? (by using some statistical numbers that can be found by
preprocessing of the strings)

In the preprocess time we can transform the strings to a bunch of numbers
(e.g., multi-dimensional vectors); and then use these vectors to approximate
the edit distance between strings.

I found a paper by Hadlock, F. (1988), proposing a "lower bound" by using
frequencies of the letters in the string. Assuming that the alphabet is same
for all strings, all frequency vectors will have same number of dimensions.
And he defines a distance metric over these vectors, so that this distance
(in the vector space) is a lower-bound to the actual edit distance.

Do you know any other method that can achieve a similar goal?

Thanks for your attention,

CR

_________________________________________________________________
Get your FREE download of MSN Explorer at http://explorer.msn.com/intl.asp



More information about the Corpora mailing list