[Corpora-List] Semantic Distances Revisited

Mon Dec 2 04:09:18 UTC 2002

>>It's great stuff, although it's taxonomy-based.
>>I was specifically interested in distributional methods.

>  And what is the difference - if it is possible to answer?

I'll give it a try -- apologies in advance to more-experienced list members
pained by my explanation.

In a taxonomy, items are typically represented as nodes of a tree. So when
you're measuring how similar two items are, you find them both on the tree
and then calculate how close they are to each other, for example by
counting the links on the path between them. (There are different ways to
do this, and that's where the Hirst and Budanitsky article comes in.)
It's a great approach, if you have the taxonomy already built for you.
The pitfalls of making a taxonomy are well known: it's a lot of work, your
taxonomy may not hold across languages, and it's hard not to let your
taxonomy reflect your biases.
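
To make the tree idea concrete, here is a minimal sketch in Python. It
assumes the simplest possible measure (counting the links between two
nodes), which is only one of the many options and not necessarily the one
any particular article recommends. The toy taxonomy and the names PARENT,
path_to_root and path_distance are invented for illustration.

```python
# Toy taxonomy: each item maps to its parent (made-up data, illustration only).
PARENT = {
    "animal": None,
    "mammal": "animal",
    "bird": "animal",
    "dog": "mammal",
    "cat": "mammal",
    "sparrow": "bird",
}


def path_to_root(node):
    """Return the nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def path_distance(a, b):
    """Count the links on the shortest path between a and b in the tree."""
    # Steps needed to reach each ancestor of a (a itself is 0 steps away).
    steps_from_a = {n: i for i, n in enumerate(path_to_root(a))}
    # Walk up from b until we hit the first ancestor shared with a.
    for j, n in enumerate(path_to_root(b)):
        if n in steps_from_a:
            return steps_from_a[n] + j
    raise ValueError("the two items do not share a root")


print(path_distance("dog", "cat"))      # siblings under "mammal"
print(path_distance("dog", "sparrow"))  # only meet at "animal"
```

A smaller number means the two items sit closer together in the tree, so
on this toy data "dog" comes out nearer to "cat" than to "sparrow".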

Distribution-based methods don't use a taxonomy; they attempt to find
similarity based on the surrounding words. Again, there are many ways to do
this, but the underlying assumption is that words that appear in similar
contexts are similar to each other. For example, Beth Levin, in her work on
English verb classes, makes the striking assertion that verbs that exhibit
similar syntactic behaviour are semantically related. Quite a revelation
for a linguist such as myself -- linguists have traditionally studied
syntax, while putting semantics in the "too-hard" basket. This work showed
that syntax can be a key to semantics.
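
As a rough illustration of the distributional idea, the sketch below
counts the words that occur within a small window around each target word
and compares the resulting count vectors with cosine similarity. This is
just one simple choice among the many ways to do it. The corpus is made
up, and the names CORPUS, context_vector and cosine, as well as the window
size of 2, are assumptions for the example.

```python
import math
from collections import Counter

# Tiny made-up corpus, for illustration only.
CORPUS = [
    "the dog chased the ball",
    "the cat chased the mouse",
    "the dog ate the food",
    "the cat ate the fish",
    "the car needs new tyres",
]


def context_vector(target, sentences, window=2):
    """Count the words appearing within `window` positions of `target`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # skip the target word itself
                    counts[tokens[j]] += 1
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


dog = context_vector("dog", CORPUS)
cat = context_vector("cat", CORPUS)
car = context_vector("car", CORPUS)
print(cosine(dog, cat), cosine(dog, car))
```

On this toy data "dog" and "cat" share their contexts ("chased", "ate"),
so they score higher than "dog" and "car". A real system would use a much
larger corpus and would usually down-weight very frequent words such as
"the".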

That's a really basic overview.
Phil Resnik gives a thorough review of both kinds of methods in his
dissertation. You can find it at:
http://citeseer.nj.nec.com/resnik93selection.html
His Lexical Acquisition talk at ACL 2002 changed my life. And may I add,
he's one heck of a dancer.

Feedback welcome.
Daniel

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Daniel Midgley
dmidgley at arts.uwa.edu.au
+ (61 8) 9371 3730
http://www.cs.uwa.edu.au/~fontor