[Corpora-List] Faster tool for WordNet Similarity measures

Eneko Agirre e.agirre at ehu.es
Wed Feb 2 14:03:15 UTC 2011


Hi Suzan, all,

another option is to use UKB for word similarity / relatedness (http://ixa2.si.ehu.es/ukb/). It is based on random walks over knowledge base graphs, and it has produced the best WordNet-based results
on the 353sim dataset to date (as reported in several papers, which you can find on the website). The random walk software is written in C++; the similarity / relatedness component is written in Perl.

The random walks are the most costly part of the process, so we have precomputed random walks for all WordNet lemmas (available on the website, 1.2 GB), and thus the similarity/relatedness algorithm just
needs to do a vector comparison. To improve speed further, the precomputed vectors contain 1000 components (instead of the ca. 120000 in the full WordNet graph). The results on the 353sim dataset
using 1000 components or the full vectors were nearly identical.
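
To make the "vector comparison" step concrete, here is a minimal Python sketch. It is not the actual UKB Perl script. It assumes that each lemma's precomputed random walk is stored as a sparse mapping from graph node to weight, and that the comparison is a cosine (the message above does not say which comparison is used). The function names `truncate` and `cosine` and the toy node labels are hypothetical.

```python
import math
from heapq import nlargest


def truncate(vector, k=1000):
    """Keep only the k highest-weighted components of a sparse vector."""
    return dict(nlargest(k, vector.items(), key=lambda item: item[1]))


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    # Iterate over the shorter vector so the dot product touches fewer keys.
    if len(u) > len(v):
        u, v = v, u
    dot = sum(weight * v[node] for node, weight in u.items() if node in v)
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy vectors with made-up node labels and weights (not real UKB output).
car = {"vehicle": 0.5, "wheel": 0.3, "road": 0.2}
bus = {"vehicle": 0.4, "road": 0.3, "passenger": 0.3}
banana = {"fruit": 0.7, "food": 0.3}

print(cosine(car, bus))     # shares "vehicle" and "road", so greater than 0
print(cosine(car, banana))  # no shared components, so exactly 0.0
```

With the vectors already computed and truncated, each query costs only a pass over at most 1000 components per word, which is why the expensive random walks can be kept out of the real-time path.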

best

eneko

> Date: Tue, 1 Feb 2011 10:25:23 +0100
> From: Suzan Verberne<s.verberne at let.ru.nl>
> Subject: [Corpora-List] Faster tool for WordNet Similarity measures
> To: Corpora List<corpora at uib.no>
>
> Hi all,
>
> I have previously been using Pedersen's WordNet Similarity module (
> http://search.cpan.org/dist/WordNet-Similarity/lib/WordNet/Similarity.pm
> ) for calculating the similarity or relatedness between pairs of
> words. I have now started using it again, but I noticed that it is way too
> slow for a real-time application (which is what I need now).
>
> I originally wrote a simple Perl script that calls the module (shown
> below) but it takes almost five seconds to run. Almost all this time
> is spent on calling the module so for batch scripts it is fine (then
> the module is only called once for multiple requests), but I need it
> to work in real time in a retrieval experiment and then 5 seconds is
> too long.
>
> Does anyone know an alternative (fast!) tool for calculating
> Similarity and/or Relatedness between two words? It might be using
> either a Wu & Palmer-like measure or a Lesk-type measure.
>
> Thanks!
> Suzan Verberne
>
> #! /usr/bin/perl
>   use WordNet::QueryData;
>   use WordNet::Similarity::path;
>   my $wn = WordNet::QueryData->new;
>   my $measure = WordNet::Similarity::path->new ($wn);
>   my $value = $measure->getRelatedness("car#n#1", "bus#n#2");
>   print "car (sense 1) <-> bus (sense 2) = $value\n";
>
>
> -- 
> Suzan Verberne, postdoctoral researcher
> Centre for Language and Speech Technology
> Radboud University Nijmegen
> Tel: +31 24 3611134
> Email: s.verberne at let.ru.nl
> http://lands.let.ru.nl/~sverbern/
> --
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora


-- 
----------- NEW URL: http://ixa2.si.ehu.es/eneko ------------

Eneko Agirre                                                .
Informatika Fakultatea                mailto: e.agirre at ehu.es
Manuel Lardizabal, 1                                        .
20.018 Donostia                         fax: (+34) 943 015590
Basque Country (via Spain)              tel: (+34) 943 015019

