[Corpora-List] Faster tool for WordNet Similarity measures
Eneko Agirre
e.agirre at ehu.es
Wed Feb 2 14:03:15 UTC 2011
Hi Suzan, all,
another option is to use UKB for word similarity / relatedness (http://ixa2.si.ehu.es/ukb/). It is based on random walks over knowledge base graphs, and it has produced the best WordNet-based results
on the 353sim dataset to date (as reported in several papers, which you can find on the website). The random walk software is written in C++; the similarity / relatedness code is written in Perl.
The random walks are the most costly part of the process, so we have precomputed random walks for all WordNet lemmas (available on the website, 1.2 GB). The similarity/relatedness algorithm therefore just
needs to do a vector comparison. To improve speed further, the precomputed vectors contain 1000 components (instead of the ca. 120000 in the full WordNet graph). The results on the 353sim dataset
using 1000 components or the full vectors were nearly identical.
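To make the "vector comparison" step concrete, here is a minimal sketch in Python. It is not UKB's code or file format. It assumes each word's precomputed vector is a sparse mapping from synset identifier to weight, and it assumes cosine similarity as the comparison, which the description above does not specify. The names top_k and cosine and the toy synset identifiers are made up for illustration.

```python
import math


def top_k(vector, k=1000):
    """Keep only the k highest-weighted components of a sparse vector (dict)."""
    items = sorted(vector.items(), key=lambda kv: kv[1], reverse=True)[:k]
    return dict(items)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    # Iterate over the shorter vector when computing the dot product.
    if len(u) > len(v):
        u, v = v, u
    dot = sum(w * v.get(key, 0.0) for key, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector is treated as unrelated to everything
    return dot / (norm_u * norm_v)


# Toy vectors with hypothetical synset identifiers and weights.
car = {"syn-a": 0.5, "syn-b": 0.3, "syn-c": 0.2}
bus = {"syn-a": 0.4, "syn-c": 0.4, "syn-d": 0.2}

# Truncate both vectors (here to 2 components instead of 1000) and compare.
print(cosine(top_k(car, 2), top_k(bus, 2)))
```

Because the truncated vectors are short, each comparison touches at most k components per word, which is why truncation helps in a real-time setting.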
best
eneko
> Date: Tue, 1 Feb 2011 10:25:23 +0100
> From: Suzan Verberne <s.verberne at let.ru.nl>
> Subject: [Corpora-List] Faster tool for WordNet Similarity measures
> To: Corpora List <corpora at uib.no>
>
> Hi all,
>
> I have previously been using Pedersen's WordNet Similarity module (
> http://search.cpan.org/dist/WordNet-Similarity/lib/WordNet/Similarity.pm
> ) for calculating the similarity or relatedness between pairs of
> words. Now I started to use it again but I noticed that it is way too
> slow for a real-time application (which is what I need now).
>
> I originally wrote a simple Perl script that calls the module (shown
> below) but it takes almost five seconds to run. Almost all this time
> is spent on calling the module so for batch scripts it is fine (then
> the module is only called once for multiple requests), but I need it
> to work in real time in a retrieval experiment and then 5 seconds is
> too long.
>
> Does anyone know an alternative (fast!) tool for calculating
> Similarity and/or Relatedness between two words? It might be using
> either a Wu & Palmer-like measure or a Lesk-type measure.
>
> Thanks!
> Suzan Verberne
>
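> #! /usr/bin/perl
> use strict;
> use warnings;
> use WordNet::QueryData;
> use WordNet::Similarity::path;
> # Constructing these two objects is where most of the run time goes.
> my $wn = WordNet::QueryData->new;
> my $measure = WordNet::Similarity::path->new($wn);
> # Path-based relatedness between two word#pos#sense strings.
> my $value = $measure->getRelatedness("car#n#1", "bus#n#2");
> print "car (sense 1) <-> bus (sense 2) = $value\n";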
>
>
> --
> Suzan Verberne, postdoctoral researcher
> Centre for Language and Speech Technology
> Radboud University Nijmegen
> Tel: +31 24 3611134
> Email: s.verberne at let.ru.nl
> http://lands.let.ru.nl/~sverbern/
> --
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
--
----------- NEW URL: http://ixa2.si.ehu.es/eneko ------------
Eneko Agirre .
Informatika Fakultatea mailto: e.agirre at ehu.es
Manuel Lardizabal, 1 .
20.018 Donostia fax: (+34) 943 015590
Basque Country (via Spain) tel: (+34) 943 015019