<div dir="ltr">I have recently assembled a list (still under construction) of all the available lexical semantic evaluation benchmarks that people have been using in their research. I hope people will find it useful!<div><br></div>


<div><a href="http://www.cs.cmu.edu/~mfaruqui/suite.html" target="_blank">http://www.cs.cmu.edu/~mfaruqui/suite.html</a><br>

</div><div><br></div><div>Manaal</div><div class="gmail_extra"><br><br><div class="gmail_quote">On Tue, Oct 29, 2013 at 2:54 PM, Ted Pedersen <span dir="ltr"><<a href="mailto:tpederse@d.umn.edu" target="_blank">tpederse@d.umn.edu</a>></span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Thanks to all who responded to my request for information about freely<br>

available packages to compute semantic similarity and relatedness<br>

using some sort of ontology or structured resource.<br>

<br>

Below is my best attempt at a summary - I have tried to be accurate<br>

here, but if I've erred in how something is described (or<br>

have messed up a URL) please do let me know. And of course, if there<br>

are additions that should be made to this list, I'd be more than happy<br>

to learn of those and include them both in this list and in the tutorial<br>

that motivated my original request. And my sincere apologies if<br>

someone sent me something that isn't included here - as long as there<br>

was an implementation that could be downloaded or accessed via the<br>

web, I intended to include that here (so please don't hesitate to<br>

remind me).<br>

<br>

I've divided the responses up into three categories.<br>

<br>

1) packages that provide a variety of measures (and normally include<br>

multiple measures that were developed by someone else, and then<br>

implemented by the package authors perhaps along with a few of their<br>

own measures)<br>

<br>

2) implementations of specific measures<br>

<br>

3) gold standard human similarity and relatedness judgements<br>

<br>

Note that 3) wasn't included in my original request, but came about as<br>

a result of asking about the first two, so I thought I would include<br>

that information as well.<br>

<br>

================================================<br>

Systems that provide a variety of measures :<br>

================================================<br>

<br>

Based on WordNet and include measures based on path length, depth,<br>

information content, and may include relatedness measures like lesk,<br>

vector, hso<br>

<br>

1) WordNet::Similarity <a href="http://wn-similarity.sourceforge.net" target="_blank">http://wn-similarity.sourceforge.net</a><br>

<br>

2) NLTK <a href="http://nltk.org" target="_blank">http://nltk.org</a><br>

<br>

3) ws4j <a href="https://code.google.com/p/ws4j/" target="_blank">https://code.google.com/p/ws4j/</a><br>

<br>

4) DKPro <a href="https://code.google.com/p/dkpro-similarity-asl/" target="_blank">https://code.google.com/p/dkpro-similarity-asl/</a> (also<br>

includes support for Wikipedia/Wikirelate, Wiktionary, openThesaurus,<br>

GermaNet)<br>

<br>

Based on various medical ontologies<br>

<br>

1) UMLS::Similarity <a href="http://umls-similarity.sourceforge.net" target="_blank">http://umls-similarity.sourceforge.net</a> (based on<br>

Unified Medical Language System)<br>

<br>

2) Proteinon <a href="http://lasige.di.fc.ul.pt/webtools/proteinon/" target="_blank">http://lasige.di.fc.ul.pt/webtools/proteinon/</a> (based on<br>

Gene Ontology)<br>

<br>

Systems where the focus may be on other issues but that still include<br>

some support of semantic similarity and relatedness measures between<br>

words/concepts<br>

<br>

1) Disco <a href="http://www.linguatools.de/disco/disco_en.html" target="_blank">http://www.linguatools.de/disco/disco_en.html</a> (co-occurrence<br>

/ corpus based similarity, but also includes plug-in for ontologies in<br>

Protege)<br>

<br>

2) Semilar <a href="http://semanticsimilarity.org/" target="_blank">http://semanticsimilarity.org/</a> (text to text similarity but<br>

also includes support for word to word similarity)<br>

<br>

=================================================<br>

Implementations of Specific measures :<br>

=================================================<br>

<br>

1) UKB <a href="http://ixa2.si.ehu.es/ukb/" target="_blank">http://ixa2.si.ehu.es/ukb/</a> (graph based similarity and<br>

relatedness, using WordNet)<br>

<br>

2) <a href="http://www.cs.columbia.edu/~weiwei/code.html#wmfvec" target="_blank">http://www.cs.columbia.edu/~weiwei/code.html#wmfvec</a> (high<br>

dimensional approach using definitions from WordNet/Wiktionary)<br>

<br>

3) <a href="http://olesk.com/#SemanticRelatedness" target="_blank">http://olesk.com/#SemanticRelatedness</a> (shortest path in weighted<br>

semantic network)<br>

<br>

==============================================================================<br>

Gold Standard data sets with human similarity and relatedness judgements :<br>

==============================================================================<br>

<br>

1) Yang and Powers 2006 Verb Similarity Scores (130 pairs)<br>

<br>

<a href="http://david.wardpowers.info/Research/AI/papers/200601-GWC-VerbSimWN.pdf" target="_blank">http://david.wardpowers.info/Research/AI/papers/200601-GWC-VerbSimWN.pdf</a><br>

<a href="http://david.wardpowers.info/Research/AI/papers/200601-GWC-130verbpairs.txt" target="_blank">http://david.wardpowers.info/Research/AI/papers/200601-GWC-130verbpairs.txt</a><br>

<br>

2) WordSimilarity 353 Test Collection<br>

<br>

<a href="http://www.cs.technion.ac.il/~gabr/resources/data/wordsim353/" target="_blank">http://www.cs.technion.ac.il/~gabr/resources/data/wordsim353/</a><br>

<br>

<a href="http://alfonseca.org/eng/research/wordsim353.html" target="_blank">http://alfonseca.org/eng/research/wordsim353.html</a> (divided into<br>

similarity and relatedness pairs)<br>

<br>

3) Rubenstein and Goodenough (65 pairs) and Miller and Charles (30 pair<br>

subset of RG)<br>

<br>

<a href="http://www.d.umn.edu/~tpederse/Data/rubenstein-goodenough-1965.txt" target="_blank">http://www.d.umn.edu/~tpederse/Data/rubenstein-goodenough-1965.txt</a><br>

<br>

<a href="http://www.d.umn.edu/~tpederse/Data/miller-charles-1991.txt" target="_blank">http://www.d.umn.edu/~tpederse/Data/miller-charles-1991.txt</a><br>

<br>

4) ConceptSim (sense annotated versions of MC, RG, and WordSim 353)<br>

<br>

<a href="http://www.seas.upenn.edu/~hansens/conceptSim/" target="_blank">http://www.seas.upenn.edu/~hansens/conceptSim/</a><br>

<br>

5) Medical concepts from UMLS<br>

<br>

<a href="http://rxinformatics.umn.edu/SemanticRelatednessResources.html" target="_blank">http://rxinformatics.umn.edu/SemanticRelatednessResources.html</a><br>

<br>

Four different data sets, one with 101 pairs, another made up of a<br>

subset of 30 of those (both rated for relatedness), another with 566<br>

pairs rated for similarity, and another with 587 pairs rated for<br>

relatedness.<br>

<br>

========================================================================<br>

<br>

So, that's what I have at this point. Additional contributions,<br>

clarifications, etc. are certainly welcomed!<br>

<br>

Cordially,<br>

Ted<br>

<div><div><br>

On Sun, Oct 6, 2013 at 10:50 AM, Ted Pedersen <<a href="mailto:tpederse@d.umn.edu" target="_blank">tpederse@d.umn.edu</a>> wrote:<br>

> Well I managed to misspell my own URL :)<br>

><br>

> WordNet::Similarity<br>

> <a href="http://wn-similarity.sourceforge.net" target="_blank">http://wn-similarity.sourceforge.net</a><br>

><br>

> All the others appear to be correct.<br>

><br>

> On Sun, Oct 6, 2013 at 10:45 AM, Ted Pedersen <<a href="mailto:tpederse@d.umn.edu" target="_blank">tpederse@d.umn.edu</a>> wrote:<br>

>> Greetings all,<br>

>><br>

>> I'm preparing a tutorial on measuring semantic similarity and<br>

>> relatedness between concepts. My particular focus is on methods that<br>

>> do this using ontologies or other (at least somewhat) structured<br>

>> resources (like Wikipedia, folksonomies, etc.) and that also have<br>

>> freely available software associated with them (or at least a web<br>

>> demo).<br>

>><br>

>> While it's a very interesting area, this particular tutorial won't<br>

>> include purely distributional approaches (due to time constraints), so<br>

>> I'm looking for methods and software that use some sort of resource<br>

>> like WordNet, Wikipedia, medical ontologies, Freebase, etc. to arrive<br>

>> at measurements of semantic similarity or relatedness between pairs of<br>

>> concepts.<br>

>><br>

>> What follows is my current list, based on projects I have not only<br>

>> heard of but have also used in the not too distant past - so I guess I'm<br>

>> particularly interested in projects you have used or created yourself<br>

>> (and can therefore vouch for to some extent).<br>

>><br>

>> Based on WordNet, provide path, depth, info content based measures,<br>

>> may include relatedness measures like lesk, vector, hso<br>

>><br>

>> WordNet::Similarity<br>

>> <a href="http://wn-similarity.sourcforge.net" target="_blank">http://wn-similarity.sourcforge.net</a><br>

>><br>

>> NLTK<br>

>> <a href="http://nltk.org" target="_blank">http://nltk.org</a><br>

>><br>

>> ws4j<br>

>> <a href="https://code.google.com/p/ws4j/" target="_blank">https://code.google.com/p/ws4j/</a><br>

>><br>

>> Based on UMLS (Unified Medical Language System), provide path, depth,<br>

>> info content measures, includes relatedness measures lesk, vector<br>

>><br>

>> UMLS::Similarity<br>

>> <a href="http://umls-similarity.sourceforge.net" target="_blank">http://umls-similarity.sourceforge.net</a><br>

>><br>

>> Based on the Gene Ontology (GO), provide path, depth, and info content measures<br>

>><br>

>> Proteinon<br>

>> <a href="http://lasige.di.fc.ul.pt/webtools/proteinon/" target="_blank">http://lasige.di.fc.ul.pt/webtools/proteinon/</a><br>

>><br>

>> I will post a summary of whatever I hear about after some period of<br>

>> time. Any hints or suggestions will be very gratefully received.<br>

>><br>

>> Many thanks,<br>

>> Ted<br>

>><br>

>> --<br>

>> Ted Pedersen<br>

>> <a href="http://www.d.umn.edu/~tpederse" target="_blank">http://www.d.umn.edu/~tpederse</a><br>

><br>

><br>

><br>

> --<br>

> Ted Pedersen<br>

> <a href="http://www.d.umn.edu/~tpederse" target="_blank">http://www.d.umn.edu/~tpederse</a><br>

<br>

<br>

<br>

--<br>

Ted Pedersen<br>

<a href="http://www.d.umn.edu/~tpederse" target="_blank">http://www.d.umn.edu/~tpederse</a><br>

<br>

_______________________________________________<br>

UNSUBSCRIBE from this page: <a href="http://mailman.uib.no/options/corpora" target="_blank">http://mailman.uib.no/options/corpora</a><br>

Corpora mailing list<br>

<a href="mailto:Corpora@uib.no" target="_blank">Corpora@uib.no</a><br>

<a href="http://mailman.uib.no/listinfo/corpora" target="_blank">http://mailman.uib.no/listinfo/corpora</a><br>

</div></div></blockquote></div><br></div></div>