[Corpora-List] software for measuring semantic similarity and relatedness?

Tue Oct 29 19:39:24 UTC 2013

I have recently assembled (still under construction) a list of all the available
lexical semantic evaluation benchmarks that people have been using in their
research. I hope people will find it useful!

http://www.cs.cmu.edu/~mfaruqui/suite.html

Manaal

On Tue, Oct 29, 2013 at 2:54 PM, Ted Pedersen <tpederse at d.umn.edu> wrote:

> Thanks to all who responded to my request for information about freely
> available packages to compute semantic similarity and relatedness
> using some sort of ontology or structured resource.
>
> Below is my best attempt at a summary. I have tried to be accurate
> here, but if I've erred in how something is described (or
> have messed up a URL), please do let me know. And of course, if there
> are additions that should be made to this list, I'd be more than happy
> to learn of them and include them both in this list and in the tutorial
> that motivated my original request. My sincere apologies if
> someone sent me something that isn't included here: as long as there
> was an implementation that could be downloaded or accessed via the
> web, I intended to include it here (so please don't hesitate to
> remind me).
>
> I've divided the responses up into three categories.
>
> 1) packages that provide a variety of measures (normally including
> multiple measures that were developed by others and then
> implemented by the package authors, perhaps along with a few of their
> own measures)
>
> 2) implementations of specific measures
>
> 3) gold standard human similarity and relatedness judgements
>
> Note that 3) wasn't included in my original request, but came about as
> a result of asking about the first two, so I thought I would include
> that information as well.
>
> ================================================
> Systems that provide a variety of measures :
> ================================================
>
> Based on WordNet; these include measures based on path length, depth,
> and information content, and may also include relatedness measures
> such as lesk, vector, and hso
>
> 1) WordNet::Similarity http://wn-similarity.sourceforge.net
>
> 2) NLTK http://nltk.org
>
> 3) ws4j https://code.google.com/p/ws4j/
>
> 4) DKPro https://code.google.com/p/dkpro-similarity-asl/ (also
> includes support for Wikipedia/Wikirelate, Wiktionary, openThesaurus,
> GermaNet)
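Path- and depth-based measures can be illustrated without any of the packages above. The following is a minimal sketch, assuming a hypothetical single-inheritance toy taxonomy (real WordNet allows multiple parents, and packages differ in how they count depth). It computes path similarity as 1 / (1 + number of edges between the two concepts) and Wu-Palmer similarity as 2 * depth(LCS) / (depth(c1) + depth(c2)), counting the root at depth 1. The names `PARENT`, `lcs`, `path_sim`, and `wup_sim` are illustrative only, not any package's API.

```python
from typing import Dict, List, Optional

# Hypothetical toy is-a hierarchy (child -> parent); "entity" is the root.
# A tree keeps the sketch short; real WordNet nouns can have several parents.
PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "mammal": "animal",
    "bird": "animal",
    "dog": "mammal",
    "cat": "mammal",
    "robin": "bird",
    "car": "vehicle",
}


def ancestors(c: str) -> List[str]:
    """Return [c, parent(c), ..., root]."""
    chain: List[str] = []
    node: Optional[str] = c
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def depth(c: str) -> int:
    """Depth with the root counted as 1 (an assumed convention)."""
    return len(ancestors(c))


def lcs(c1: str, c2: str) -> str:
    """Lowest common subsumer: the first ancestor of c1 that also subsumes c2."""
    up2 = set(ancestors(c2))
    for a in ancestors(c1):
        if a in up2:
            return a
    raise ValueError("concepts share no ancestor")


def shortest_path(c1: str, c2: str) -> int:
    """Edges between c1 and c2; in a tree the shortest path runs through the LCS."""
    common = lcs(c1, c2)
    return (depth(c1) - depth(common)) + (depth(c2) - depth(common))


def path_sim(c1: str, c2: str) -> float:
    """Path-length similarity: 1 / (1 + number of edges)."""
    return 1.0 / (1 + shortest_path(c1, c2))


def wup_sim(c1: str, c2: str) -> float:
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(c1) + depth(c2))."""
    return 2.0 * depth(lcs(c1, c2)) / (depth(c1) + depth(c2))
```

With this toy tree, `path_sim("dog", "cat")` is 1/3 and `wup_sim("dog", "cat")` is 0.75, while both scores drop for `("dog", "car")`, whose only common subsumer is the root.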
>
> Based on various medical ontologies
>
> 1) UMLS::Similarity http://umls-similarity.sourceforge.net (based on
> Unified Medical Language System)
>
> 2) Proteinon http://lasige.di.fc.ul.pt/webtools/proteinon/ (based on
> Gene Ontology)
>
> Systems where the focus may be on other issues but that still include
> some support for semantic similarity and relatedness measures between
> words/concepts
>
> 1) Disco http://www.linguatools.de/disco/disco_en.html (co-occurrence/
> corpus-based similarity, but also includes a plug-in for ontologies in
> Protege)
>
> 2) Semilar http://semanticsimilarity.org/ (text-to-text similarity, but
> also includes support for word-to-word similarity)
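Corpus-based similarity of the kind Disco offers starts from co-occurrence statistics rather than an ontology. As a rough sketch only (plain window counts compared by cosine, not Disco's own weighting or API), using a hypothetical five-sentence corpus and an assumed window of two tokens on each side:

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List

# Hypothetical toy corpus, tokenized on whitespace.
CORPUS: List[List[str]] = [s.split() for s in [
    "the dog chased the ball",
    "the cat chased the mouse",
    "the dog ate food",
    "the cat ate food",
    "she drove the car to work",
]]


def cooccurrence_vectors(sentences: List[List[str]], window: int = 2) -> Dict[str, Counter]:
    """For each word, count the words appearing within +/- `window` tokens."""
    vecs: Dict[str, Counter] = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vecs[word][tokens[j]] += 1
    return vecs


def cosine(u: Counter, v: Counter) -> float:
    """Cosine of two sparse count vectors; 0.0 if either vector is empty."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


VECS = cooccurrence_vectors(CORPUS)
```

Here "dog" and "cat" occur in identical contexts, so their cosine is (up to rounding) 1.0, whereas "dog" and "car" share only the context word "the" and score about 0.43.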
>
> =================================================
> Implementations of Specific measures :
> =================================================
>
> 1) UKB http://ixa2.si.ehu.es/ukb/ (graph-based similarity and
> relatedness, using WordNet)
>
> 2) http://www.cs.columbia.edu/~weiwei/code.html#wmfvec (high-
> dimensional approach using definitions from WordNet/Wiktionary)
>
> 3) http://olesk.com/#SemanticRelatedness (shortest path in weighted
> semantic network)
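The third entry scores relatedness by the shortest path in a weighted semantic network. The following is a generic sketch of that idea using Dijkstra's algorithm over a hypothetical hand-weighted graph, where a lower edge cost means a closer relation. The graph `G` and the conversion 1 / (1 + cost) are assumptions for illustration, not necessarily what that tool uses.

```python
import heapq
from typing import Dict, List, Tuple

# Adjacency list: node -> [(neighbour, edge cost), ...]; lower cost = closer relation.
Graph = Dict[str, List[Tuple[str, float]]]
INF = float("inf")


def add_edge(g: Graph, a: str, b: str, cost: float) -> None:
    """Add an undirected weighted edge."""
    g.setdefault(a, []).append((b, cost))
    g.setdefault(b, []).append((a, cost))


def dijkstra(g: Graph, source: str, target: str) -> float:
    """Cheapest total edge cost from source to target; INF if unreachable."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return d
        if d > dist[node]:
            continue  # stale entry: a cheaper route to `node` was already found
        for nbr, cost in g.get(node, []):
            nd = d + cost
            if nd < dist.get(nbr, INF):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return INF


def relatedness(g: Graph, a: str, b: str) -> float:
    """Hypothetical mapping of path cost into (0, 1]; 0.0 when no path exists."""
    d = dijkstra(g, a, b)
    return 0.0 if d == INF else 1.0 / (1.0 + d)


# Hypothetical hand-weighted network.
G: Graph = {}
for x, y, w in [
    ("coffee", "caffeine", 0.3),
    ("caffeine", "tea", 0.4),
    ("coffee", "cup", 0.5),
    ("cup", "tea", 0.5),
    ("cup", "kitchen", 1.0),
    ("kitchen", "oven", 1.0),
]:
    add_edge(G, x, y, w)
```

In this graph "coffee" reaches "tea" more cheaply through "caffeine" (0.7) than through "cup" (1.0), so the weighted path, not the hop count, decides the score.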
>
>
> ==============================================================================
> Gold Standard data sets with human similarity and relatedness judgements :
> ==============================================================================
>
> 1) Yang and Powers 2006 Verb Similarity Scores (130 pairs)
>
> http://david.wardpowers.info/Research/AI/papers/200601-GWC-VerbSimWN.pdf
> http://david.wardpowers.info/Research/AI/papers/200601-GWC-130verbpairs.txt
>
> 2) WordSimilarity 353 Test Collection
>
> http://www.cs.technion.ac.il/~gabr/resources/data/wordsim353/
>
> http://alfonseca.org/eng/research/wordsim353.html (divided into
> similarity and relatedness pairs)
>
> 3) Rubenstein and Goodenough (65 pairs) and Miller and Charles (30-pair
> subset of RG)
>
> http://www.d.umn.edu/~tpederse/Data/rubenstein-goodenough-1965.txt
>
> http://www.d.umn.edu/~tpederse/Data/miller-charles-1991.txt
>
> 4) ConceptSim (sense-annotated versions of MC, RG, and WordSim 353)
>
> http://www.seas.upenn.edu/~hansens/conceptSim/
>
> 5) Medical concepts from UMLS
>
> http://rxinformatics.umn.edu/SemanticRelatednessResources.html
>
> Four different data sets: one with 101 pairs, another made up of a
> subset of 30 of those (both rated for relatedness), another with 566
> pairs rated for similarity, and another with 587 pairs rated for
> relatedness.
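These gold-standard sets are typically used by correlating a measure's scores with the human judgements over the same pairs; Spearman's rank correlation is a common choice, and Pearson's correlation is also often reported. A minimal, dependency-free sketch, assuming the human and system scores have already been aligned pair by pair (the file formats above vary and are not assumed here):

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over every position holding the same value as position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)


def spearman(human: Sequence[float], system: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    if len(human) != len(system) or len(human) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    return pearson(average_ranks(human), average_ranks(system))
```

Because only ranks matter, a measure whose scores live on a different scale from the human ratings (e.g. 0 to 1 versus 0 to 4) can still reach a correlation of 1.0 if it orders the pairs the same way.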
>
> ========================================================================
>
> So, that's what I have at this point. Additional contributions,
> clarifications, etc. are certainly welcomed!
>
> Cordially,
> Ted
>
> On Sun, Oct 6, 2013 at 10:50 AM, Ted Pedersen <tpederse at d.umn.edu> wrote:
> > Well, I managed to misspell my own URL :)
> >
> > WordNet::Similarity
> > http://wn-similarity.sourceforge.net
> >
> > All the others appear to be correct.
> >
> > On Sun, Oct 6, 2013 at 10:45 AM, Ted Pedersen <tpederse at d.umn.edu> wrote:
> >> Greetings all,
> >>
> >> I'm preparing a tutorial on measuring semantic similarity and
> >> relatedness between concepts. My particular focus is on methods that
> >> do this using ontologies or other (at least somewhat) structured
> >> resources (like Wikipedia, folksonomies, etc.) and that also have
> >> freely available software associated with them (or at least a web
> >> demo).
> >>
> >> While it's a very interesting area, this particular tutorial won't
> >> include purely distributional approaches (due to time constraints), so
> >> I'm looking for methods and software that use some sort of resource
> >> like WordNet, Wikipedia, medical ontologies, Freebase, etc. to arrive
> >> at measurements of semantic similarity or relatedness between pairs of
> >> concepts.
> >>
> >> What follows is my current list, based on projects I have not only
> >> heard of but also used in the not-too-distant past, so I guess I'm
> >> particularly interested in projects you have used or created yourself
> >> (and can therefore vouch for to some extent).
> >>
> >> Based on WordNet; provide path, depth, and info content based measures,
> >> and may include relatedness measures like lesk, vector, and hso
> >>
> >> WordNet::Similarity
> >> http://wn-similarity.sourcforge.net
> >>
> >> NLTK
> >> http://nltk.org
> >>
> >> ws4j
> >> https://code.google.com/p/ws4j/
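The lesk relatedness measure listed above is based on overlaps between glosses (dictionary definitions). The following is a deliberately simplified sketch that counts shared content words between hypothetical glosses written for this example (not taken from WordNet), with a small hand-picked stopword list. WordNet::Similarity's lesk is the more elaborate extended gloss overlap measure, which also compares glosses of related concepts and gives extra weight to multi-word overlaps, so treat this only as the core idea. Taking the maximum over senses to obtain a word-level score is one common convention, assumed here; the sense keys like `bank#river` are hypothetical.

```python
import re
from typing import Dict, List, Set

# Hypothetical toy glosses keyed by "word#sense" (not actual WordNet definitions).
GLOSS: Dict[str, str] = {
    "bank#river": "sloping land beside a body of water such as a river",
    "bank#finance": "an institution that accepts deposits and lends money",
    "shore#n": "the land along the edge of a body of water",
    "loan#n": "money lent by an institution that must be repaid",
}

# Hand-picked stopword list for this example only.
STOPWORDS: Set[str] = {
    "a", "an", "the", "of", "and", "that", "by", "such", "as", "must", "be", "along",
}


def content_words(text: str) -> Set[str]:
    """Lowercased alphabetic tokens minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def lesk_overlap(c1: str, c2: str) -> int:
    """Number of content words the two glosses share."""
    return len(content_words(GLOSS[c1]) & content_words(GLOSS[c2]))


def senses(word: str) -> List[str]:
    """All sense keys for `word` in the toy gloss table."""
    return [c for c in GLOSS if c.split("#")[0] == word]


def word_relatedness(w1: str, w2: str) -> int:
    """Word-level score: maximum overlap over all sense pairs (0 if none)."""
    return max((lesk_overlap(a, b) for a in senses(w1) for b in senses(w2)), default=0)
```

Here the river sense of "bank" shares "land", "body", and "water" with "shore", while the financial sense shares "money" and "institution" with "loan", so the max-over-senses score picks a different sense of "bank" for each pairing.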
> >>
> >> Based on UMLS (Unified Medical Language System); provides path, depth,
> >> and info content measures, and includes the relatedness measures lesk and vector
> >>
> >> UMLS::Similarity
> >> http://umls-similarity.sourceforge.net
> >>
> >> Based on the Gene Ontology (GO); provides path, depth, and info content measures
> >>
> >> Proteinon
> >> http://lasige.di.fc.ul.pt/webtools/proteinon/
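The info content measures mentioned for UMLS::Similarity and Proteinon rely on concept frequencies propagated up the hierarchy, with IC(c) = -log P(c). The following is a minimal sketch over a hypothetical toy tree with made-up counts, showing Resnik (IC of the lowest common subsumer), Lin (2 * IC(lcs) / (IC(c1) + IC(c2))), and the Jiang-Conrath distance (IC(c1) + IC(c2) - 2 * IC(lcs)). None of it reflects how either package actually stores concepts or obtains counts.

```python
import math
from typing import Dict, List, Optional

# Hypothetical toy is-a tree (child -> parent) and made-up corpus counts.
PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "organism": "entity",
    "animal": "organism",
    "plant": "organism",
    "dog": "animal",
    "cat": "animal",
    "rose": "plant",
}
COUNT: Dict[str, int] = {
    "entity": 0, "organism": 0, "animal": 10, "plant": 5,
    "dog": 20, "cat": 15, "rose": 10,
}
ROOT = "entity"


def freq(c: str) -> int:
    """Own count plus the counts of all descendants (propagated frequency)."""
    return COUNT[c] + sum(freq(k) for k, p in PARENT.items() if p == c)


def ic(c: str) -> float:
    """Information content: -log P(c), with P(c) = freq(c) / freq(root)."""
    return -math.log(freq(c) / freq(ROOT))


def ancestors(c: str) -> List[str]:
    """Return [c, parent(c), ..., root]."""
    chain: List[str] = []
    node: Optional[str] = c
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def lcs(c1: str, c2: str) -> str:
    """Lowest common subsumer (unique in a single-rooted tree)."""
    up2 = set(ancestors(c2))
    return next(a for a in ancestors(c1) if a in up2)


def resnik(c1: str, c2: str) -> float:
    return ic(lcs(c1, c2))


def lin(c1: str, c2: str) -> float:
    if c1 == c2:
        return 1.0
    denom = ic(c1) + ic(c2)
    # Both concepts at IC 0 would give 0/0; return 0.0 by assumption.
    return 2 * ic(lcs(c1, c2)) / denom if denom else 0.0


def jcn_distance(c1: str, c2: str) -> float:
    return ic(c1) + ic(c2) - 2 * ic(lcs(c1, c2))
```

With these counts, `resnik("dog", "cat")` is IC("animal") = ln(4/3), whereas `resnik("dog", "rose")` is 0 because their common subsumer "organism" covers every counted occurrence.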
> >>
> >> I will post a summary of whatever I hear about after some period of
> >> time. Any hints or suggestions will be very gratefully received.
> >>
> >> Many thanks,
> >> Ted
> >>
> >> --
> >> Ted Pedersen
> >> http://www.d.umn.edu/~tpederse
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>