[Corpora-List] software for measuring semantic similarity and relatedness?

Juan Fernández Fernández jnferfer at gmail.com
Mon Oct 7 14:27:06 UTC 2013


Hello,

I am also interested in this topic, but in a simpler form. I would like
to measure word-based (not sense-based) similarity, that is, to find
sentences that share the same words (lemmas) once stopwords are
excluded. I need to preprocess Twitter sentiment corpora, so I was
wondering whether there are tools that detect this kind of word overlap,
for example to flag spam or repetitive Twitter messages. Does anybody
know of anything for Spanish?
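
To make the question concrete, here is a minimal sketch of the kind of
word-overlap measure described above: Jaccard similarity over the sets
of non-stopword tokens. It is only an illustration under stated
assumptions. The stopword list is a tiny hand-made sample, the function
names (content_words, jaccard, near_duplicates) and the 0.8 threshold
are hypothetical, and surface tokens stand in for lemmas because no
lemmatizer is applied.

```python
import re

# Tiny illustrative Spanish stopword sample (an assumption; a real
# list would be much longer and would come from an external resource).
STOPWORDS = {
    "el", "la", "los", "las", "de", "que", "y", "a", "en",
    "un", "una", "es", "me", "mi", "muy", "no", "por", "con",
}

TOKEN_RE = re.compile(r"\w+")


def content_words(text):
    """Lowercase, tokenize, and drop stopwords (no lemmatization here)."""
    return {tok for tok in TOKEN_RE.findall(text.lower())
            if tok not in STOPWORDS}


def jaccard(a, b):
    """Jaccard overlap of the content-word sets of two messages."""
    words_a, words_b = content_words(a), content_words(b)
    if not words_a and not words_b:
        return 0.0  # nothing left to compare after stopword removal
    return len(words_a & words_b) / len(words_a | words_b)


def near_duplicates(messages, threshold=0.8):
    """Return index pairs (i, j), i < j, whose overlap meets the threshold."""
    pairs = []
    for i in range(len(messages)):
        for j in range(i + 1, len(messages)):
            if jaccard(messages[i], messages[j]) >= threshold:
                pairs.append((i, j))
    return pairs
```

The pairwise loop is quadratic in the number of messages, so for a
large Twitter corpus something like hashing or indexing of the token
sets would probably be needed; that is outside this sketch.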

Thank you very much,

Juan F.



On 07/10/2013, at 15:44, Eneko Agirre wrote:

>
>
> Hi Ted and all,
>
> you might want to check http://ixa2.si.ehu.es/ukb/, a graph-based
> algorithm for WSD and similarity, which uses random walks. It scores
> very high on RG65 and WordSim353 when run on WordNet, and can be
> applied to any KB.
>
> It's open source and includes all data necessary to replicate the  
> results reported in the following:
>
> [3] Eneko Agirre, Enrique Alfonseca, Keith Hall, Jana Kravalova,  
> Marius Pasca and Aitor Soroa. 2009. A Study on Similarity and  
> Relatedness Using Distributional and WordNet-based Approaches.  
> Proceedings of NAACL-HLT 09. Boulder, USA.
>
> [4] Eneko Agirre, Montse Cuadros, German Rigau and Aitor Soroa.  
> 2010.  Exploring Knowledge Bases for Similarity. Proceedings of LREC  
> 2010. Valletta, Malta.
>
> best
>
> eneko
>
>
>
> On 10/06/2013 at 05:45 PM, Ted Pedersen wrote:
>> Greetings all,
>>
>> I'm preparing a tutorial on measuring semantic similarity and
>> relatedness between concepts. My particular focus is on methods that
>> do this using ontologies or other (at least somewhat) structured
>> resources (like Wikipedia, folksonomies, etc.) and that also have
>> freely available software associated with them (or at least a web
>> demo).
>>
>> While it's a very interesting area, this particular tutorial won't
>> include purely distributional approaches (due to time constraints),
>> so I'm looking for methods and software that use some sort of
>> resource like WordNet, Wikipedia, medical ontologies, Freebase, etc.
>> to arrive at measurements of semantic similarity or relatedness
>> between pairs of concepts.
>>
>> What follows is my current list, based on projects I have not only
>> heard of but also used in the not too distant past, so I guess I'm
>> particularly interested in projects you have used or created yourself
>> (and can therefore vouch for to some extent).
>>
>> Based on WordNet; these provide path, depth, and info content based
>> measures, and may include relatedness measures like lesk, vector, hso:
>>
>> WordNet::Similarity
>> http://wn-similarity.sourceforge.net
>>
>> NLTK
>> http://nltk.org
>>
>> ws4j
>> https://code.google.com/p/ws4j/
>>
>> Based on UMLS (Unified Medical Language System); provides path,
>> depth, and info content measures, and includes the relatedness
>> measures lesk and vector:
>>
>> UMLS::Similarity
>> http://umls-similarity.sourceforge.net
>>
>> Based on the Gene Ontology (GO); provides path, depth, and info
>> content measures:
>>
>> Proteinon
>> http://lasige.di.fc.ul.pt/webtools/proteinon/
>>
>> I will post a summary of whatever I hear about after some period of
>> time. Any hints or suggestions will be very gratefully received.
>>
>> Many thanks,
>> Ted
>>
>
>
> -- 
>
> Eneko Agirre
> Euskal Herriko Unibertsitatea
> University of the Basque Country
> http://ixa2.si.ehu.es/eneko
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
