<html><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div><br></div><div>Hello,</div><div><br></div><div>I am also interested in the topic, but in a simpler way. I would like to measure word-based (not sense-based) similarity, that is, to find sentences that share the same words (lemmas), excluding stopwords. As I need to preprocess Twitter sentiment corpora, I was wondering if there are tools to detect this kind of word-level similarity, as in spam or repetitive Twitter messages. Does anybody know of anything for Spanish?</div><div><br></div><div>Thank you very much,</div><div><br></div><div>Juan F.</div><div><br></div><div><br></div><br><div><div>On 07/10/2013, at 15:44, Eneko Agirre wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite">      <meta content="text/html; charset=ISO-8859-1" http-equiv="Content-Type">    <div bgcolor="#FFFFFF" text="#000000">    <div class="moz-cite-prefix"><br>      <br>      Hi Ted and all,<br>      <br>      you might want to check      <meta http-equiv="content-type" content="text/html;        charset=ISO-8859-1">      <a href="http://ixa2.si.ehu.es/ukb/">http://ixa2.si.ehu.es/ukb/</a>,      a graph-based algorithm for WSD and similarity, which uses random      walks. 
It scores very high in RG65 and WordSim353 when run on      WordNet, and can be applied to any KB.<br>      <br>      It's open source and includes all data necessary to replicate the      results reported in the following:<br>      <br style="color: rgb(0, 0, 0); font-family: 'Times New Roman';        font-size: medium; font-style: normal; font-variant: normal;        font-weight: normal; letter-spacing: normal; line-height:        normal; orphans: auto; text-align: start; text-indent: 0px;        text-transform: none; white-space: normal; widows: auto;        word-spacing: 0px; -webkit-text-stroke-width: 0px;">      <span style="color: rgb(0, 0, 0); font-family: 'Times New Roman';        font-size: medium; font-style: normal; font-variant: normal;        font-weight: normal; letter-spacing: normal; line-height:        normal; orphans: auto; text-align: start; text-indent: 0px;        text-transform: none; white-space: normal; widows: auto;        word-spacing: 0px; -webkit-text-stroke-width: 0px; display:        inline !important; float: none;">[3] Eneko Agirre, Enrique        Alfonseca, Keith Hall, Jana Kravalova, Marius Pasca and Aitor        Soroa. 2009. A Study on Similarity and Relatedness Using        Distributional and WordNet-based Approaches. Proceedings of        NAACL-HLT 09. Boulder, USA.  
(</span><a href="https://ixa.si.ehu.es/Ixa/Argitalpenak/Artikuluak/1239169991/publikoak/2009-naacl-long.pdf" style="font-family: 'Times New Roman'; font-size: medium;        font-style: normal; font-variant: normal; font-weight: normal;        letter-spacing: normal; line-height: normal; orphans: auto;        text-align: start; text-indent: 0px; text-transform: none;        white-space: normal; widows: auto; word-spacing: 0px;        -webkit-text-stroke-width: 0px;">PDF</a><span style="color:        rgb(0, 0, 0); font-family: 'Times New Roman'; font-size: medium;        font-style: normal; font-variant: normal; font-weight: normal;        letter-spacing: normal; line-height: normal; orphans: auto;        text-align: start; text-indent: 0px; text-transform: none;        white-space: normal; widows: auto; word-spacing: 0px;        -webkit-text-stroke-width: 0px; display: inline !important;        float: none;">)</span><br style="color: rgb(0, 0, 0);        font-family: 'Times New Roman'; font-size: medium; font-style:        normal; font-variant: normal; font-weight: normal;        letter-spacing: normal; line-height: normal; orphans: auto;        text-align: start; text-indent: 0px; text-transform: none;        white-space: normal; widows: auto; word-spacing: 0px;        -webkit-text-stroke-width: 0px;">      <br style="color: rgb(0, 0, 0); font-family: 'Times New Roman';        font-size: medium; font-style: normal; font-variant: normal;        font-weight: normal; letter-spacing: normal; line-height:        normal; orphans: auto; text-align: start; text-indent: 0px;        text-transform: none; white-space: normal; widows: auto;        word-spacing: 0px; -webkit-text-stroke-width: 0px;">      <span style="color: rgb(0, 0, 0); font-family: 'Times New Roman';        font-size: medium; font-style: normal; font-variant: normal;        font-weight: normal; letter-spacing: normal; line-height:        normal; orphans: auto; text-align: start; text-indent: 0px;        
text-transform: none; white-space: normal; widows: auto;        word-spacing: 0px; -webkit-text-stroke-width: 0px; display:        inline !important; float: none;">[4] Eneko Agirre, Montse        Cuadros, German Rigau and Aitor Soroa. 2010.  Exploring        Knowledge Bases for Similarity. Proceedings of LREC 2010.        Valletta, Malta.  (</span><a href="http://ixa.si.ehu.es/Ixa/Argitalpenak/Artikuluak/1274099085/publikoak/main.pdf" style="font-family: 'Times New Roman'; font-size: medium;        font-style: normal; font-variant: normal; font-weight: normal;        letter-spacing: normal; line-height: normal; orphans: auto;        text-align: start; text-indent: 0px; text-transform: none;        white-space: normal; widows: auto; word-spacing: 0px;        -webkit-text-stroke-width: 0px;">PDF</a><span style="color:        rgb(0, 0, 0); font-family: 'Times New Roman'; font-size: medium;        font-style: normal; font-variant: normal; font-weight: normal;        letter-spacing: normal; line-height: normal; orphans: auto;        text-align: start; text-indent: 0px; text-transform: none;        white-space: normal; widows: auto; word-spacing: 0px;        -webkit-text-stroke-width: 0px; display: inline !important;        float: none;">)</span><br style="color: rgb(0, 0, 0);        font-family: 'Times New Roman'; font-size: medium; font-style:        normal; font-variant: normal; font-weight: normal;        letter-spacing: normal; line-height: normal; orphans: auto;        text-align: start; text-indent: 0px; text-transform: none;        white-space: normal; widows: auto; word-spacing: 0px;        -webkit-text-stroke-width: 0px;">      <br>      best<br>      <br>      eneko<br>      <br>      <br>      <br>      On 10/06/2013 05:45 PM, Ted Pedersen wrote:<br>    </div>    <blockquote cite="mid:CAAfu72_ft9fYxUxZbJBud8r5sDtPXDJETMiyfwvnQ8to5v-rOg@mail.gmail.com" type="cite">      <pre wrap="">Greetings all,

I'm preparing a tutorial on measuring semantic similarity and
relatedness between concepts. My particular focus is on methods that
do this using ontologies or other (at least somewhat) structured
resources (like Wikipedia, folksonomies, etc.) and that also have
freely available software associated with them (or at least a web
demo).

While it's a very interesting area, this particular tutorial won't
include purely distributional approaches (due to time constraints), so
I'm looking for methods and software that use some sort of resource
like WordNet, Wikipedia, medical ontologies, Freebase, etc. to arrive
at measurements of semantic similarity or relatedness between pairs of
concepts.

What follows is my current list, based not just on projects I have
heard of but on ones I have used in the not too distant past. So I
guess I'm particularly interested in projects you have used or created
yourself (and can therefore vouch for to some extent).

Based on WordNet; provide path, depth, and info content based measures,
and may include relatedness measures like lesk, vector, and hso

WordNet::Similarity
<a class="moz-txt-link-freetext" href="http://wn-similarity.sourceforge.net">http://wn-similarity.sourceforge.net</a>

NLTK
<a class="moz-txt-link-freetext" href="http://nltk.org">http://nltk.org</a>

ws4j
<a class="moz-txt-link-freetext" href="https://code.google.com/p/ws4j/">https://code.google.com/p/ws4j/</a>

Based on UMLS (Unified Medical Language System); provide path, depth,
and info content measures, and include the relatedness measures lesk
and vector

UMLS::Similarity
<a class="moz-txt-link-freetext" href="http://umls-similarity.sourceforge.net">http://umls-similarity.sourceforge.net</a>

Based on the Gene Ontology (GO); provide path, depth, and info content
measures

Proteinon
<a class="moz-txt-link-freetext" href="http://lasige.di.fc.ul.pt/webtools/proteinon/">http://lasige.di.fc.ul.pt/webtools/proteinon/</a>

I will post a summary of whatever I hear about after some period of
time. Any hints or suggestions will be very gratefully received.

Many thanks,
Ted

</pre>    </blockquote>    <br>    <br>    <pre class="moz-signature" cols="72">-- 

Eneko Agirre
Euskal Herriko Unibertsitatea
University of the Basque Country

<a class="moz-txt-link-freetext" href="http://ixa2.si.ehu.es/eneko">http://ixa2.si.ehu.es/eneko</a> </pre>  </div>  _______________________________________________<br>UNSUBSCRIBE from this page: <a href="http://mailman.uib.no/options/corpora">http://mailman.uib.no/options/corpora</a><br>Corpora mailing list<br><a href="mailto:Corpora@uib.no">Corpora@uib.no</a><br>http://mailman.uib.no/listinfo/corpora<br></blockquote></div><br></body></html>