[Corpora-List] software semantic similarity between texts

Dom Widdows widdows at google.com
Mon Oct 20 14:43:26 UTC 2008


Hi Antonio,

An option I'd like to add to Scott's list is the Semantic Vectors
package (http://semanticvectors.googlecode.com). The package is quite
stable, and judging by the reasonably frequent feedback and questions
we get on the mailing list, users have found it pretty easy to get
started with.

Semantic Vectors uses random projection, which scales much better than
some of the other matrix factorization techniques used in latent
semantic analysis. There is some evidence that for small corpora the
more traditional singular value decomposition (SVD) gives more accurate
results, though I think there is much yet to be learned in this area.
It should also be relatively easy to add SVD as an option, though I
haven't done this yet. If you want to use SVD, you could also try the
older Infomap package at http://infomap-nlp.sourceforge.net/.
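
To make the random projection idea concrete, here is a minimal sketch in
Python. It is not the Semantic Vectors API or its actual implementation;
the names (DIM, NONZERO, index_vector, train_term_vectors, text_vector,
cosine) and the parameter values are assumptions chosen for illustration.
It assumes each document gets a sparse random vector of +1/-1 entries, each
term vector is the sum of the vectors of the documents the term occurs in,
and a short text is scored by the cosine of its summed term vectors.

```python
import math
import random

DIM = 200      # reduced dimensionality (illustrative value, an assumption)
NONZERO = 10   # number of +1/-1 entries per random vector (an assumption)


def index_vector(rng):
    """Return a sparse ternary random vector: mostly 0, a few +1/-1 entries."""
    vec = [0.0] * DIM
    for i, pos in enumerate(rng.sample(range(DIM), NONZERO)):
        vec[pos] = 1.0 if i % 2 == 0 else -1.0
    return vec


def train_term_vectors(docs, seed=0):
    """Build term vectors by adding each document's random vector to every
    token occurrence in that document (so repeated tokens count more)."""
    rng = random.Random(seed)
    terms = {}
    for doc in docs:
        doc_vec = index_vector(rng)
        for token in doc.lower().split():
            tv = terms.setdefault(token, [0.0] * DIM)
            for i, x in enumerate(doc_vec):
                tv[i] += x
    return terms


def text_vector(text, terms):
    """Sum the vectors of the known tokens in a text; unknown tokens are skipped."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        tv = terms.get(token)
        if tv is not None:
            for i, x in enumerate(tv):
                vec[i] += x
    return vec


def cosine(a, b):
    """Cosine similarity; defined as 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


if __name__ == "__main__":
    # Toy corpus, invented for this example only.
    corpus = [
        "a river bank is the land beside a river",
        "a bank is an institution that lends money",
        "the stream flows past the river shore",
        "money deposit account at the institution",
    ]
    terms = train_term_vectors(corpus)
    a = text_vector("land beside a stream", terms)
    b = text_vector("institution that lends money", terms)
    print(cosine(a, b))
```

The scaling advantage shows up in train_term_vectors: it makes a single pass
over the corpus and never builds or factorizes the full term-document
matrix, whereas SVD has to operate on that matrix as a whole.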

Best wishes,
Dominic

On Mon, Oct 20, 2008 at 10:00 AM, Scott A. Crossley
<sacrossley at gmail.com> wrote:
> Latent Semantic Analysis should do the trick. There are a variety of tools
> on the website that should help you out.
>
> http://lsa.colorado.edu/
>
> Scott Crossley, Ph.D.
> Linguistics/TESOL
>
> Department of English
> Mississippi State University
> http://www.msstate.edu/dept/english/tesol/tesolfaculty.html
> (662) 325-2355
>
> Institute for Intelligent Systems
> University of Memphis
> http://mnemosyne.csl.psyc.memphis.edu/iis/
>
>
> -----Original Message-----
> From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of
> Antonio Toral
> Sent: Monday, October 20, 2008 7:51 AM
> To: corpora at uib.no
> Subject: [Corpora-List] software semantic similarity between texts
>
> Dear Corpora members,
>
> I'm looking for software that computes semantic similarity between small
> texts (e.g. WordNet glosses, dictionary definitions). I am aware of
> simFinder, but it seems that it is no longer available. Does anyone know
> of any available software to do this?
>
> Thanks!
>
> Regards,
> --
> Antonio Toral
>
> Istituto di Linguistica Computazionale
> Consiglio Nazionale delle Ricerche
> Area della Ricerca di Pisa
>
> http://www.dlsi.ua.es/~atoral/
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
