[Corpora-List] Request for help concerning a LSA problem

Evgeniy Gabrilovich gabr at cs.technion.ac.il
Fri May 5 14:59:08 UTC 2006


Dear Cecilie D. Widsteen,

I'm not familiar with the Jama Matrix Package, but recently
I conducted a search for existing implementations of LSA,
so I thought you might find these useful:

1) Text to Matrix Generator (TMG) - Matlab toolbox
   http://scgroup.hpclab.ceid.upatras.gr/scgroup/Projects/TMG/
2) lsa - a package for the R Project for Statistical Computing
   http://cran.r-project.org/src/contrib/Descriptions/lsa.html
3) General Text Parser (GTP) - C++ code
   http://www.cs.utk.edu/~lsi/gtp-request.html
4) Links to additional LSA-related software are available at
   http://www.cs.utk.edu/~lsi/soft.html

Regards,

Evgeniy.

--
Evgeniy Gabrilovich
Ph.D. student in Computer Science
Department of Computer Science, Technion - Israel Institute of Technology
Technion City, Haifa 32000, Israel
Email: gabr at cs.technion.ac.il WWW: http://www.cs.technion.ac.il/~gabr
Phone: +972-4-8294948
 

> -----Original Message-----
> From: owner-corpora at lists.uib.no [mailto:owner-corpora at lists.uib.no]
> On Behalf Of Cecilie Desiree Widsteen
> Sent: Thursday, May 04, 2006 10:29
> To: corpora at uib.no
> Subject: [Corpora-List] Request for help concerning a LSA problem
> 
> Hello all,
> 
> I'm currently trying to implement Latent Semantic Analysis as part of
> an automatic classification system. I'm programming in Java and using
> the Jama Matrix package for the matrix stuff. I have stumbled over some
> strange problems, and would be grateful if anyone on this list could
> offer some help.
>
> My problem is this: I have implemented a class which takes care of
> building a matrix representation of a corpus and performs SVD on the
> term-by-document matrix. Most of the operations are done by the Jama
> class "Matrix". This works fine, except that when I ran the program
> over various small test corpora (for instance, the one from Chapter 15
> in Schütze and Manning's book Foundations of Statistical NLP), most of
> the right and left singular vectors contained the correct values but
> with the wrong/reversed sign?! E.g. a vector that should have the
> values [-0.75,-0.28,-0.20, ...] is assigned the values [0.75,0.28,
> ...]. Unfortunately, I have limited experience with linear algebra and
> the like, so I now find myself completely at a loss in debugging this...
> As far as I can understand, this means that my vectors are pointing in
> the opposite direction from the one they should, but why this is
> escapes my understanding :)
>
> Any help, hints, tricks and the like are extremely welcome! I can also
> send over the source code on request.
> 
> Regards,
> --
> Cecilie D. Widsteen
> Department of Linguistics
> University of Oslo
> 
> 
> 
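On the sign question in the quoted message: a reversed sign in a pair of
singular vectors is not by itself a sign of a bug. The short derivation
below is a sketch in standard notation that does not appear in the original
posts (A for the term-by-document matrix, U and V for the left and right
singular vectors, sigma_i for the singular values, r for the rank). It
shows that negating a left singular vector together with its matching
right singular vector leaves the decomposition unchanged, so the signs an
SVD routine returns are a matter of convention.

```latex
A = U \Sigma V^{\top} = \sum_{i=1}^{r} \sigma_i \, u_i v_i^{\top},
\qquad
\sigma_i \, (-u_i)(-v_i)^{\top} = \sigma_i \, u_i v_i^{\top}
\quad \text{for each } i .
```

Under this reading, [-0.75,-0.28,-0.20, ...] and [0.75,0.28, ...] describe
the same singular direction. A reasonable check is that the matching vector
on the other side has flipped as well, and that multiplying the three
factors back together reproduces the original matrix. Dot products and
cosines between documents in the reduced space are unaffected, because a
flipped dimension negates the same coordinate of every document.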


