[Corpora-List] Request for help concerning an LSA problem

Cecilie Desiree Widsteen cecilidw at student.iln.uio.no
Thu May 4 08:29:06 UTC 2006


Hello all,

I'm currently trying to implement Latent Semantic Analysis as part of
an automatic classification system. I'm programming in Java and using
the Jama Matrix package for the matrix operations. I have stumbled over some
strange problems and would be grateful if anyone on this list could
offer some help.
My problem is this: I have implemented a class that builds a matrix
representation of a corpus and performs SVD over the term-by-document
matrix. Most of the operations are done by the Jama class "Matrix". This
works fine, except that when I ran the program over various small test
corpora (for instance, the one from Chapter 15 of Manning and Schütze's
book Foundations of Statistical Natural Language Processing), most of the
right and left singular vectors contained the correct values but with the
wrong (reversed) sign. For example, a vector that should have the values
[-0.75, -0.28, -0.20, ...] is assigned the values [0.75, 0.28, ...].
Unfortunately, I have limited experience with linear algebra, so I now
find myself completely at a loss in debugging this.
As far as I understand, this means that my vectors point in the opposite
direction from the one they should, but why this happens escapes me :)
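A likely explanation, offered as a hedged note rather than a diagnosis of the actual code: singular vectors are only determined up to sign. If A = U S V^T, then negating the i-th column of U together with the i-th column of V gives another equally valid SVD of the same matrix, so Jama and a textbook's worked example can legitimately disagree on signs. Because the flip applies to a whole LSA dimension at once (a reflection), cosine similarities between documents or terms in the reduced space are unchanged. The sketch below uses plain Java without Jama and a small hand-made 2x2 factorization; the class and method names (`SvdSignDemo`, `normalizeSigns`, etc.) and the "largest-magnitude entry positive" rule are illustrative assumptions, not Jama's API or the book's convention. With Jama, the arrays could be taken from `matrix.svd().getU().getArrayCopy()` and `getV().getArrayCopy()`.

```java
import java.util.Arrays;

public class SvdSignDemo {

    // Reconstruct A = U * diag(s) * V^T; column c of u and v is the c-th singular vector pair.
    static double[][] reconstruct(double[][] u, double[] s, double[][] v) {
        int m = u.length, n = v.length;
        double[][] a = new double[m][n];
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                double sum = 0.0;
                for (int c = 0; c < s.length; c++) {
                    sum += u[i][c] * s[c] * v[j][c];
                }
                a[i][j] = sum;
            }
        }
        return a;
    }

    // Deep copy so the demo can modify one factorization and keep the other.
    static double[][] copy(double[][] m) {
        double[][] out = new double[m.length][];
        for (int i = 0; i < m.length; i++) out[i] = m[i].clone();
        return out;
    }

    // Negate column c of m in place.
    static void negateColumn(double[][] m, int c) {
        for (double[] row : m) row[c] = -row[c];
    }

    // Illustrative convention only (assumed, not Jama's behaviour): for each pair (u_c, v_c),
    // make the largest-magnitude entry of u_c positive, flipping v_c along with it.
    static void normalizeSigns(double[][] u, double[][] v) {
        for (int c = 0; c < u[0].length; c++) {
            int best = 0;
            for (int i = 1; i < u.length; i++) {
                if (Math.abs(u[i][c]) > Math.abs(u[best][c])) best = i;
            }
            if (u[best][c] < 0) {
                negateColumn(u, c);
                negateColumn(v, c);
            }
        }
    }

    // Element-wise comparison within an absolute tolerance.
    static boolean approxEqual(double[][] a, double[][] b, double tol) {
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < a[i].length; j++)
                if (Math.abs(a[i][j] - b[i][j]) > tol) return false;
        return true;
    }

    public static void main(String[] args) {
        // A tiny hand-made SVD: U and V have orthonormal columns, s holds singular values.
        double[][] u = {{0.6, -0.8}, {0.8, 0.6}};
        double[][] v = {{0.8, 0.6}, {-0.6, 0.8}};
        double[] s = {5.0, 2.0};

        // The same factorization with the first singular vector pair sign-flipped,
        // as a different SVD routine might legitimately return it.
        double[][] uFlipped = copy(u);
        double[][] vFlipped = copy(v);
        negateColumn(uFlipped, 0);
        negateColumn(vFlipped, 0);

        System.out.println("same reconstruction: "
            + approxEqual(reconstruct(u, s, v), reconstruct(uFlipped, s, vFlipped), 1e-12));

        // After applying one fixed convention, both factorizations coincide.
        normalizeSigns(u, v);
        normalizeSigns(uFlipped, vFlipped);
        System.out.println("U after normalization: " + Arrays.deepToString(u));
        System.out.println("identical after normalization: "
            + (approxEqual(u, uFlipped, 0.0) && approxEqual(v, vFlipped, 0.0)));
    }
}
```

Any deterministic rule would do (another common choice is making the sum of each column of U non-negative); what matters is applying it to U and V together, since flipping only one of them would change the product.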
Any help, hints, tricks and the like are extremely welcome! I can also
send over the source code on request.

Regards,
--
Cecilie D. Widsteen
Department of Linguistics
University of Oslo


