[Corpora-List] Question about LSA

Matías Guzmán mortem.dei at gmail.com
Wed Mar 27 19:57:45 UTC 2013


Dear all,

I'm trying to find the semantic differences between two synonymous but
syntactically different constructions. One of the things I've thought of
doing is a latent semantic analysis comparing the sentences for
construction A with the sentences for construction B. When I include all
words and compute the angle between the two vectors, I get
cos(x) = 0.95. If, however, I remove all words with counts higher than 100,
I get something like 0.46. The 0.95 value does not make much sense to me,
because there are no repeated sentences, but I'm not sure that I can just
remove all words with more than 100 occurrences. Is this a valid procedure,
or should I take the 0.95 result?
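
A minimal sketch of the comparison described above, to make the procedure
concrete. The toy sentences, the helper names (term_counts, cosine,
drop_frequent) and the cutoff of 4 are all hypothetical stand-ins for the real
data and the cutoff of 100. It also assumes the cutoff applies to a word's
combined count over both texts, which may not match the actual setup.

```python
import math
from collections import Counter


def term_counts(sentences):
    """Count whitespace-separated, lowercased tokens over a list of sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(sentence.lower().split())
    return counts


def cosine(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def drop_frequent(a, b, max_count):
    """Remove words whose combined count over both texts exceeds max_count."""
    total = a + b
    keep = {w for w, c in total.items() if c <= max_count}
    filtered_a = Counter({w: c for w, c in a.items() if w in keep})
    filtered_b = Counter({w: c for w, c in b.items() if w in keep})
    return filtered_a, filtered_b


# Hypothetical toy data standing in for the two sets of sentences.
sentences_a = ["the cat saw the dog", "the cat ate the fish"]
sentences_b = ["the bird saw the tree", "the bird ate the seed"]

counts_a = term_counts(sentences_a)
counts_b = term_counts(sentences_b)

cos_all = cosine(counts_a, counts_b)
cos_filtered = cosine(*drop_frequent(counts_a, counts_b, max_count=4))

print(f"all words:       {cos_all:.2f}")
print(f"frequent removed: {cos_filtered:.2f}")
```

In this toy case the single most frequent word accounts for most of the dot
product, so removing it lowers the cosine sharply even though no sentence is
repeated.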

The second question is more about LSA itself. Since I only have two
texts, I'm omitting the singular value decomposition, but I'm not sure I can
do this. If I actually carry out the SVD, I get a 2 by 2 matrix with columns
perpendicular to each other. Should I take this result? It also makes no
sense to me.
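
A minimal sketch of the SVD step described above, assuming NumPy and a
hypothetical term-by-text count matrix X with one column per text (the values
are made up). It only illustrates two general properties of the SVD: the 2 by 2
factor is orthogonal for any input matrix, and keeping both dimensions
reproduces X exactly, so the cosine between the two texts is unchanged.

```python
import numpy as np

# Hypothetical term-by-text counts: rows are terms, columns are the two texts.
X = np.array([
    [4.0, 4.0],
    [2.0, 0.0],
    [0.0, 2.0],
    [1.0, 1.0],
    [1.0, 1.0],
])

# Thin SVD: U is (terms x 2), s holds 2 singular values, Vt is (2 x 2).
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# The rows and columns of Vt are orthonormal by construction,
# whatever the contents of X.
gram = Vt @ Vt.T

# Keeping both singular values reconstructs X exactly.
X_hat = U @ np.diag(s) @ Vt


def cosine(u, v):
    """Cosine of the angle between two dense vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


cos_raw = cosine(X[:, 0], X[:, 1])
cos_svd = cosine(X_hat[:, 0], X_hat[:, 1])

print(np.round(gram, 6))
print(round(cos_raw, 4), round(cos_svd, 4))
```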

Thanks a lot,

Matías