[Corpora-List] How to word presentation for word clustering?
Clive De Silva
cd334 at cam.ac.uk
Wed Jul 7 14:44:00 UTC 2004
Dear Chen Wenliang,
I am using TF*IDF values as my representation for words:
vector w = {tf(1)*IDF(1), tf(2)*IDF(2), ..., tf(n)*IDF(n)}, where the IDF is
computed from a large corpus. This seems to give better results than just
the raw frequency counts.
The representations I investigated were TF, TF*IDF and simple binary values
(1 if the word is present and 0 if it is not).
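
A minimal sketch of the three representations, for illustration only. It
assumes the vector components are documents (as in the original question) and
that IDF is the word's own inverse document frequency, log(N / df), taken from
a separate reference corpus. The IDF(i) notation above could also mean a
per-component weight, which this sketch does not cover. The toy documents and
all function names are hypothetical.

```python
import math

# Hypothetical toy documents, each a list of tokens (not data from the thread).
docs = [
    ["cat", "sat", "mat", "cat"],
    ["dog", "sat", "log"],
    ["cat", "dog", "dog"],
]

def tf_vector(word, docs):
    """TF: component i is the raw frequency of `word` in document i."""
    return [doc.count(word) for doc in docs]

def binary_vector(word, docs):
    """Binary: component i is 1 if `word` occurs in document i, else 0."""
    return [1 if word in doc else 0 for doc in docs]

def idf(word, reference_docs):
    """Assumed IDF variant: log(N / df) over a reference corpus; 0.0 if unseen."""
    df = sum(1 for doc in reference_docs if word in doc)
    return math.log(len(reference_docs) / df) if df else 0.0

def tfidf_vector(word, docs, reference_docs):
    """TF*IDF: each TF component scaled by the word's IDF."""
    weight = idf(word, reference_docs)
    return [tf * weight for tf in tf_vector(word, docs)]

# In practice the IDF would come from a much larger corpus; the toy
# documents are reused here only to keep the sketch self-contained.
cat_tf = tf_vector("cat", docs)
cat_binary = binary_vector("cat", docs)
cat_tfidf = tfidf_vector("cat", docs, docs)
```

Under this reading every component of a word's vector is scaled by the same
factor, so the weighting changes the distances between words (frequent,
uninformative words shrink towards the origin) but not the direction of any
single vector.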
Regards,
Clive De Silva
University of Cambridge
----- Original Message -----
From: "chen wenliang" <chenwl at mail.neu.edu.cn>
To: <corpora at hd.uib.no>
Sent: Wednesday, July 07, 2004 10:17 AM
Subject: [Corpora-List] How to word presentation for word clustering?
Dear all,
I am looking for a word representation for word clustering.
I am working on a word clustering project. At present each word is
represented as a vector w = {tf(1), tf(2), ..., tf(n)}, where tf(i) is the
frequency of the word in document i. I then use k-means as the clustering
algorithm.
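
The setup described above can be sketched roughly as follows. This is plain
k-means (Lloyd's algorithm) with squared Euclidean distance and a
deterministic initialisation, chosen only to keep the example short. The word
vectors are hypothetical, and the distance measure and initialisation are
assumptions rather than details given in the message.

```python
def kmeans(vectors, k, iterations=20):
    """Plain k-means (Lloyd's algorithm) with squared Euclidean distance."""
    # Assumed initialisation for this sketch: the first k vectors.
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(v, centroids[c])),
            )
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical vectors w = {tf(1), tf(2), tf(3)} over n = 3 documents.
word_vectors = {
    "cat": [2, 0, 1],
    "kitten": [3, 0, 1],
    "dog": [0, 1, 2],
    "puppy": [0, 2, 3],
}
words = list(word_vectors)
labels, centroids = kmeans([word_vectors[w] for w in words], k=2)
clusters = dict(zip(words, labels))
```

On this toy input "cat" and "kitten" end up in one cluster and "dog" and
"puppy" in the other. With real data, raw counts let high-frequency words
dominate the distances, which is one reason to try a different weighting.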
Thanks all.
regards,
Chen Wenliang chenwl at mail.neu.edu.cn
Nlplab, Northeastern University, China.
2004-07-07