[Corpora-List] IDF values
Min-Yen Kan
kanmy at comp.nus.edu.sg
Wed May 12 08:39:46 UTC 2004
Hi Clive De Silva:
This doesn't quite fit the bill, but if you don't mind an
international corpus, UC Berkeley has computed the DFs of words on the
Stanford WebBase corpus. See
http://elib.cs.berkeley.edu/docfreq/
My group has been using it for a number of different projects that require
DF / IDF.
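A DF table like this can be turned into IDF values with one common formulation, idf(t) = log(N / df(t)), where N is the total number of documents. The following is a minimal sketch, assuming the DF counts have already been loaded into a dictionary and N is known. The function name and the toy counts are illustrative only and are not taken from the Berkeley data.

```python
import math

def idf_from_df(df_counts, num_docs):
    """Convert document frequencies to IDF using log(N / df)."""
    # Terms with df == 0 are skipped to avoid division by zero.
    return {
        term: math.log(num_docs / df)
        for term, df in df_counts.items()
        if df > 0
    }

# Made-up counts for illustration only.
df_counts = {"the": 1000, "corpus": 10}
idf = idf_from_df(df_counts, num_docs=1000)
```

Other variants (smoothed or base-2 logarithms, for example) are also in use, so the formula should be matched to whatever the downstream system expects.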
Regards,
Min-Yen KAN
Assistant Professor
Department of Computer Science, School of Computing
National University of Singapore, Singapore 117543
Office: S15-05-05
Tel: ++ (65) 6874-1885
Fax: ++ (65) 6779-4580
kanmy at comp.nus.edu.sg
http://www.comp.nus.edu.sg/~kanmy
-----Original Message-----
From: owner-corpora at lists.uib.no [mailto:owner-corpora at lists.uib.no] On
Behalf Of Clive De Silva
Sent: Wednesday, May 12, 2004 4:24 PM
To: CORPORA at HD.UIB.NO
Subject: [Corpora-List] IDF values
Hi all.
I need to get IDF values for an American corpus of at least 100M words. I
have access to the TREC4 and TREC5 corpora but would prefer not to have to
extract the information 'manually' and was wondering if there are IDF values
out there already calculated from a large corpus. If not, are there any
tools for extracting IDFs efficiently?
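Where no precomputed table is available, document frequencies can be gathered in a single pass over a corpus. The sketch below assumes each document is available as a plain string and uses lowercased whitespace tokenization, which is a simplification. The function name and sample documents are hypothetical.

```python
from collections import Counter

def document_frequencies(docs):
    """Count, for each term, how many documents contain it."""
    df = Counter()
    num_docs = 0
    for text in docs:  # docs may be any iterable, e.g. a generator over files
        num_docs += 1
        # set() ensures a term counts once per document, however often it repeats
        df.update(set(text.lower().split()))
    return df, num_docs

# Made-up documents for illustration only.
docs = ["the cat sat", "the dog sat", "the the bird"]
df, n = document_frequencies(docs)
```

Because the loop only keeps the running counts in memory, the documents themselves can be streamed from disk rather than loaded all at once.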
Regards,
Clive De Silva
MPhil student at the Computing Lab
University of Cambridge, UK