[Corpora-List] IDF values

Min-Yen Kan kanmy at comp.nus.edu.sg
Wed May 12 08:39:46 UTC 2004


Hi Clive De Silva:
	This doesn’t quite fit the bill, but if you don’t mind an
international corpus, UC Berkeley has computed the document frequencies
(DFs) of words on the Stanford WebBase corpus.  See

http://elib.cs.berkeley.edu/docfreq/

My group has been using it for a number of different projects that require
DF / IDF.
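
For readers starting from a DF table like this, the conversion to IDF is
commonly idf(t) = log(N / df(t)), where N is the number of documents in
the collection. Below is a minimal Python sketch of that conversion. The
function name, the dict input, and the toy counts are assumptions made for
illustration; they do not reflect the format of the Berkeley data, and N
has to come from whatever collection the DFs were computed on.

```python
import math


def idf_from_df(doc_freqs, n_docs):
    """Convert document frequencies to IDF via idf(t) = log(N / df(t)).

    doc_freqs: dict mapping term -> number of documents containing it
    n_docs:    total number of documents in the collection (N)
    Terms with a zero count are skipped to avoid division by zero.
    """
    return {
        term: math.log(n_docs / df)
        for term, df in doc_freqs.items()
        if df > 0
    }


# Toy counts, made up for illustration (not taken from any real corpus).
toy_df = {"the": 1000, "corpus": 10, "webbase": 1}
toy_idf = idf_from_df(toy_df, n_docs=1000)
```

Other IDF variants exist (smoothed, or with a different log base), so the
formula should match whatever the downstream system expects.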

Regards,

Min-Yen KAN
Assistant Professor
Department of Computer Science, School of Computing
National University of Singapore, Singapore 117543
Office: S15-05-05
Tel: ++ (65) 6874-1885
Fax: ++ (65) 6779-4580
kanmy at comp.nus.edu.sg
http://www.comp.nus.edu.sg/~kanmy



-----Original Message-----
From: owner-corpora at lists.uib.no [mailto:owner-corpora at lists.uib.no] On
Behalf Of Clive De Silva
Sent: Wednesday, May 12, 2004 4:24 PM
To: CORPORA at HD.UIB.NO
Subject: [Corpora-List] IDF values

Hi all.
 
I need to get IDF values for an American corpus of at least 100M words. I
have access to the TREC4 and TREC5 corpora but would prefer not to have to
extract the information 'manually', and was wondering if there are IDF values
out there already calculated from a large corpus. If not, are there any
tools for extracting IDFs efficiently?
 
Regards,

Clive De Silva
MPhil student at the Computing Lab
University of Cambridge, UK


