<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=iso-8859-1">
</HEAD>
<BODY bgColor=#ffffff>
<DIV><FONT face=Arial size=2>Hi all.</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>I need IDF (inverse document frequency) values
computed from an American English corpus of at least 100 million words. I have
access to the TREC4 and TREC5 corpora but would prefer not to extract the
information 'manually', and was wondering whether IDF values already calculated
from a large corpus are available anywhere. If not, are there any tools for
extracting IDFs efficiently?</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>Regards,</FONT></DIV>
<DIV><FONT face=Arial size=2><BR>Clive De Silva</FONT></DIV>
<DIV><FONT face=Arial size=2>MPhil student at the Computing Lab</FONT></DIV>
<DIV><FONT face=Arial size=2>University of Cambridge, UK</FONT></DIV></BODY></HTML>