[Corpora-List] Finding representative terms

Dragomir Radev radev at umich.edu
Mon Dec 26 16:59:14 UTC 2005


You should consider using TF*IDF instead of IDF alone. First, compute IDF
(inverse document frequency) from a large external corpus. Then, compute TF
(term frequency) for each of the words in each of your input documents and
multiply the two. A typical outcome would be (illustrative values):

       IDF   TF    TF*IDF
the   0.01   20      0.20
today 1.00   2       2.00
Paris 5.00   2      10.00
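
The procedure above could be sketched roughly as follows in Python. This is
only a minimal illustration under stated assumptions, not a reference
implementation: the function names (tokenize, compute_idf, rank_terms), the
regex tokenizer, and the smoothed formula IDF = log((1 + N) / (1 + df)) are
choices made for this example and do not come from the message. The tiny
background list stands in for the large external corpus.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphabetic tokens are good enough for a sketch.
    return re.findall(r"[a-z]+", text.lower())


def compute_idf(background_docs):
    """Estimate IDF from a background corpus (a list of strings).

    Assumed formula: log((1 + N) / (1 + df)), so a word that occurs in
    every background document gets an IDF of 0.
    """
    n_docs = len(background_docs)
    doc_freq = Counter()
    for doc in background_docs:
        # Count each word once per document.
        doc_freq.update(set(tokenize(doc)))
    idf = {w: math.log((1 + n_docs) / (1 + df)) for w, df in doc_freq.items()}
    # Words never seen in the background corpus (df = 0) get the maximum IDF.
    default_idf = math.log(1 + n_docs)
    return idf, default_idf


def rank_terms(document, idf, default_idf):
    """Return (word, TF*IDF) pairs for one document, highest score first."""
    tf = Counter(tokenize(document))  # raw term counts in this document
    scores = {w: count * idf.get(w, default_idf) for w, count in tf.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    background = [
        "the cat sat on the mat",
        "the dog ate the bone",
        "the weather today is fine",
        "the news today",
    ]
    idf, default_idf = compute_idf(background)
    for word, score in rank_terms("the the the Paris today Paris", idf, default_idf):
        print(f"{word:8s} {score:.2f}")
```

With a toy background corpus like this one, a word such as "the" that appears
in every background document scores zero however often it occurs in the input,
while a rarer word such as "Paris" rises to the top, which is the same pattern
as in the table above.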

Drago

Delip Rao wrote:
> 
> Hi,
> 
> Is there any work that tries to find the most
> important/representative words from a document? I have
> tried using IDF but results were very poor. Also IDF
> does not make sense if we have a single document and
> want to get the most important term(s) out of it.
> 
> Thanks!
> Delip


-- 
Dragomir R. Radev                                         radev at umich.edu
Associate Professor of Information, Electrical Engineering and
Computer Science, and Linguistics, the University of Michigan, Ann Arbor
Phone: 734-615-5225   Fax: 734-764-2475    http://www.si.umich.edu/~radev


