[Corpora-List] Finding representative terms
Dragomir Radev
radev at umich.edu
Mon Dec 26 16:59:14 UTC 2005
You should consider using TF*IDF instead of IDF alone. First, compute IDF
from a large external corpus, so that it is defined even when you have a
single document. Then compute TF for each of the words in each of your
input documents and multiply the two. A typical outcome would be:
          IDF    TF   TF*IDF
the       0.01   20     0.20
today     1.00    2     2.00
Paris     5.00    2    10.00
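
The procedure above could be sketched in Python roughly as follows. This is only an illustrative sketch, not code from the original message: the function names `compute_idf` and `top_terms`, the toy corpus, the log(N/df) form of IDF, and the choice to give words unseen in the external corpus the maximum IDF are all assumptions made for the example.

```python
import math
from collections import Counter

def compute_idf(corpus):
    """IDF per word from a list of tokenized documents, as log(N / df)."""
    n_docs = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(doc))  # count each word at most once per document
    return {word: math.log(n_docs / count) for word, count in df.items()}

def top_terms(doc, idf, k=3):
    """Rank the words of one tokenized document by raw TF * IDF."""
    # Assumption: a word never seen in the external corpus is treated
    # as maximally rare and gets the largest IDF observed.
    default_idf = max(idf.values())
    tf = Counter(doc)
    scores = {w: tf[w] * idf.get(w, default_idf) for w in tf}
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]

# Toy stand-in for the "large external corpus" (made up for illustration).
corpus = [
    "the weather today is mild".split(),
    "the market closed higher today".split(),
    "the train to paris leaves at noon".split(),
    "the committee met in the morning".split(),
]
idf = compute_idf(corpus)

doc = "the flight to paris lands in paris today and the crew rests".split()
ranked = top_terms(doc, idf)
for word, score in ranked:
    print(f"{word}\t{score:.2f}")
```

In this toy setup a word that occurs in every corpus document, such as "the", gets an IDF of zero and drops out of the ranking however often it appears in the input document, while a rare word that is repeated in the document rises to the top.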
Drago
Delip Rao wrote:
>
> Hi,
>
> Is there any work that tries to find the most
> important/representative words from a document? I have
> tried using IDF but results were very poor. Also IDF
> does not make sense if we have a single document and
> want to get the most important term(s) out of it.
>
> Thanks!
> Delip
--
Dragomir R. Radev radev at umich.edu
Associate Professor of Information, Electrical Engineering and
Computer Science, and Linguistics, the University of Michigan, Ann Arbor
Phone: 734-615-5225 Fax: 734-764-2475 http://www.si.umich.edu/~radev