[Corpora-List] Finding representative terms

Chris Jordan cjordan at cs.dal.ca
Mon Dec 26 17:09:29 UTC 2005


Unfortunately, there is no such thing as an ideal term discrimination
function. However, I would recommend trying something like relative
entropy; it is what I have used in the past in my thesis work on
automatically manufacturing queries. Cai et al. also used relative
entropy and other divergence functions for query expansion.

 @inproceedings{Cai_query_expansion,
 author = {D. Cai and C. J. van Rijsbergen and J. M. Jose},
 title = {Automatic query expansion based on divergence},
 booktitle = {CIKM '01: Proceedings of the Tenth International Conference on Information and Knowledge Management},
 year = {2001},
 isbn = {1-58113-436-3},
 pages = {419--426},
 location = {Atlanta, Georgia, USA},
 doi = {http://doi.acm.org/10.1145/502585.502656},
 publisher = {ACM Press},
 }
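As a rough illustration of the relative-entropy idea, each term in a document can be scored by its contribution to the Kullback-Leibler divergence between the document's term distribution P_d and a background corpus distribution P_c, i.e. P_d(t) * log(P_d(t) / P_c(t)). This is a minimal sketch, not necessarily the exact formulation used in the thesis work above or by Cai et al. The background counts, the function name `relative_entropy_scores`, and the add-mu smoothing (mu=1.0, used so terms unseen in the background do not cause division by zero) are all assumptions made for the example.

```python
import math
from collections import Counter


def relative_entropy_scores(doc_tokens, background_counts, mu=1.0):
    """Rank document terms by their contribution to KL(P_doc || P_background).

    doc_tokens: list of tokens from the single document of interest.
    background_counts: Counter of term frequencies from a reference corpus
        (assumed to be available; any large general corpus could serve).
    mu: add-mu smoothing applied to the background model (an assumption).
    """
    doc_counts = Counter(doc_tokens)
    doc_total = sum(doc_counts.values())

    # Smooth the background over the union vocabulary so every document
    # term has a non-zero background probability.
    vocab = set(background_counts) | set(doc_counts)
    bg_total = sum(background_counts.values()) + mu * len(vocab)

    scores = {}
    for term, count in doc_counts.items():
        p_doc = count / doc_total
        p_bg = (background_counts.get(term, 0) + mu) / bg_total
        # Per-term KL contribution: large when the term is much more
        # probable in the document than in the background.
        scores[term] = p_doc * math.log(p_doc / p_bg)

    # Highest-scoring (most representative) terms first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy data, purely illustrative.
    background = Counter("the cat sat on the mat the dog sat on the rug".split())
    doc = "the query expansion uses divergence divergence the".split()
    for term, score in relative_entropy_scores(doc, background):
        print(f"{term}\t{score:.3f}")
```

Terms that are frequent in the document but rare in the background rise to the top, while common words such as "the" sink. Because the comparison is against a background distribution rather than across a collection, this can be applied to a single document, which speaks to the IDF concern raised below, although it still requires some reference corpus to estimate P_c.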



Delip Rao wrote:

>Hi,
>
>Is there any work that tries to find the most
>important/representative words from a document? I have
>tried using IDF but results were very poor. Also IDF
>does not make sense if we have a single document and
>want to get the most important term(s) out of it.
>
>Thanks!
>Delip


