21.4537, Disc: Term Frequency Weighting Choices

linguist at LINGUISTLIST.ORG
Fri Nov 12 03:41:14 UTC 2010


LINGUIST List: Vol-21-4537. Thu Nov 11 2010. ISSN: 1068 - 4875.

Subject: 21.4537, Disc: Term Frequency Weighting Choices

Moderators: Anthony Aristar, Eastern Michigan U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>
 
Reviews: Monica Macaulay, U of Wisconsin-Madison  
Eric Raimy, U of Wisconsin-Madison  
Joseph Salmons, U of Wisconsin-Madison  
Anja Wanner, U of Wisconsin-Madison  
       <reviews at linguistlist.org> 

Homepage: http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, 
and donations from subscribers and publishers.

Editor for this issue: Elyssa Winzeler <elyssa at linguistlist.org>
================================================================  

To post to LINGUIST, use our convenient web form at
http://linguistlist.org/LL/posttolinguist.cfm.

===========================Directory==============================  

1)
Date: 09-Nov-2010
From: Leslie Barrett [lbarrett29 at hotmail.com]
Subject: Term Frequency Weighting Choices
 

	
-------------------------Message 1 ---------------------------------- 
Date: Thu, 11 Nov 2010 22:40:26
From: Leslie Barrett [lbarrett29 at hotmail.com]
Subject: Term Frequency Weighting Choices



I am trying to decide whether to use a square-root-based term-frequency
weight or a variable weight based on the maximum term frequency in the
document (log-based weighting won't work for me because it isn't sensitive
enough to changes at the small end of the scale). I am using a corpus of
non-thematic documents that vary widely in length, though none exceeds 10K
words. Has anyone tried both weightings on a similar corpus, with results
they could share? Alternatively, does anyone know of research comparing the
different weights on sample data? I would very much appreciate any advice,
and I will post the answers if appropriate.
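For concreteness, here is a minimal sketch of the three weightings mentioned above. It assumes the common textbook forms, since the exact formulas are not specified: sqrt(tf) for square-root weighting, 1 + ln(tf) for log weighting, and "augmented" max-normalisation k + (1 - k) * tf / max_tf with an assumed k = 0.5 for the weight based on the maximum term frequency. The function names and the toy document are hypothetical, for illustration only.

```python
import math
from collections import Counter


def sqrt_tf(tf: int) -> float:
    """Square-root damping of a raw count: sqrt(tf)."""
    return math.sqrt(tf)


def log_tf(tf: int) -> float:
    """Log damping: 1 + ln(tf) for tf > 0, and 0 for absent terms."""
    return 1.0 + math.log(tf) if tf > 0 else 0.0


def max_norm_tf(tf: int, max_tf: int, k: float = 0.5) -> float:
    """Augmented max-normalised weight: k + (1 - k) * tf / max_tf.

    k is a smoothing constant (0.5 is an assumed, commonly used value).
    Absent terms get 0; present terms fall in [k, 1].
    """
    if tf == 0:
        return 0.0
    return k + (1.0 - k) * tf / max_tf


def weight_document(tokens: list[str]) -> dict[str, tuple[float, float, float]]:
    """Return (sqrt, log, max-normalised) weights for every term in one document."""
    counts = Counter(tokens)
    max_tf = max(counts.values())  # most frequent term in this document
    return {
        term: (sqrt_tf(tf), log_tf(tf), max_norm_tf(tf, max_tf))
        for term, tf in counts.items()
    }


if __name__ == "__main__":
    # Hypothetical toy document, used only to compare the schemes side by side.
    doc = "the cat sat on the mat the cat ran".split()
    for term, (s, l, m) in sorted(weight_document(doc).items()):
        print(f"{term:>4}  sqrt={s:.3f}  log={l:.3f}  maxnorm={m:.3f}")
```

One design point relevant to documents of very different lengths: sqrt(tf) and 1 + ln(tf) grow without bound as raw counts grow, so longer documents tend to receive larger weights unless a separate length normalisation is applied, whereas the max-normalised weight is bounded to [k, 1] within every document.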


Linguistic Field(s): Computational Linguistics




-----------------------------------------------------------
LINGUIST List: Vol-21-4537	

	


