21.4537, Disc: Term Frequency Weighting Choices
linguist at LINGUISTLIST.ORG
Fri Nov 12 03:41:14 UTC 2010
LINGUIST List: Vol-21-4537. Thu Nov 11 2010. ISSN: 1068 - 4875.
Subject: 21.4537, Disc: Term Frequency Weighting Choices
Moderators: Anthony Aristar, Eastern Michigan U <aristar at linguistlist.org>
Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>
Reviews: Monica Macaulay, U of Wisconsin-Madison
Eric Raimy, U of Wisconsin-Madison
Joseph Salmons, U of Wisconsin-Madison
Anja Wanner, U of Wisconsin-Madison
<reviews at linguistlist.org>
Homepage: http://linguistlist.org/
The LINGUIST List is funded by Eastern Michigan University,
and donations from subscribers and publishers.
Editor for this issue: Elyssa Winzeler <elyssa at linguistlist.org>
================================================================
===========================Directory==============================
1)
Date: 09-Nov-2010
From: Leslie Barrett [lbarrett29 at hotmail.com]
Subject: Term Frequency Weighting Choices
-------------------------Message 1 ----------------------------------
Date: Thu, 11 Nov 2010 22:40:26
From: Leslie Barrett [lbarrett29 at hotmail.com]
Subject: Term Frequency Weighting Choices
I am trying to decide between a square-root-based term-frequency weight
and a variable weight based on the maximum term frequency in the document.
(Log-based weighting won't work for me because it isn't sensitive enough
to changes at the small end of the scale.) My corpus consists of
non-thematic documents that vary widely in length, though none exceeds 10K
words. Has anyone tried both weights on a similar corpus and obtained
results they could share? Alternatively, does anyone know of research
comparing the different weights on sample data? I would very much
appreciate any advice. I will post the answers if appropriate.
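
For concreteness, the weighting schemes mentioned above can be sketched
side by side. This is only an illustration under stated assumptions, not
the poster's actual setup: the maximum-frequency weight is assumed to be
the common "augmented" form a + (1 - a) * tf / max_tf with a = 0.5, the
log weight is assumed to be 1 + ln(tf), and the function name tf_weights
and the toy document are hypothetical.

```python
import math
from collections import Counter


def tf_weights(tokens, a=0.5):
    """Return several term-frequency weights for each term in one document.

    Assumptions (not taken from the original post):
      - "max_norm" is the augmented form a + (1 - a) * tf / max_tf
      - "log" is 1 + ln(tf)
      - a = 0.5 is an illustrative smoothing constant
    """
    counts = Counter(tokens)
    if not counts:
        return {}
    max_tf = max(counts.values())  # highest raw count in this document
    weights = {}
    for term, tf in counts.items():
        weights[term] = {
            "raw": tf,
            "sqrt": math.sqrt(tf),
            "max_norm": a + (1 - a) * tf / max_tf,
            "log": 1 + math.log(tf),
        }
    return weights


if __name__ == "__main__":
    # Toy document: "a" occurs 4 times, "b" twice, "c" once.
    doc = "a a a a b b c".split()
    for term, w in sorted(tf_weights(doc).items()):
        print(term, w)
```

Under these assumptions the maximum-frequency weight always stays between
a and 1 whatever the document length, whereas the square-root weight keeps
growing with the raw count, so the two behave differently on a corpus
whose documents vary widely in length.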
Linguistic Field(s): Computational Linguistics
-----------------------------------------------------------