Corpora: Performance measures of text categorization
Fuchun Peng
f3peng at ai.uwaterloo.ca
Mon Mar 4 22:32:59 UTC 2002
Dear List members:
I have a question about the performance measures used in text
categorization.
The standard performance measure in text categorization is the breakeven
point, which is defined as the point where precision equals recall. The
reason for using it is to balance precision and recall. But such a point
usually does not occur exactly in experiments, so people have to use
interpolation (or extrapolation) to estimate it from the
precision-recall curve.
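To make the interpolation step concrete, here is a minimal Python sketch. It assumes the precision-recall curve is given as a list of (recall, precision) pairs, one per decision threshold, sorted by increasing recall, and it uses linear interpolation between the two adjacent points where precision minus recall changes sign. The function name `breakeven_point` is hypothetical, and linear interpolation is only one possible convention for estimating the breakeven point.

```python
def breakeven_point(curve):
    """Estimate the precision/recall breakeven point from a P-R curve.

    curve: list of (recall, precision) pairs, one per decision threshold,
    sorted by increasing recall (an assumed input format).
    """
    # If some threshold already gives precision == recall, use it directly.
    for r, p in curve:
        if p == r:
            return p
    # Otherwise find the segment where precision - recall changes sign and
    # linearly interpolate to the point where the difference is zero.
    for (r0, p0), (r1, p1) in zip(curve, curve[1:]):
        d0, d1 = p0 - r0, p1 - r1
        if d0 * d1 < 0:
            t = d0 / (d0 - d1)  # fraction of the way from point 0 to point 1
            return p0 + t * (p1 - p0)  # here interpolated precision == recall
    raise ValueError("curve never crosses precision == recall; "
                     "extrapolation would be needed")


# Three hypothetical thresholds: no measured point has precision == recall.
curve = [(0.2, 0.9), (0.5, 0.7), (0.8, 0.4)]
print(breakeven_point(curve))  # approximately 0.6
```

Note that when the curve never crosses the diagonal, the sketch raises an error instead of extrapolating, since there is no single agreed extrapolation rule.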
In the IR community, people often use the F-measure to balance precision
and recall. The F-measure is defined as
"2*precision*recall/(precision+recall)".
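By contrast, the F-measure is a direct computation from a single precision/recall pair. A minimal Python sketch of the formula above (the function name `f_measure` is hypothetical, and returning 0 when both precision and recall are 0 is an assumed convention to avoid division by zero):

```python
def f_measure(precision, recall):
    """F-measure as defined above: 2*P*R / (P + R)."""
    if precision + recall == 0:
        return 0.0  # assumed convention when both P and R are zero
    return 2 * precision * recall / (precision + recall)


print(f_measure(0.5, 1.0))  # approximately 0.667
print(f_measure(0.6, 0.6))  # approximately 0.6
```

When precision and recall are equal (say both x), the formula reduces to 2*x*x/(2*x) = x, so the F-measure evaluated at the breakeven point equals the breakeven value itself.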
Computing the breakeven point (by interpolation) is more difficult than
computing the F-measure (a simple formula). So I do not see any advantage
of the breakeven point measure over the F-measure. One reason people keep
using the breakeven point may be that they have to compare their results
with those of previous researchers, who measured performance with the
breakeven point. But besides this, does anybody know of any arguments for
using the breakeven point instead of the F-measure in text categorization?
Best regards
Fuchun
--------------------------------------------------------
Fuchun Peng PhD candidate
Computer Science Department, University of Waterloo
Waterloo, Ontario, Canada, N2L 3G1
1-519-888-4567 ext 5392 f3peng at ai.uwaterloo.ca
http://ai.uwaterloo.ca/~f3peng