[Corpora-List] Precision and Recall
Daniel Zeman
zeman at ufal.mff.cuni.cz
Sat Apr 19 06:31:12 UTC 2008
Angus Grieve-Smith wrote:
>>> P = number of things I correctly found / number of things I found
>>> R = number of things I correctly found / number of things I should have
>>> found
>>>
>>> ("correctly found" = "I found it" AND "I should have found it")
>>>
>> Here's another way to look at it, which I sometimes find useful:
>>
>> P = true_positives / (true_positives + false_positives)
>>
>> R = true_positives / (true_positives + false_negatives)
>>
>
> Thank you both for bringing clarity to this topic! Can anyone
> tell me why precision and recall are more useful than the simple numbers
> of false positives and false negatives? What do you get out of mixing
> that up with the number of true positives, some kind of odds ratio?
>
>
Angus,
false positives and false negatives are absolute counts, so they cannot
be compared across data sets of different sizes. If you evaluate, say,
the performance of a parser on two different data sets and you get
fp=100 and fn=100 on both, you still cannot say that both sets are
equally hard for the parser. It may well be that the sets were not the
same size and that tp1=100 while tp2=1000. If you convert the counts
into P and R, you will see that the parser did much better on the
second data set:
P1 = 100/(100+100) = 50 %
R1 = 100/(100+100) = 50 %
P2 = 1000/(1000+100) ≈ 91 %
R2 = 1000/(1000+100) ≈ 91 %
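If you want to check the arithmetic yourself, a minimal Python sketch
could look like the one below. The function name precision_recall is
only an illustration (it does not come from any library), and the
sketch assumes that tp+fp and tp+fn are both nonzero.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) computed from raw counts.

    Hypothetical helper for illustration; assumes tp + fp > 0
    and tp + fn > 0, so no division by zero is handled here.
    """
    precision = tp / (tp + fp)  # share of found items that are correct
    recall = tp / (tp + fn)     # share of correct items that were found
    return precision, recall


# Counts from the example: fp = fn = 100 on both data sets,
# but the number of true positives differs (100 vs. 1000).
p1, r1 = precision_recall(tp=100, fp=100, fn=100)
p2, r2 = precision_recall(tp=1000, fp=100, fn=100)

print(f"set 1: P = {p1:.0%}, R = {r1:.0%}")
print(f"set 2: P = {p2:.0%}, R = {r2:.0%}")
```

The error counts are identical in both calls; only tp changes, and that
alone moves P and R from 50 % to roughly 91 %.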
Regards,
Dan
--
RNDr. Daniel Zeman, Ph.D.
ÚFAL MFF, Univerzita Karlova, Praha
http://ufal.mff.cuni.cz/~zeman/