[Corpora-List] Precision and Recall

Sat Apr 19 06:31:12 UTC 2008

Angus Grieve-Smith wrote:
>>> P = number of things I correctly found / number of things I found
>>> R = number of things I correctly found / number of things I should have
>>> found
>>>
>>> ("correctly found" = "I found it" AND "I should have found it")
>>>       
>> Here's another way to look at it, which I sometimes find useful:
>>
>> P = true_positives / (true_positives + false_positives)
>>
>> R = true_positives / (true_positives + false_negatives)
>>     
>
>  	Thank you both for bringing clarity to this topic!  Can anyone 
> tell me why precision and recall are more useful than the simple numbers 
> of false positives and false negatives?  What do you get out of mixing 
> that up with the number of true positives, some kind of odds ratio?
>
>   

Angus,

The false positives/negatives are absolute numbers. If you evaluate,
say, the performance of a parser on two different data sets and you get
fp=100 and fn=100 for both, you still cannot say that both sets are
equally hard for the parser. The sets may well differ in size, so that
tp1=100 while tp2=1000. If you transform your counts into P and R,
which are ratios rather than raw counts, you will see that the parser
did much better on the second data set:

P1 = 100/(100+100) = 50 %
R1 = 100/(100+100) = 50 %

P2 = 1000/(1000+100) = 91 %
R2 = 1000/(1000+100) = 91 %
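For anyone who wants to reproduce the arithmetic, here is a minimal
Python sketch. It assumes you already have the raw counts tp, fp and fn
for each data set; the function names precision() and recall() are
just illustrative, not taken from any particular toolkit.

```python
def precision(tp, fp):
    # Share of the things I found that I found correctly.
    # Undefined (ZeroDivisionError) if nothing was found at all.
    return tp / (tp + fp)


def recall(tp, fn):
    # Share of the things I should have found that I actually found.
    # Undefined (ZeroDivisionError) if there was nothing to find.
    return tp / (tp + fn)


# The two data sets from the example: same fp and fn, different tp.
datasets = {
    "set 1": (100, 100, 100),   # (tp, fp, fn)
    "set 2": (1000, 100, 100),
}

for name, (tp, fp, fn) in datasets.items():
    # The ".0%" format multiplies by 100 and rounds to whole percent.
    print(f"{name}: P = {precision(tp, fp):.0%}, R = {recall(tp, fn):.0%}")
```

This prints 50 % for both P and R on the first set and 91 % for both on
the second, matching the figures above, even though fp and fn are
identical in the two cases.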

Regards,
Dan

-- 
RNDr. Daniel Zeman, Ph.D.
ÚFAL MFF, Univerzita Karlova, Praha
http://ufal.mff.cuni.cz/~zeman/

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora