[Corpora-List] Precision and Recall

Christopher Brewster C.Brewster at dcs.shef.ac.uk
Sat Apr 19 18:54:23 UTC 2008


It may help this discussion if I offer a graphic I drew recently to
help me remember how these measures work.
http://www.dcs.shef.ac.uk/~kiffer/docs/sensitivity.pdf

Note also that there are a number of other related measures used in
other disciplines (clinical and medical studies, bioinformatics, etc.)
which start from the same basic division of the population (true
negative, true positive, etc.) but perform different calculations.

The respective Wikipedia articles on the different measures are quite
enlightening, and they help one understand *why* one would want these
different equations.

Christopher

*****************************************************
Department of Computer Science, University of Sheffield
Regent Court, 211 Portobello Street
Sheffield   S1 4DP   UNITED KINGDOM
Web: http://www.dcs.shef.ac.uk/~kiffer/
Tel: +44(0)114-22.21967  Fax: +44 (0)114-22.21810
Skype: christopherbrewster
SkypeIn (UK): +44 (20) 8144 0088
SkypeIn (US): +1 (617) 381-4281
*****************************************************
Corruptissima re publica plurimae leges. Tacitus. Annals 3.27




On 19 Apr 2008, at 19:30, Angus Grieve-Smith wrote:
On Sat, 19 Apr 2008, Daniel Zeman wrote:

> the false positives/negatives are absolute numbers. If you evaluate, say,
> performance of a parser on two different data sets and you get fp=100 and
> fn=100 for both, you still cannot say that both sets are equally hard for
> the parser. It may well be that the sets were not the same size and that
> tp1=100 while tp2=1000.
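[Editor's note: a minimal sketch of the arithmetic in the quoted example, using the standard definitions precision = tp/(tp+fp) and recall = tp/(tp+fn); the counts are the hypothetical ones Daniel gives, not measured results.]

```python
# Quoted example: fp = fn = 100 on both data sets, but tp differs.
def precision(tp, fp):
    """Fraction of the items the system returned that were correct."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of the correct items that the system actually found."""
    return tp / (tp + fn)

for tp in (100, 1000):
    fp = fn = 100
    print(f"tp={tp}: precision={precision(tp, fp):.3f}, "
          f"recall={recall(tp, fn):.3f}")
# tp=100:  precision=0.500, recall=0.500
# tp=1000: precision=0.909, recall=0.909
```

Identical absolute error counts, yet the second data set is clearly "easier" once the counts are normalised — which is exactly the point of the quoted paragraph.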

	Okay, I see that you would want to know how many false negatives
there are as a proportion - i.e. how many of the positives it found
correctly - so I see the value in "recall," even if it doesn't make much
sense as a name.  But it seems to me that the raw number of false
negatives is also valuable.

	But false positives are false positives; why does it matter how
many true positives there were?  Because it's a measure of how muddy the
water is?  It seems like here, absolute numbers of false positives would
be more valuable in many situations.  As Google found, it often doesn't
matter how many false positives you have, as long as the most valuable
true positives are close to the top of the list.
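[Editor's note: the intuition in the last sentence is what ranked-retrieval measures such as "precision at k" capture — only the top of the ranked list is scored, so false positives further down cost nothing. A minimal sketch with made-up relevance labels:]

```python
# Hypothetical ranked result list: 1 = relevant (true positive),
# 0 = irrelevant (false positive). Labels are invented for illustration.
def precision_at_k(ranked_relevance, k):
    """Fraction of the top-k ranked results that are relevant."""
    return sum(ranked_relevance[:k]) / k

ranking = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(precision_at_k(ranking, 3))   # the top 3 are all relevant: 1.0
print(precision_at_k(ranking, 10))  # overall precision is only 0.4
```

A system like this can carry many false positives overall and still satisfy users, as long as the most valuable true positives sit near the top — which is why plain precision and recall may fail to predict satisfaction.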

	Incidentally, this is not a purely academic line of questioning; I
worked on an information retrieval project that failed in part because
precision and recall did not accurately predict customer satisfaction.

					-Angus B. Grieve-Smith
					grvsmth at panix.com

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


