[Corpora-List] Decision tree: maximise recall over precision

Emmanuel Prochasson emmanuel.prochasson at univ-nantes.fr
Tue Apr 21 13:38:36 UTC 2009


Dear all,

I would like to build a decision tree (or any other relevant supervised 
classifier) on a set of data containing 0.1% "Yes" and 99.9% "No", using 
several attributes (12 for now, but I still have to tune that). I use 
Weka, which is totally awesome.

My goal is to prune the search space for another application (i.e. remove, 
say, 80% of the data that are very unlikely to be "Yes"), which is why I'm 
trying to use a decision tree. Of course, some algorithms return a 
one-leaf tree labelled "No", with 99.9% accuracy, which looks very good on 
paper, but guarantees that I discard my entire search space (including 
every "Yes") rather than prune it.

My problem is: is there a way (an algorithm? a piece of software?) to 
build a tree that maximises recall (every "Yes" element tagged "Yes" by 
the classifier)? I don't really care about precision (it's fine if many 
"No" elements are tagged "Yes" -- I can handle false positives).

In other words, is there a way to build a decision tree under the 
constraint of 100% recall?
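
To make the question a bit more concrete, here is a rough sketch of the 
kind of thing I am hoping exists, written against Weka's Java API with a 
CostSensitiveClassifier wrapped around J48. It is only a guess on my 
part: the file name and cost values are made up, I am assuming "Yes" is 
the second class value (index 1), and I am assuming the cost-matrix rows 
index the actual class and the columns the predicted class.

import java.util.Random;

import weka.classifiers.CostMatrix;
import weka.classifiers.Evaluation;
import weka.classifiers.meta.CostSensitiveClassifier;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class RecallOverPrecisionSketch {
    public static void main(String[] args) throws Exception {
        // Load the candidates; the class attribute ("Yes"/"No") is assumed
        // to be the last one.
        Instances data = new DataSource("candidates.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        // 2x2 cost matrix. My assumption: rows = actual class, columns =
        // predicted class, with "No" = class 0 and "Yes" = class 1. A missed
        // "Yes" (false negative) is made 1000 times more expensive than a
        // false positive; 1000 is an arbitrary placeholder.
        CostMatrix costs = new CostMatrix(2);
        costs.setCell(1, 0, 1000.0); // actual "Yes" predicted "No"
        costs.setCell(0, 1, 1.0);    // actual "No" predicted "Yes"

        CostSensitiveClassifier csc = new CostSensitiveClassifier();
        csc.setClassifier(new J48());        // plain C4.5 tree underneath
        csc.setCostMatrix(costs);
        csc.setMinimizeExpectedCost(false);  // reweight training data by cost

        // 10-fold cross-validation; recall on "Yes" is what I care about,
        // precision on "Yes" only tells me how much of the space survives.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(csc, data, 10, new Random(1));
        System.out.println("Recall(Yes)    = " + eval.recall(1));
        System.out.println("Precision(Yes) = " + eval.precision(1));
    }
}

The hope would be that, with a large enough false-negative cost, the tree 
(almost) never labels a true "Yes" as "No", even if that means letting a 
lot of "No" instances through -- which is exactly the trade-off I am 
after.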

I'm not sure I made myself clear, and I'm not sure there is a solution 
to my problem.

Regards,

-- 
Emmanuel


