[Corpora-List] Decision tree : maximise recall over precision

Eddie Bell e.bell at comp.lancs.ac.uk
Tue Apr 21 14:33:19 UTC 2009


Hi Emmanuel,

I recently had a similarly imbalanced data set (98% 'No') and used an
SVM with prior weights. The prior weights force the model to account
for the recessive category by penalizing errors on it more heavily
than errors on the dominant category (i.e. making recessive class
accuracy more important).
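
As a rough illustration of the weighting idea (plain Python rather than
an SVM; the class counts, the two toy classifiers and the weight of 49
are invented for the example, and `weighted_cost` is a hypothetical
helper, not part of any library):

```python
# Toy illustration of class ("prior") weights. All numbers are made up.
# 1000 examples: 20 "Yes" (2%) and 980 "No" (98%).

def weighted_cost(labels, predictions, weights):
    """Sum the weight of the true class over every misclassified example."""
    return sum(weights[y] for y, p in zip(labels, predictions) if y != p)

labels = ["Yes"] * 20 + ["No"] * 980

# Classifier A: always answers "No", so it misses all 20 "Yes" examples.
always_no = ["No"] * 1000

# Classifier B: finds every "Yes" but also flags 100 "No" examples as "Yes".
flags_some = ["Yes"] * 20 + ["Yes"] * 100 + ["No"] * 880

unweighted = {"Yes": 1, "No": 1}
weighted = {"Yes": 49, "No": 1}  # assumption: inverse class frequency, 980 / 20

cost_a_plain = weighted_cost(labels, always_no, unweighted)
cost_b_plain = weighted_cost(labels, flags_some, unweighted)
cost_a_weighted = weighted_cost(labels, always_no, weighted)
cost_b_weighted = weighted_cost(labels, flags_some, weighted)

print("unweighted: A =", cost_a_plain, " B =", cost_b_plain)
print("weighted:   A =", cost_a_weighted, " B =", cost_b_weighted)
```

Without weights the always-"No" classifier has the lower cost; once
errors on "Yes" cost more, the classifier that keeps every "Yes" wins.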

SVMs aren't as interpretable as decision trees; if trees are required,
I believe the 'rpart' R package supports weighting. I'm not familiar
enough with Weka to guide you in that respect, but weights should help
with your problem.
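
To show what weighting does inside a tree, here is a minimal sketch of
a one-split tree (a "stump") in plain Python. It is not how rpart or
Weka implement weighting; the data, the weight of 4 and the helper
names (`leaf_label`, `weighted_errors`, `best_stump`) are all
assumptions made up for the illustration:

```python
# Hypothetical toy data: (feature value, label). 8 "No" and 2 "Yes".
data = [(0.1, "No"), (0.2, "No"), (0.3, "No"), (0.4, "No"), (0.5, "No"),
        (0.6, "No"), (0.7, "Yes"), (0.8, "No"), (0.9, "No"), (1.0, "Yes")]

def leaf_label(rows, weights):
    """Label a leaf with the class whose total weight is largest."""
    totals = {"Yes": 0.0, "No": 0.0}
    for _, label in rows:
        totals[label] += weights[label]
    return max(totals, key=totals.get)

def weighted_errors(rows, weights):
    """Total weight of the rows that the leaf's label gets wrong."""
    predicted = leaf_label(rows, weights)
    return sum(weights[label] for _, label in rows if label != predicted)

def best_stump(rows, weights):
    """Try every midpoint threshold; keep the lowest weighted error."""
    values = sorted({x for x, _ in rows})
    best = None
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [r for r in rows if r[0] <= threshold]
        right = [r for r in rows if r[0] > threshold]
        cost = weighted_errors(left, weights) + weighted_errors(right, weights)
        if best is None or cost < best[0]:
            best = (cost, threshold,
                    leaf_label(left, weights), leaf_label(right, weights))
    return best

# Equal weights: the split isolates only the last "Yes" and misses the other.
print(best_stump(data, {"Yes": 1, "No": 1}))

# "Yes" weighted 4x (assumed inverse class frequency, 8 / 2): the split
# moves left so both "Yes" rows land in the "Yes" leaf, at the price of
# two false positives.
print(best_stump(data, {"Yes": 4, "No": 1}))
```

On this toy data the weighted stump reaches 100% recall on the training
rows while still discarding 6 of the 10 rows, which is the kind of
trade-off asked about below. Nothing guarantees the same recall on
unseen data.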

regards
  - eddie

2009/4/21 Emmanuel Prochasson <emmanuel.prochasson at univ-nantes.fr>:
> Dear all,
>
> I would like to build a decision tree (or any other relevant supervised
> classifier) on a set of data containing 0.1% "Yes" and 99.9% "No", using
> several attributes (12 for now, but I have to tune that). I use Weka,
> which is totally awesome.
>
> My goal is to prune the search space for another application (i.e. remove,
> say, 80% of the data that is very unlikely to be "Yes"), which is why I'm
> trying to use a decision tree. Of course, some algorithms return a
> one-leaf tree tagged "No", with 99.9% accuracy, which is pretty accurate
> but ensures that I will always discard all of my search space rather than
> prune it.
>
> My problem is: is there a way (algorithm? software?) to build a tree
> that will maximise recall (all "Yes" elements tagged "Yes" by the
> algorithm)? I don't really care about precision (it's OK if many "No"
> elements are tagged "Yes" -- I can handle false positives).
>
> In other words, is there a way to build a decision tree under the
> constraint of 100% recall?
>
> I'm not sure I made myself clear, and I'm not sure there are solutions
> for my problem.
>
> Regards,
>
> --
> Emmanuel
>
>
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>



-- 
Edward J. L. Bell
C28, Computing Department,
Infolab 21, Lancaster University

+44(0) 15245 10348



