[Corpora-List] efficient decision tree tool?

Caren Brinckmann cabr at coli.uni-sb.de
Thu Jan 19 01:12:29 UTC 2006


Dear all,

We are currently working on corpus-based models of duration, F0, 
intensity, and segmental reductions in read and spontaneous speech. For 
the first part of our study we will use decision trees.

Since our database is fairly large, I am looking for an efficient decision 
tree tool with the following features:

* nominal and numeric input features and predictees (classification and 
regression trees)
* binary as well as multi-way splits
* efficient handling of large datasets (200,000 cases/records/instances 
with up to 100 attributes/features/variables)
* nice to have: integrated feature selection algorithm

In previous studies, I've worked with "wagon" from the Edinburgh Speech 
Tools Library (http://www.cstr.ed.ac.uk/projects/speech_tools/) and "J48" 
from Weka (http://www.cs.waikato.ac.nz/ml/weka/). While wagon is very fast 
and memory-efficient, it only allows binary splits (as far as I know). 
Weka allows multi-way splits, but is too slow and memory-consuming for our 
current datasets.
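
To make the distinction between the two kinds of split concrete, here is a 
minimal sketch in Python. It is not taken from wagon or Weka; the function 
names and the toy data are assumptions made up purely for illustration. A 
multi-way split creates one branch per value of a nominal feature, whereas 
a binary split asks a single yes/no question about one value:

```python
from collections import defaultdict

def multiway_split(rows, feature):
    """Partition rows into one branch per distinct value of a nominal feature."""
    branches = defaultdict(list)
    for row in rows:
        branches[row[feature]].append(row)
    return dict(branches)

def binary_split(rows, feature, value):
    """Partition rows into two branches: feature == value versus all other values."""
    yes = [r for r in rows if r[feature] == value]
    no = [r for r in rows if r[feature] != value]
    return yes, no

# Toy data, invented for illustration only (hypothetical feature names).
rows = [
    {"stress": "primary", "duration_ms": 95},
    {"stress": "secondary", "duration_ms": 80},
    {"stress": "none", "duration_ms": 60},
    {"stress": "none", "duration_ms": 55},
]

multi = multiway_split(rows, "stress")          # three branches
yes, no = binary_split(rows, "stress", "none")  # two branches
```

With a binary-only tool, separating all three stress levels takes two 
successive splits, which makes the resulting tree deeper.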

I'm looking forward to your suggestions!

Kind regards,

Caren.

P.S.: If you know any other mailing list or forum where I could post my 
question, please let me know.

--
Caren Brinckmann
Saarland University, FR 4.7 Institute of Phonetics
P.O.Box 151150, 66041 Saarbruecken, Germany
Phone: +49-681-3024244, Fax: +49-681-3024684


