[Corpora-List] efficient decision tree tool?
Caren Brinckmann
cabr at coli.uni-sb.de
Thu Jan 19 01:12:29 UTC 2006
Dear all,
We are currently working on corpus-based models of duration, F0,
intensity, and segmental reductions in read and spontaneous speech. For
the first part of our study we will use decision trees.
Since our database is fairly large, I am looking for an efficient decision
tree tool with the following features:
* nominal and numeric input features and predictees (classification and
regression trees)
* binary as well as multi-way splits
* efficient handling of large datasets (200,000 cases/records/instances
with up to 100 attributes/features/variables)
* nice to have: integrated feature selection algorithm
In previous studies, I've worked with "wagon" from the Edinburgh Speech
Tools Library (http://www.cstr.ed.ac.uk/projects/speech_tools/) and "J48"
from Weka (http://www.cs.waikato.ac.nz/ml/weka/). While wagon is very fast
and memory-efficient, it only allows binary splits (as far as I know).
Weka allows multi-way splits but is too slow and memory-intensive for our
current datasets.
I'm looking forward to your suggestions!
Kind regards,
Caren.
P.S.: If you know any other mailing list or forum where I could post my
question, please let me know.
--
Caren Brinckmann
Saarland University, FR 4.7 Institute of Phonetics
P.O.Box 151150, 66041 Saarbruecken, Germany
Phone: +49-681-3024244, Fax: +49-681-3024684