Corpora: TiMBL 4.0 - new release of Tilburg Memory-Based Learner
Antal van den Bosch
Antal.vdnBosch at kub.nl
Tue Aug 14 12:05:00 UTC 2001
----------------------------------------------------------------------
Software release: TiMBL 4.0
Tilburg Memory Based Learner
ILK Research Group, http://ilk.kub.nl
CNTS - Language Technology Group
----------------------------------------------------------------------
Apologies for double postings.
The ILK (Induction of Linguistic Knowledge) Research Group at Tilburg
University, The Netherlands, and the CNTS - Language Technology Group,
at the University of Antwerp, Belgium, announce the release of a new
version of TiMBL, the "Tilburg Memory Based Learner", version 4.0.
TiMBL is a machine learning program implementing a family of
Memory-Based Learning techniques. TiMBL stores a representation of the
training set explicitly in memory (hence `Memory Based'), and
classifies new cases by extrapolating from the most similar stored
cases.
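As an illustration of the idea only, and not TiMBL's own code, the following minimal Python sketch keeps every training case in memory and classifies a new case by majority vote among the k nearest stored cases. The function names, the plain overlap (mismatch-count) distance, and the toy data are assumptions made for this example.

```python
from collections import Counter


def overlap_distance(a, b):
    """Count the feature positions at which two cases differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def classify(memory, case, k=1):
    """Return the majority class among the k stored cases nearest to `case`.

    `memory` is a list of (features, label) pairs, kept verbatim.
    """
    ranked = sorted(memory, key=lambda item: overlap_distance(item[0], case))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up symbolic cases with three features each.
memory = [
    (("a", "b", "c"), "X"),
    (("a", "b", "d"), "X"),
    (("e", "f", "g"), "Y"),
    (("e", "f", "c"), "Y"),
]

print(classify(memory, ("a", "b", "g"), k=3))  # prints X
```

Nothing is abstracted away at training time in this sketch; all the work happens at classification time, which is why the speed-up techniques listed among the main features matter for large case bases.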
TiMBL is being developed with a focus on classification tasks with
symbolic data, large numbers of features and values, and very large
case bases, as typically found in natural language processing. However,
TiMBL can be applied to any machine learning or data mining task for
which labeled examples with fixed numbers of features are available.
The main features of the system are:
- Support for symbolic, numeric and binary features.
- Automatic feature weighting. Information Gain, Gain Ratio,
Chi-squared, and Shared Variance weighting are provided for dealing
with features of differing importance.
- Stanfill & Waltz's / Cost & Salzberg's (Modified) Value Difference
metric for making graded guesses of the match between two
different symbolic values.
- Speed-up optimizations that enhance the underlying k-nearest
neighbor classifier kernel: conversion of the flat instance memory
into a decision tree, and inverted indexing of the instance memory,
both yielding faster classification.
- Further compression and pruning of the decision tree, guided by
feature information gain differences, for even larger speed-ups
(the IGTREE and TRIBL learning algorithms).
- Verbose output to enable the monitoring of the process of
extrapolation from nearest neighbors.
- A multithreaded TiMBL server that can be used as a classification
agent.
- Fast leave-one-out testing.
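To make the feature weighting item above concrete, here is a small Python sketch of Information Gain and Gain Ratio computed with the standard textbook definitions. It is not taken from the TiMBL source; the function names and the toy data are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict


def entropy(items):
    """Shannon entropy (in bits) of a list of symbols."""
    total = len(items)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(items).values())


def gain_ratio(cases, labels, feature):
    """Return (information gain, gain ratio) for one feature index."""
    # Group the class labels by the value the feature takes.
    by_value = defaultdict(list)
    for case, label in zip(cases, labels):
        by_value[case[feature]].append(label)

    total = len(labels)
    # Expected class entropy once the feature value is known.
    remainder = sum(len(part) / total * entropy(part)
                    for part in by_value.values())
    info_gain = entropy(labels) - remainder

    # Split info: the entropy of the feature's own value distribution.
    split_info = entropy([case[feature] for case in cases])
    ratio = info_gain / split_info if split_info > 0 else 0.0
    return info_gain, ratio


# Made-up data: feature 0 predicts the class perfectly, feature 1 not at all.
cases = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
labels = ["P", "P", "N", "N"]
weights = [gain_ratio(cases, labels, f)[1] for f in range(2)]
```

In a weighted overlap metric, a mismatch on a feature would then add that feature's weight to the distance instead of a constant 1, so mismatches on informative features count for more.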
Version 4.0 offers a number of new features:
- Class voting weighted by distance (inverse, linear, or exponentially
decayed) or by user-defined exemplar weights.
- Emulation of the IB2 algorithm, an incremental editing variant of
IB1 (Aha, Kibler and Albert, 1991).
- Internal n-fold cross-validation testing.
- Various additional verbosity options, bug fixes and code improvements.
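The distance-weighted class voting mentioned above can be sketched in Python as follows. The three weighting functions use common textbook forms (inverse distance, Dudani-style inverse linear, and exponential decay); the exact formulas and parameters used by TiMBL 4.0 are not given in this announcement, so the names `eps` and `alpha` and the formulas themselves are assumptions.

```python
import math
from collections import defaultdict


def inverse_distance(d, eps=1e-9):
    """Vote weight 1/d; eps avoids division by zero on exact matches."""
    return 1.0 / (d + eps)


def inverse_linear(d, d_nearest, d_farthest):
    """Weight falls linearly from 1 (nearest) to 0 (farthest neighbor)."""
    if d_farthest == d_nearest:
        return 1.0
    return (d_farthest - d) / (d_farthest - d_nearest)


def exponential_decay(d, alpha=1.0):
    """Weight decays exponentially with distance."""
    return math.exp(-alpha * d)


def weighted_vote(neighbors, weight):
    """Pick the class with the largest summed vote weight.

    `neighbors` is a list of (distance, label) pairs;
    `weight` maps a distance to a vote weight.
    """
    totals = defaultdict(float)
    for d, label in neighbors:
        totals[label] += weight(d)
    return max(totals, key=totals.get)


# Made-up neighbor set: one close "A", two more distant "B"s.
neighbors = [(1.0, "A"), (3.0, "B"), (3.0, "B")]
majority = weighted_vote(neighbors, lambda d: 1.0)      # unweighted vote
by_inverse = weighted_vote(neighbors, inverse_distance)  # close case dominates
```

With unweighted voting the two distant neighbors outvote the single close one; with any of the three weighting schemes the close neighbor wins in this example.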
For more information: The reference guide ("TiMBL: Tilburg
Memory-Based Learner, version 4.0, Reference Guide.", Walter
Daelemans, Jakub Zavrel, Ko van der Sloot, and Antal van den
Bosch. ILK Technical Report 01-04) can be downloaded separately and
directly from
http://ilk.kub.nl/downloads/pub/papers/ilk0104.ps.gz
-[download]-----------------------------------------------------------
You are invited to download the TiMBL package for educational or
non-commercial research purposes. When downloading the package you are
asked to register, and express your agreement with the license
terms. TiMBL is *not* shareware or public domain software. If you have
registered for a previous version, please be so kind as to re-register
for the upgrade. TiMBL can be downloaded from
http://ilk.kub.nl/
by following the `Software' link.
The TiMBL package contains:
- Source code (C++) with a Makefile.
- A reference guide containing descriptions of the incorporated
algorithms, detailed descriptions of the command-line options,
and a brief hands-on tutorial.
- Some example datasets.
- The text of the license agreement.
The package should be easy to install on most UNIX systems.
-[contact]---------------------------------------------------------
For comments and bug reports relating to TiMBL, please send mail to
Timbl at kub.nl
----------------------------------------------------------------------