[Corpora-List] unsupervised with semi-supervised
Ben Allison
B.Allison at dcs.shef.ac.uk
Thu Apr 24 11:30:49 UTC 2008
Taras,
Here is my opinion, for what it's worth...
Firstly, as some other people noted, I think it's somewhat confusing to
think about the positioning of rule-based systems in the context of
supervised/semi-supervised/unsupervised methods, since the degree of
supervision is fundamentally a way of describing the behavior of a
method which induces classification rules from labelled examples. I
agree that the issue is somewhat complicated if the rules are used to
label examples, and these are then used as training data. However, I
believe the question then becomes one of the integrity of the training
examples and so on. If all
rule-sets produce perfect labelling of examples, I don't see that the
absolute number of rules is an issue in the degree of supervision. If
not, it strikes me that the purity of the labelled sets is the key
issue, not the number of rules which were used to create them.
Your other two questions again relate to the *number* of examples used,
and I don't believe the definitions of the supervision paradigms have
anything to do with the number of examples (rather the number of
examples, amongst other things, typically dictates which paradigm to
use). Whilst there is considerable wiggle-room within the definitions,
and my feeling is that they are only consensus anyway rather than
rigorous tests, my understanding of the division is as follows:
  unsupervised:    no labelled examples
  supervised:      labelled examples only
  semi-supervised: a mixture of labelled and unlabelled examples
Notice that the semi-supervised problem is trivially related to the
other two paradigms: remove all labelled examples, and the problem
becomes unsupervised; remove all unlabelled examples, and it becomes
supervised.
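To make the division concrete, here is a minimal sketch in Python. The function name `paradigm` and the convention of marking an unlabelled example with `None` are assumptions made for illustration only, not anything standard. It names the setting purely from whether labelled and unlabelled examples are present, never from how many there are.

```python
from typing import Optional, Sequence


def paradigm(labels: Sequence[Optional[str]]) -> str:
    """Name the learning setting from which examples carry a label.

    Hypothetical convention: None marks an unlabelled example.
    Only presence or absence matters, not the counts.
    """
    if not labels:
        raise ValueError("no examples given")
    has_labelled = any(y is not None for y in labels)
    has_unlabelled = any(y is None for y in labels)
    if has_labelled and has_unlabelled:
        return "semi-supervised"
    return "supervised" if has_labelled else "unsupervised"


mixed = ["spam", None, "ham", None, None]

# The mixed pool is the semi-supervised case.
print(paradigm(mixed))
# Remove all labelled examples: the problem becomes unsupervised.
print(paradigm([y for y in mixed if y is None]))
# Remove all unlabelled examples: the problem becomes supervised.
print(paradigm([y for y in mixed if y is not None]))
```

Adding a thousand more `None` entries to `mixed` would not change the answer, which is the point about counts above.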
As I said before, I believe that the number of examples is only an issue
in deciding which paradigm is most appropriate for the task at hand. For
example, if labelled examples are hard to come by, but unlabelled ones
are plentiful, semi-supervised seems appropriate. If no labelled
examples are available, or the desire is to cluster/describe the data,
then unsupervised is most appropriate, and so on.
Of course, within the context of semi-supervised learning, there are yet
more divisions depending upon whether one is interested in inducing a
classification rule which covers all possible examples, or only those
examples which are currently unlabelled (transductive classification),
but I suspect this is a discussion for another time...
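For what it's worth, the difference in output between the two can be sketched as below. This is a toy illustration under stated assumptions: the one-dimensional data, the nearest-neighbour rule, and the names `induce` and `transduce` are all hypothetical. Real transductive methods would normally also exploit the structure of the unlabelled pool, which this sketch does not attempt; it only shows that one setting returns a reusable rule and the other returns labels for a fixed set.

```python
from typing import Callable, List, Sequence, Tuple

Labelled = Sequence[Tuple[float, str]]  # (feature value, label) pairs


def nearest_label(x: float, labelled: Labelled) -> str:
    """Label of the labelled example closest to x (toy 1-NN rule)."""
    return min(labelled, key=lambda pair: abs(pair[0] - x))[1]


def induce(labelled: Labelled) -> Callable[[float], str]:
    """Inductive setting: return a rule that covers any future example."""
    return lambda x: nearest_label(x, labelled)


def transduce(labelled: Labelled, unlabelled: Sequence[float]) -> List[str]:
    """Transductive setting: return labels only for the given pool."""
    return [nearest_label(x, labelled) for x in unlabelled]


labelled = [(0.0, "neg"), (10.0, "pos")]
pool = [1.0, 9.0]

rule = induce(labelled)
print(rule(7.5))                  # the rule also applies to unseen points
print(transduce(labelled, pool))  # labels for the current pool and nothing else
```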
Hope this is of some help.
Ben
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora