[Corpora-List] unsupervised with semi-supervised
Ben Allison
B.Allison at dcs.shef.ac.uk
Thu Apr 24 11:44:25 UTC 2008
Taras,
Here is my opinion, for what it's worth...
Firstly, as some other people noted, I think it's somewhat confusing to
think about the positioning of rule-based systems in the context of
supervised/semi-supervised/unsupervised methods, since the degree of
supervision is fundamentally a way of describing the behavior of a
method which induces classification rules from labelled examples. I
agree that the issue is somewhat complicated if the rules are used to
label examples, and these are then used for training; however, I believe
there are then issues of the integrity of the training examples and so on. If all
rule-sets produce perfect labelling of examples, I don't see that the
absolute number of rules is an issue in the degree of supervision. If
not, it strikes me that the purity of the labelled sets is the key
issue, not the number of rules which were used to create them.
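To make the purity point concrete, here is a minimal sketch. Everything in it is hypothetical: the keyword rules, the four toy texts, and the hand-checked gold labels are invented for illustration, and "purity" is taken to mean the fraction of rule-assigned labels that agree with the gold labels.

```python
from typing import Callable, List, Optional, Sequence

Rule = Callable[[str], Optional[str]]


def label_with_rules(texts: Sequence[str], rules: Sequence[Rule]) -> List[Optional[str]]:
    """Apply the rules in order; the first rule that fires assigns the label.

    Texts on which no rule fires stay unlabelled (None).
    """
    labels: List[Optional[str]] = []
    for text in texts:
        label = None
        for rule in rules:
            label = rule(text)
            if label is not None:
                break
        labels.append(label)
    return labels


def purity(predicted: Sequence[Optional[str]], gold: Sequence[str]) -> float:
    """Fraction of rule-labelled examples whose label matches the gold label.

    Examples the rules left unlabelled are ignored.
    """
    pairs = [(p, g) for p, g in zip(predicted, gold) if p is not None]
    if not pairs:
        raise ValueError("the rules labelled no examples")
    return sum(p == g for p, g in pairs) / len(pairs)


# Hypothetical keyword rules for a toy sentiment task.
def good_rule(text: str) -> Optional[str]:
    return "pos" if "good" in text else None


def bad_rule(text: str) -> Optional[str]:
    return "neg" if "bad" in text else None


def great_rule(text: str) -> Optional[str]:
    return "pos" if "great" in text else None


texts = ["good film", "bad plot", "not good at all", "great cast"]
gold = ["pos", "neg", "neg", "pos"]  # assumed hand-checked labels

one_rule = label_with_rules(texts, [great_rule])
three_rules = label_with_rules(texts, [good_rule, bad_rule, great_rule])

print(purity(one_rule, gold))     # 1.0
print(purity(three_rules, gold))  # 0.75
```

In this toy case the larger rule set labels more examples but less cleanly, which is why the purity of the resulting set seems a better thing to measure than the rule count.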
Your other two questions again relate to the *number* of examples used,
and I don't believe the definitions of the supervision paradigms have
anything to do with the number of examples (rather the number of
examples, amongst other things, typically dictates which paradigm to
use). Whilst there is considerable wiggle-room within the definitions,
and my feeling is that they are only consensus anyway rather than
rigorous tests, my understanding of the division is as follows:
unsupervised: no labelled examples
supervised: labelled examples
semi-supervised: mixture of labelled and unlabelled examples
Notice that the semi-supervised problem is trivially related to the
other two paradigms: remove all labelled examples, and the problem
becomes unsupervised; remove all unlabelled examples, and it becomes
supervised.
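The division above can be sketched in a few lines. This is only an illustration of the definitions as stated, assuming a dataset is represented by its list of labels with None standing for an unlabelled example; the function name and the toy data are made up.

```python
from typing import Optional, Sequence


def paradigm(labels: Sequence[Optional[str]]) -> str:
    """Name the paradigm implied by which examples carry labels.

    None marks an unlabelled example. The number of examples of
    either kind plays no part in the decision.
    """
    has_labelled = any(label is not None for label in labels)
    has_unlabelled = any(label is None for label in labels)
    if has_labelled and has_unlabelled:
        return "semi-supervised"
    if has_labelled:
        return "supervised"
    if has_unlabelled:
        return "unsupervised"
    raise ValueError("no examples given")


mixed = ["pos", None, "neg", None, None]

print(paradigm(mixed))                                   # semi-supervised
print(paradigm([l for l in mixed if l is None]))         # unsupervised
print(paradigm([l for l in mixed if l is not None]))     # supervised
```

The last two calls mirror the observation that removing the labelled or the unlabelled examples turns the semi-supervised problem into one of the other two.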
As I said before, I believe that the number of examples is only an issue
in deciding which paradigm is most appropriate for the task at hand. For
example, if labelled examples are hard to come by, but unlabelled ones
are plentiful, semi-supervised seems appropriate. If no labelled
examples are available, or the desire is to cluster/describe the data,
then unsupervised is most appropriate, and so on.
Of course, within the context of semi-supervised learning, there are yet
more divisions depending upon whether one is interested in inducing a
classification rule which covers all possible examples, or only those
examples which are currently unlabelled (transductive classification),
but I suspect this is a discussion for another time...
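For readers unfamiliar with the distinction, a rough sketch follows. It assumes one-dimensional numeric examples, a nearest-neighbour rule for the inductive case, and a simple nearest-first label propagation for the transductive case; these are illustrative choices, not the only (or standard) way to do either.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Labelled = Tuple[float, str]


def nearest_label(pool: Sequence[Labelled], x: float) -> str:
    """Label of the pool example closest to x."""
    return min(pool, key=lambda pair: abs(pair[0] - x))[1]


def induce(labelled: Sequence[Labelled]) -> Callable[[float], str]:
    """Inductive: return a rule that can classify any future example."""
    fixed = list(labelled)
    return lambda x: nearest_label(fixed, x)


def transduce(labelled: Sequence[Labelled], unlabelled: Sequence[float]) -> Dict[float, str]:
    """Transductive: label only the given unlabelled examples.

    Repeatedly takes the unlabelled example closest to anything already
    labelled, copies its nearest neighbour's label, and adds it to the pool.
    No general rule is returned.
    """
    pool: List[Labelled] = list(labelled)
    remaining = list(unlabelled)
    result: Dict[float, str] = {}
    while remaining:
        x = min(remaining, key=lambda u: min(abs(u - p) for p, _ in pool))
        label = nearest_label(pool, x)
        result[x] = label
        pool.append((x, label))
        remaining.remove(x)
    return result


labelled = [(0.0, "a"), (10.0, "b")]
unlabelled = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

classify = induce(labelled)
print(classify(6.0))                         # b
print(transduce(labelled, unlabelled)[6.0])  # a
```

The two disagree on 6.0 because the transductive version uses the layout of the unlabelled examples themselves, whereas the induced rule looks only at the two labelled points.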
Hope this is of some help.
Ben
Taras Zagibalov wrote:
> Thank you Ted for your input.
> I will summarise all the replies I will (hopefully) get.
> As for the first question, I meant that all labelling was done manually.
> And the question is if the systems are unsupervised or semi-supervised,
> since some definitions state that semi-supervised systems make use of
> both labelled and unlabelled data. The confusing part is that although
> the amount of labelled data is so different, it doesn't seem to be relevant.
> As for question 3, I do realise that rule-based systems are supposed to
> be in another paradigm. But this also confuses me, since rules can be
> used to produce a set of labelled examples and, as you noted, can be
> regarded as a form of supervision.
>
> Regards,
> Taras
>
> Ted Pedersen wrote:
>
>> If you did get some responses on your original question to you
>> directly (and not to
>> the list) it would be quite helpful if you posted a summary of
>> those to the list.
>>
>> In your question 1) did you mean to ask if the systems were still
>> supervised? All
>> those labeled examples suggest that they wouldn't be unsupervised
>> (although I may
>> have missed something in your query). Were the examples labeled manually or
>> via some automatic means?
>>
>> On 3, I'm not sure why rule-based is being associated with supervised
>> - do you mean
>> that the creation of rules is a form of supervision? That seems
>> reasonable, although
>> often rule-based and supervised systems are treated as separate categories.
>>
>> Cordially,
>> Ted
>>
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
>