[Corpora-List] unsupervised with semi-supervised

Ben Allison B.Allison at dcs.shef.ac.uk
Thu Apr 24 11:44:25 UTC 2008


Taras,

Here is my opinion, for what it's worth...

Firstly, as some other people noted, I think it's somewhat confusing to 
think about the positioning of rule-based systems in the context of 
supervised/semi-supervised/unsupervised methods, since the degree of 
supervision is fundamentally a way of describing the behavior of a 
method which induces classification rules from labelled examples. I 
agree that the issue is somewhat complicated if the rules are used to 
label examples, and these are then used for training; however, I believe 
the question then becomes one of the integrity of the training examples. If all 
rule-sets produce perfect labelling of examples, I don't see that the 
absolute number of rules is an issue in the degree of supervision. If 
not, it strikes me that the purity of the labelled sets is the key 
issue, not the number of rules which were used to create them.

Your other two questions again relate to the *number* of examples used, 
and I don't believe the definitions of the supervision paradigms have 
anything to do with the number of examples (rather, the number of 
examples, amongst other things, typically dictates which paradigm to 
use). Whilst there is considerable wiggle-room within the definitions, 
and my feeling is that they are only consensus anyway rather than 
rigorous tests, my understanding of the division is as follows:

unsupervised: no labelled examples
supervised: labelled examples
semi-supervised: mixture of labelled and unlabelled examples

Notice that the semi-supervised problem is trivially related to the 
other two paradigms: remove all labelled examples, and the problem 
becomes unsupervised; remove all unlabelled examples, and it becomes 
supervised.
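
To make that division concrete, here is a minimal sketch in Python. The
function name `paradigm` and the convention that `None` marks an
unlabelled example are my own assumptions for illustration, not any
standard API. It names the paradigm purely from whether labels are
present, never from how many there are:

```python
from typing import Optional, Sequence


def paradigm(labels: Sequence[Optional[str]]) -> str:
    """Name the paradigm implied by which examples carry a label.

    `None` stands for an unlabelled example (an assumed convention).
    """
    if not labels:
        raise ValueError("need at least one example")
    n_labelled = sum(label is not None for label in labels)
    if n_labelled == 0:
        return "unsupervised"      # no labelled examples
    if n_labelled == len(labels):
        return "supervised"        # every example is labelled
    return "semi-supervised"       # mixture of labelled and unlabelled


# A mixed set, and the two reductions described above.
mixed = ["pos", None, "neg", None, None]
only_unlabelled = [y for y in mixed if y is None]
only_labelled = [y for y in mixed if y is not None]

print(paradigm(mixed))
print(paradigm(only_unlabelled))
print(paradigm(only_labelled))
```

Note that a set with one labelled example and a thousand unlabelled ones
still comes out as semi-supervised under this sketch, which is the point:
the proportions affect which paradigm is sensible, not which one it is.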

As I said before, I believe that the number of examples is only an issue 
in deciding which paradigm is most appropriate for the task at hand. For 
example, if labelled examples are hard to come by, but unlabelled ones 
are plentiful, semi-supervised seems appropriate. If no labelled 
examples are available, or the desire is to cluster/describe the data, 
then unsupervised is most appropriate, and so on.

Of course, within the context of semi-supervised learning, there are yet 
more divisions depending upon whether one is interested in inducing a 
classification rule which covers all possible examples, or only those 
examples which are currently unlabelled (transductive classification), 
but I suspect this is a discussion for another time...

Hope this is of some help.

Ben


Taras Zagibalov wrote:
> Thank you Ted for your input.
> I will summarise all the replies I will (hopefully) get.
> As for the first question, I meant that all labelling was done manually. 
> And the question is if the systems are unsupervised or semi-supervised, 
> since some definitions state that semi-supervised systems make use of 
> both labelled and unlabelled data. The confusing part is that although 
> the amount of labelled data is so different, it doesn't seem to be relevant.
> As for question 3, I do realise that rule-based systems are supposed to 
> be in another paradigm. But this also confuses me, since rules can be 
> used to produce a set of labelled examples and, as you noted, can be 
> regarded as a form of supervision.
>
> Regards,
> Taras
>
> Ted Pedersen wrote:
>   
>> If you did get some responses on your original question to you directly
>> (and not to the list) it would be quite helpful if you posted a summary
>> of those to the list.
>>
>> In your question 1) did you mean to ask if the systems were still
>> supervised? All those labeled examples suggest that they wouldn't be
>> unsupervised (although I may have missed something in your query). Were
>> the examples labeled manually or via some automatic means?
>>
>> On 3, I'm not sure why rule-based is being associated with supervised -
>> do you mean that the creation of rules is a form of supervision? That
>> seems reasonable, although often rule-based and supervised systems are
>> treated as separate categories.
>>
>> Cordially,
>> Ted
>>     
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
>   



