[Corpora-List] software for semi-supervised clustering?

Caren Brinckmann caren at brinckmann.de
Thu May 26 14:31:43 UTC 2011


Dear all,
 
I have a corpus of 3.4 million texts where each text is labeled with a "genre"
(or "type") category by the text provider. Since the texts stem from many
different sources, these genre labels are very heterogeneous. To reduce the
number of genres and to unify the labels I would like to cluster the given
genres hierarchically, i.e. those texts that already have the same genre label
should belong to the same cluster (must-link constraint).
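
A minimal sketch of one way to satisfy this must-link constraint by construction: instead of clustering individual texts, pool all texts that share a label into one term-frequency vector and run agglomerative clustering over the labels. Everything here is an illustrative assumption rather than an existing package: the function names, whitespace tokenization, cosine distance, pooled-count linkage, and the `threshold` value.

```python
from collections import Counter
from itertools import combinations
import math


def genre_centroids(texts):
    """Pool all texts sharing a genre label into one term-frequency vector."""
    centroids = {}
    for label, text in texts:
        centroids.setdefault(label, Counter()).update(text.lower().split())
    return centroids


def cosine_distance(a, b):
    """1 - cosine similarity of two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 - dot / norm if norm else 1.0


def cluster_genres(texts, threshold):
    """Merge the closest pair of genre clusters until none is within threshold.

    Returns the final clusters (lists of labels) and the merge history,
    which describes the hierarchy from the bottom up.
    """
    # Each cluster starts as a single label, so texts with the same
    # label can never be separated (the must-link constraint).
    clusters = {frozenset([g]): vec for g, vec in genre_centroids(texts).items()}
    merges = []
    while len(clusters) > 1:
        (a, b), dist = min(
            (
                ((x, y), cosine_distance(clusters[x], clusters[y]))
                for x, y in combinations(clusters, 2)
            ),
            key=lambda item: item[1],
        )
        if dist > threshold:
            break
        # Pool the term counts of the two merged clusters.
        clusters[a | b] = clusters.pop(a) + clusters.pop(b)
        merges.append((sorted(a), sorted(b), dist))
    return [sorted(c) for c in clusters], merges


# Toy data (hypothetical labels and texts, not taken from the corpus).
sample = [
    ("news", "government election vote parliament"),
    ("press report", "election government minister vote"),
    ("novel", "love heart night dream"),
    ("fiction", "dream love story night"),
]
final_clusters, history = cluster_genres(sample, threshold=0.5)
print(final_clusters)
```

Because the clustering runs over distinct labels rather than over texts, its cost depends on the number of labels, not on the number of texts; the pairwise search shown here is quadratic per merge and would need a more efficient implementation for a large label set.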
 
Can you recommend a software package that supports this kind of semi-supervised
clustering? Or do you have another idea for how to cluster the genres?
 
Thank you for your help!
Caren.
 
--
Caren Brinckmann
Institut für Deutsche Sprache (IDS)
R5, 6-13
68161 Mannheim
Germany

 
 



