Research Group on Intelligent Engineering and Computing for Advanced Innovation and Development
Responsible: Maria de Fátima Coutinho Rodrigues
Name: Ensemble Methods for Unsupervised and Semi-Supervised Learning
Acronym: EMUSSL
Funding Entity: FCT
Reference: POSI/EEA-SRI/61924/2004
Budget: 31896€
Start Date: 2005
Finish Date: 2008
Status: Finished
Description: Data clustering, or unsupervised learning, is an important but difficult problem. The objective of clustering is to partition a set of unlabelled objects into homogeneous groups, or clusters. Many application areas use clustering techniques to organize data or discover structure in it, including data mining, information retrieval, document analysis, bioinformatics, and image segmentation. Quantitative evaluation of clustering results is difficult because the notion of a cluster is inherently subjective.

Semi-supervised learning algorithms make use of both labelled and unlabelled samples and have been the focus of much recent research in the machine learning and pattern recognition communities. Semi-supervised learning has typically been viewed from a supervised learning perspective, as a classification problem with some missing labels. This perspective suggests applying supervised ensemble methods (such as boosting) to semi-supervised problems, a research direction that will be pursued in the project. To the best of our knowledge, ensemble methods have not yet been proposed for semi-supervised learning problems. Additionally, one of the main threads of this project will be to address semi-supervised learning as an unsupervised learning problem with some additional knowledge or constraints. This perspective will allow semi-supervised and unsupervised problems to be addressed in an integrated fashion, which we will do with a strong emphasis on ensemble methods.

Ensemble methods, which constitute the state of the art in supervised learning, have recently also been proposed for unsupervised learning, with much success. Research in ensemble clustering involves three aspects: obtaining the cluster ensembles, combining information from the several partitions, and validating or evaluating the combination results. In this project we will address all three.
We intend to explore different clustering algorithms and data representations in order to produce cluster ensembles. We will build upon and extend previous work on combination techniques by assigning different weights to partitions and clusters; these weights will be based on cluster validation criteria, which will also constitute an outcome of the project. A byproduct of the research on combining different clustering algorithms will be a toolbox containing recent, state-of-the-art clustering algorithms. The developed methods will be evaluated experimentally on challenging learning problems, namely document and web page classification and bioinformatics. The semi-supervised formulation is particularly well suited to these problems, owing to the availability of huge amounts of unlabelled data and the relatively high cost of obtaining labelled training data.
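As an illustration of combining several partitions with per-partition weights, the following is a minimal sketch in the style of a co-association (evidence accumulation) scheme. It is not confirmed to be the method used in the project: the function names `co_association` and `consensus_labels`, the 0.5 threshold, and the example partitions are all assumptions made for the example, and in practice the weights would come from a cluster validation criterion.

```python
from itertools import combinations


def co_association(partitions, weights=None):
    """Weighted fraction of partitions in which each pair of objects shares a cluster.

    partitions: list of label lists, one per clustering, all of the same length.
    weights: optional per-partition weights (hypothetical; e.g. validity scores).
    """
    n = len(partitions[0])
    if weights is None:
        weights = [1.0] * len(partitions)
    total = sum(weights)
    co = [[0.0] * n for _ in range(n)]
    for labels, w in zip(partitions, weights):
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    co[i][j] += w / total
    return co


def consensus_labels(co, threshold=0.5):
    """Link objects whose co-association exceeds the threshold (assumed value)
    and return the connected components as consensus cluster labels."""
    n = len(co)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if co[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel components as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


if __name__ == "__main__":
    # Three made-up partitions of six objects, standing in for a cluster ensemble.
    ensemble = [
        [0, 0, 0, 1, 1, 1],
        [0, 0, 1, 1, 2, 2],
        [1, 1, 1, 0, 0, 0],
    ]
    print(consensus_labels(co_association(ensemble)))
```

Linking pairs above a threshold corresponds to cutting a single-link hierarchy built on the co-association matrix; other consensus functions could be substituted without changing how the matrix is accumulated.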
This research group is supported by FEDER Funds through the “Programa Operacional Factores de Competitividade - COMPETE” program and by National Funds through FCT “Fundação para a Ciência e a Tecnologia” under project UID/EEA/00760/2013.