Abstract
In supervised learning, the number of values of a response variable can be very high. Grouping these values into a few clusters can be useful for performing accurate supervised classification analyses. On the other hand, selecting relevant covariates is a crucial step in building robust and efficient prediction models. In this paper, we propose an algorithm that simultaneously groups the values of a response variable into a limited number of clusters and stepwise selects the covariates that best discriminate this clustering. These objectives are achieved by alternating optimization of a user-defined model selection criterion. This process extends a former version of the algorithm to a more general framework. Moreover, possible further developments are discussed in detail.
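The abstract names the algorithm's ingredients (a clustering of response values, stepwise covariate selection, and alternating optimization of a user-defined criterion) but not the criterion itself. The following is a minimal Python sketch of that alternating idea under loudly stated assumptions. It is not the authors' algorithm. The criterion used here (mean between-cluster R² of the selected covariates minus a size penalty), the greedy one-value-at-a-time regrouping, and every function name (`r_squared`, `criterion`, `stepwise_select`, `regroup_values`, `cluster_and_select`) are hypothetical choices made only for illustration.

```python
# Illustrative sketch only: the criterion and all names are hypothetical,
# not taken from the paper.
from statistics import fmean


def r_squared(values, labels):
    """Share of a covariate's variance explained by a partition of the samples."""
    grand = fmean(values)
    total = sum((v - grand) ** 2 for v in values)
    if total == 0.0:
        return 0.0  # constant covariate: no discriminating power
    groups = {}
    for v, g in zip(values, labels):
        groups.setdefault(g, []).append(v)
    between = sum(len(vs) * (fmean(vs) - grand) ** 2 for vs in groups.values())
    return between / total


def criterion(X, y, assign, selected, penalty=0.05):
    """Hypothetical user-defined criterion (higher is better)."""
    if not selected:
        return float("-inf")  # require at least one covariate
    labels = [assign[v] for v in y]  # map each sample's response value to its cluster
    scores = [r_squared([row[j] for row in X], labels) for j in sorted(selected)]
    return fmean(scores) - penalty * len(selected)


def stepwise_select(X, y, assign, selected, crit):
    """Apply the single covariate addition or removal that most improves crit, until none does."""
    selected = set(selected)
    best = crit(X, y, assign, selected)
    p = len(X[0])
    while True:
        moves = [selected | {j} for j in range(p) if j not in selected]
        moves += [selected - {j} for j in selected]
        scored = [(crit(X, y, assign, m), m) for m in moves]
        if not scored:
            return selected, best
        score, move = max(scored, key=lambda t: t[0])
        if score <= best + 1e-12:
            return selected, best
        selected, best = move, score


def regroup_values(X, y, assign, selected, k, crit):
    """Greedily move single response values between k non-empty clusters while crit improves."""
    assign = dict(assign)
    best = crit(X, y, assign, selected)
    improved = True
    while improved:
        improved = False
        for v in sorted(assign):
            for c in range(k):
                if c == assign[v]:
                    continue
                trial = dict(assign)
                trial[v] = c
                if len(set(trial.values())) < k:
                    continue  # keep all k clusters non-empty
                score = crit(X, y, trial, selected)
                if score > best + 1e-12:
                    assign, best, improved = trial, score, True
    return assign, best


def cluster_and_select(X, y, k, crit=criterion, max_iter=20):
    """Alternate covariate selection and value clustering until crit stops improving."""
    values = sorted(set(y))
    if k > len(values):
        raise ValueError("k cannot exceed the number of distinct response values")
    assign = {v: i % k for i, v in enumerate(values)}  # arbitrary initial grouping
    selected = set()
    best = crit(X, y, assign, selected)
    for _ in range(max_iter):
        selected, _ = stepwise_select(X, y, assign, selected, crit)
        assign, score = regroup_values(X, y, assign, selected, k, crit)
        converged = score <= best + 1e-12
        best = score
        if converged:
            break
    return assign, selected, best


# Toy data: covariate 0 separates response values {0, 1} from {2, 3};
# covariate 1 carries no information about the response.
X = [[0, 5], [1, 3], [2, 3], [3, 5], [10, 5], [11, 3], [12, 3], [13, 5]]
y = [0, 0, 1, 1, 2, 2, 3, 3]
assign, selected, score = cluster_and_select(X, y, k=2)
print(assign, selected, round(score, 3))  # prints {0: 1, 1: 1, 2: 0, 3: 0} {0} 0.902
```

Each step can only keep or raise the criterion, so the alternation stops after finitely many rounds; `max_iter` is an extra safeguard. Swapping in a different `crit` (for example, a cross-validated classification score) changes the objective without touching the alternating loop, which mirrors the abstract's point that the criterion is user-defined.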
Authors
Olivier Collignon
Jean-Marie Monnez
Olivier Collignon; Jean-Marie Monnez (Luxembourg Institute of Health, Strassen, Luxembourg; Institut Elie Cartan de Lorraine, Université de Lorraine, CNRS UMR 7502, Vandoeuvre-lès-Nancy, France; INRIA, Projet BIGS, Vandoeuvre-lès-Nancy, France; CIC-P, CHRU, University Hospital, Nancy, France)