The clustering of objects (individuals or variables) is one of the most widely used approaches to exploring multivariate data. The two most common unsupervised clustering strategies are hierarchical ascending clustering (HAC) and k-means partitioning, both used to identify groups of similar objects in a dataset and so divide it into homogeneous groups. The proposed topological clustering of variables, called TCV, studies a homogeneous set of variables defined on the same set of individuals, based on the notion of neighborhood graphs. Some of these variables are more or less correlated or linked, depending on whether they are quantitative or qualitative. This topological data analysis approach can then be useful for dimension reduction and variable selection. It is a topological hierarchical clustering analysis of a set of variables which can be quantitative, qualitative or a mixture of both. It arranges variables into homogeneous groups according to their correlations or associations, studied in a topological context of principal component analysis (PCA) or multiple correspondence analysis (MCA). The proposed TCV is adapted to the type of data considered; its principle is presented and illustrated using simple real datasets with quantitative, qualitative and mixed variables. The results of these illustrative examples are compared to those of other variable clustering approaches.
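To make the idea of grouping variables by their correlations concrete, here is a minimal sketch of a conventional baseline: average-linkage HAC on the dissimilarity 1 - |r| between quantitative variables. This is not the TCV method, which relies on neighborhood graphs and is not specified in the abstract. The function names, the choice of dissimilarity and linkage, and the toy data are all assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists.

    Assumes neither variable is constant (non-zero variance).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cluster_variables(data, n_clusters):
    """Group variables by average-linkage HAC on 1 - |r| (illustrative baseline).

    data: dict mapping a variable name to its values over the same individuals.
    Returns a list of clusters, each a sorted list of variable names.
    """
    names = list(data)
    # Dissimilarity between two variables: small when strongly correlated.
    dist = {
        (a, b): 1.0 - abs(pearson(data[a], data[b]))
        for a in names for b in names if a != b
    }
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        # Average pairwise dissimilarity between the two clusters.
        return sum(dist[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find and merge the closest pair of clusters.
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical toy data: four variables measured on five individuals.
toy = {
    "height": [150, 160, 170, 180, 190],
    "weight": [52, 58, 69, 80, 88],
    "noise_a": [3, 1, 4, 1, 5],
    "noise_b": [6, 2, 9, 1, 10],
}
groups = cluster_variables(toy, 2)
print(groups)
```

On this toy data the two strongly correlated pairs end up in separate groups. For qualitative or mixed variables, the correlation would have to be replaced by a suitable association measure, which this sketch does not attempt.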