Abstract
Clustering high-dimensional data is a challenging problem because of the curse of dimensionality. This paper presents an adaptive subspace selection approach to address it. The data are first projected into a low-dimensional subspace by locally linear embedding. Two steps are then iterated to adaptively select the most discriminative subspace: with the subspace fixed, K-means clustering is performed to generate cluster labels; with the cluster labels fixed, linear discriminant analysis projects the samples into a low-dimensional subspace to perform subspace selection. Through repeated iteration, clusters are discovered in the low-dimensional subspace, which avoids the curse of dimensionality, while the subspace is adaptively re-adjusted toward global optimality. Extensive experimental results show that the approach clusters better than traditional K-means clustering.
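The alternating procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes scikit-learn is available, that the subspace dimension is the number of clusters minus one, that linear discriminant analysis is fitted on the original data using the current K-means labels, and that iteration stops when the partition no longer changes. The function name `adaptive_subspace_kmeans` and all parameter defaults are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.metrics import adjusted_rand_score


def adaptive_subspace_kmeans(X, n_clusters, n_neighbors=10, max_iter=20, seed=0):
    """Alternate K-means (subspace fixed) and LDA (labels fixed).

    Hypothetical helper: the name, defaults, and stopping rule are
    assumptions made for illustration, not taken from the paper.
    """
    # Assumed subspace dimension: LDA yields at most n_clusters - 1 directions.
    dim = n_clusters - 1

    # Initial low-dimensional representation via locally linear embedding.
    Z = LocallyLinearEmbedding(
        n_neighbors=n_neighbors, n_components=dim, random_state=seed
    ).fit_transform(X)

    labels = None
    for _ in range(max_iter):
        # Step 1: subspace fixed, K-means produces cluster labels.
        new_labels = KMeans(
            n_clusters=n_clusters, n_init=10, random_state=seed
        ).fit_predict(Z)

        # Stop when the partition is unchanged (up to label permutation).
        if labels is not None and adjusted_rand_score(labels, new_labels) >= 1.0 - 1e-12:
            break
        labels = new_labels

        # Step 2: labels fixed, LDA on the original data selects a new subspace.
        Z = LinearDiscriminantAnalysis(n_components=dim).fit_transform(X, labels)

    return labels, Z
```

Calling `adaptive_subspace_kmeans(X, n_clusters=3)` on an array `X` of shape `(n_samples, n_features)` returns the final cluster labels together with the samples projected into the last selected subspace. Comparing partitions with the adjusted Rand index, rather than comparing label arrays directly, avoids treating a mere renumbering of clusters as a change.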
Source
Computer Technology and Development (《计算机技术与发展》)
2013, Issue 10, pp. 83-86 (4 pages)
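Funding
National Natural Science Foundation of China (11001212)
Open Fund of the National Engineering Research Center of Phosphorus Resources Development and Utilization (2012国磷k005)
Doctoral Start-up Fund of Wuhan Institute of Technology (12106021)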
Keywords
subspace selection
linear discriminant analysis
K-means clustering