Abstract
Feature selection is an important component of pattern recognition. For sample sets whose class labels are unknown, an unsupervised feature selection algorithm based on a center distance ratio criterion is proposed. The algorithm uses the mountain method (爬山法) to determine the range of the number of clusters and to estimate the initial cluster centers, then applies the K-means clustering algorithm to determine the optimal number of classes for each feature subset. The center distance ratio criterion is then used to evaluate the classification performance of each feature subset, and the correlation between features is analyzed, so that features with good classification performance and low mutual correlation are selected to form the feature subset.
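The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula for the center distance ratio, so the definition used here (mean distance between cluster centers divided by mean distance from each sample to its own center) is an assumption, not the authors' criterion. The sketch also simplifies in three ways: the number of clusters `k` is supplied by the caller instead of being estimated with the mountain method, K-means starts from randomly sampled centers, and features are scored one at a time instead of as subsets. The names `kmeans`, `center_distance_ratio`, `pearson`, `select_features`, and the threshold `max_corr` are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means on a list of equal-length tuples."""
    def assign(centers):
        return [min(range(k), key=lambda j: math.dist(p, centers[j]))
                for p in points]

    centers = random.Random(seed).sample(points, k)
    for _ in range(iters):
        labels = assign(centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # empty cluster keeps its center
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(centers)


def center_distance_ratio(points, k):
    """ASSUMED criterion (not taken from the paper): mean center-to-center
    distance over mean sample-to-own-center distance. Requires k >= 2."""
    centers, labels = kmeans(points, k)
    between = [math.dist(centers[i], centers[j])
               for i in range(k) for j in range(i + 1, k)]
    within = [math.dist(p, centers[lab]) for p, lab in zip(points, labels)]
    mean_within = sum(within) / len(within)
    return (sum(between) / len(between)) / (mean_within + 1e-12)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def select_features(data, k, max_corr=0.8):
    """Score each feature, then greedily keep the best-scoring features whose
    absolute correlation with every already kept feature is <= max_corr."""
    columns = list(zip(*data))
    scores = [center_distance_ratio([(v,) for v in col], k) for col in columns]
    order = sorted(range(len(columns)), key=lambda f: scores[f], reverse=True)
    selected = []
    for f in order:
        if all(abs(pearson(columns[f], columns[g])) <= max_corr
               for g in selected):
            selected.append(f)
    return selected, scores
```

Calling `select_features(data, k=2)` on a list of sample rows returns the indices of the kept features together with the per-feature scores. Under these assumptions, a feature that separates the samples into tight, well-spaced clusters scores higher, and a feature that is nearly a linear copy of an already kept feature is dropped.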
Source
《计算机工程与应用》
CSCD
Peking University Core Journals (北大核心)
2009, Issue 4, pp. 162-164 (3 pages)
Computer Engineering and Applications
Keywords
feature selection
center distance ratio
correlation
clustering
unsupervised