Abstract
Most neighborhood systems rely on manual tuning, which makes the optimal neighborhood radius difficult to find, and traditional K-means clustering requires cluster centers to be chosen at random and the number of clusters to be specified in advance. To address these problems, this paper proposes a feature selection method based on neighborhood mutual information and K-means feature clustering. First, the average distance between a sample and the other samples under each feature is taken as the adaptive neighborhood radius, which determines the neighborhood set of the sample; on this basis, measures such as adaptive neighborhood entropy, neighborhood mutual information, and normalized neighborhood mutual information are constructed to reflect the correlation between features. Second, an adaptive K-nearest neighbor set is built from the normalized neighborhood mutual information, and a weighted K-nearest neighbor density is defined with feature weights given by the Pearson correlation coefficient, so that the cluster centers of the K-means algorithm are selected automatically and K-means feature clustering can then be carried out. Finally, a weighted average redundancy is defined, and the feature with the largest weighted average redundancy in each feature cluster is selected to form the optimal feature subset. Experimental results show that the proposed algorithm both improves the classification performance of feature selection and achieves better clustering results.
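The first step described in the abstract can be illustrated with a minimal Python sketch. The abstract gives no formulas, so the following rests on assumptions: distance under a single feature is the absolute difference, the adaptive radius of a sample is its mean distance to the other n - 1 samples, the joint neighborhood under two features is the intersection of the per-feature neighborhoods, and entropy takes the commonly used neighborhood form -(1/n) Σ log(|δ(x_i)|/n). The function names are hypothetical and are not taken from the paper; normalization, feature weighting, and clustering are not covered.

```python
import numpy as np


def adaptive_neighborhoods(column):
    """Return an n x n boolean matrix for one feature.

    Entry (i, j) is True when sample j lies in the neighborhood of sample i.
    Assumption: the radius of sample i is the mean absolute difference
    between sample i and the other n - 1 samples (requires n >= 2).
    """
    column = np.asarray(column, dtype=float)
    n = column.shape[0]
    dist = np.abs(column[:, None] - column[None, :])  # pairwise distances
    radius = dist.sum(axis=1) / (n - 1)               # self-distance is 0
    return dist <= radius[:, None]


def neighborhood_entropy(nbr):
    """Neighborhood entropy: -(1/n) * sum_i log(|delta(x_i)| / n)."""
    n = nbr.shape[0]
    return -np.mean(np.log(nbr.sum(axis=1) / n))


def neighborhood_mutual_information(nbr_a, nbr_b):
    """NMI(A; B) = H(A) + H(B) - H(A, B).

    Assumption: the joint neighborhood is the intersection of the two
    per-feature neighborhoods.
    """
    joint = nbr_a & nbr_b
    return (neighborhood_entropy(nbr_a)
            + neighborhood_entropy(nbr_b)
            - neighborhood_entropy(joint))
```

Under these assumptions every sample belongs to its own neighborhood, so the logarithm is always finite, and the mutual information of a feature with itself reduces to its neighborhood entropy.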
Authors
孙林
梁娜
徐久成
SUN Lin; LIANG Na; XU Jiucheng (College of Artificial Intelligence, Tianjin University of Science and Technology, Tianjin 300457, China; College of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China)
Source
《智能系统学报》
CSCD
Peking University Core Journal
2024, No. 4, pp. 983-996 (14 pages)
CAAI Transactions on Intelligent Systems
Funding
National Natural Science Foundation of China (62076089, 61772176, 61976082)
Henan Province Science and Technology Research Program (2121-02210136)
Keywords
feature selection
neighborhood mutual information
K-means
feature clustering
adaptive K-nearest neighbor
feature weight
weighted K-nearest neighbor density