Abstract
Tumor gene expression data are high-dimensional, noisy, and highly redundant. To address this, the paper improves the F-score algorithm with the Spearman correlation coefficient, optimizes the binary gray wolf algorithm on that basis, and proposes a tumor gene selection algorithm based on the improved F-score and the binary gray wolf algorithm. First, to account for correlation between features, the F-score of each feature and the absolute value of the Spearman correlation coefficient between features are calculated. Second, a weight coefficient is computed to obtain a weight for each feature; the features are ranked by importance and a preliminary feature subset is selected. Finally, the binary gray wolf algorithm is optimized through the decay curve of its convergence factor and its initialization method, which adjusts the proportion of global to local search, strengthens global search, and speeds up local search. This reduces time overhead, improves the classification performance and efficiency of feature selection, and yields the optimal feature subset. The proposed algorithm is tested on nine tumor gene datasets in simulation experiments using two metrics, classification accuracy and number of selected features, and is compared with four other algorithms. The experimental results show that it performs well, effectively reduces the dimensionality of gene data, and achieves good classification accuracy.
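The filter stage described in the abstract (per-feature F-score plus absolute Spearman correlation between features) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the paper's weight coefficient, so the weighting in `rank_features` (relevance discounted by mean redundancy) is a hypothetical stand-in, and the function names, the binary 0/1 label encoding, and the two-class F-score form are likewise assumptions rather than the authors' implementation.

```python
from statistics import mean, variance


def f_score(column, labels):
    """Two-class F-score of one feature: between-class scatter over within-class variance."""
    pos = [x for x, y in zip(column, labels) if y == 1]
    neg = [x for x, y in zip(column, labels) if y == 0]
    overall = mean(column)
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    within = variance(pos) + variance(neg)  # sample variances (n - 1 denominator)
    return between / within if within > 0 else 0.0


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def abs_spearman(a, b):
    """Absolute Spearman coefficient: Pearson correlation of the rank vectors."""
    ra, rb = average_ranks(a), average_ranks(b)
    ma, mb = mean(ra), mean(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    norm = (sum((x - ma) ** 2 for x in ra) * sum((y - mb) ** 2 for y in rb)) ** 0.5
    return abs(cov / norm) if norm > 0 else 0.0


def rank_features(columns, labels):
    """Return feature indices, best first, under the ASSUMED weighting below."""
    n = len(columns)
    weights = []
    for i in range(n):
        others = [abs_spearman(columns[i], columns[j]) for j in range(n) if j != i]
        redundancy = mean(others) if others else 0.0
        # Hypothetical weight (not the paper's formula): discount relevance by redundancy.
        weights.append(f_score(columns[i], labels) * (1.0 - redundancy))
    return sorted(range(n), key=lambda i: weights[i], reverse=True)
```

Here `columns` is a list of per-gene value lists (one list per feature, one entry per sample) and `labels` holds the class of each sample. Keeping the top-ranked indices would give a preliminary subset of the kind the abstract passes on to the binary gray wolf stage, which is not sketched here.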
Authors
Mu Xiaoxia (穆晓霞)
Zheng Lijing (郑李婧)
Mu Xiaoxia; Zheng Lijing (College of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China)
Source
Journal of Nanjing Normal University (Natural Science Edition) (《南京师大学报(自然科学版)》)
CAS
PKU Core (Peking University Core Journals)
2024, No. 1, pp. 111-120 (10 pages)
Funding
National Natural Science Foundation of China (61772176).