Abstract
SNP data are an important class of genetic variation data and a major research topic in bioinformatics. Because SNP data contain considerable redundancy and noise, feature selection on SNP data is particularly important. To address the small sample size and high dimensionality of SNP data, and to exploit the strong correlation between SNP loci, this paper introduces mutual information into K-Means clustering and proposes an improved clustering algorithm, K-MIM, which is applied to SNP selection. K-MIM overcomes the inability of traditional K-Means to capture the intrinsic relationships between SNP loci. Experiments on hospital-provided clinical data show that the informative SNP subset selected by the K-MIM/ant colony algorithm reconstructs the non-informative SNP subset more accurately and yields better classification performance than the informative SNP subsets selected by K-Means/ant colony, MCMR, ReliefF, and other algorithms.
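The abstract does not say how mutual information enters the clustering step, so the following is only a minimal illustrative sketch, not the K-MIM algorithm itself. It assumes genotypes are coded as discrete values (for example 0/1/2), computes the empirical mutual information between two SNP columns, and shows one plausible assignment step in which each SNP joins the cluster center with which it shares the most information. The function names `mutual_information` and `assign_to_centers` are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Empirical mutual information (in bits) of two discrete sequences."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # I(X;Y) = sum over (a, b) of p(a,b) * log2( p(a,b) / (p(a) * p(b)) )
    return sum(
        (c / n) * log2(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def assign_to_centers(snps, center_ids):
    """Assign each SNP to the center SNP sharing the most information.

    Assumption: similarity is plain mutual information; the paper may
    normalise or combine it with a distance in a different way.
    """
    return [
        max(center_ids, key=lambda c: mutual_information(snp, snps[c]))
        for snp in snps
    ]


# Four toy SNPs over six samples; SNPs 0 and 2 act as cluster centers.
snps = [
    [0, 1, 2, 0, 1, 2],
    [1, 2, 0, 1, 2, 0],  # a relabelling of SNP 0, so fully informative about it
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # a relabelling of SNP 2
]
labels = assign_to_centers(snps, center_ids=[0, 2])
```

Unlike Euclidean distance, mutual information is unchanged when genotype labels are permuted, which is one reason it can group strongly associated SNPs that a distance-based K-Means would separate.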
Authors
LU Xinbei (陆信蓓)
ZHOU Conghua (周从华)
ZHANG Fuquan (张付全)
ZHANG Ting (张婷)
JIANG Yueming (蒋跃明)
Affiliations: School of Computer Science and Telecommunication Engineering, Jiangsu University, Zhenjiang 212013; Wuxi Mental Health Center, Wuxi 214151; Wuxi MCH Hospital, Wuxi 214002; Wuxi No. 5 People's Hospital, Wuxi 214073
Source
Computer & Digital Engineering (《计算机与数字工程》)
2020, No. 8, pp. 1943-1947, 1964 (6 pages in total)
Funding
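Supported by the Jiangsu Province Key Research and Development Program (Social Development) (Nos. BE2016630, BE2017628)
and the Wuxi Municipal Health and Family Planning Commission Research Project (No. z201603).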