A Fuzzy Clustering Algorithm for SNP Selection (cited by: 3)
Abstract: When selecting Single Nucleotide Polymorphisms (SNPs) from high-dimensional genetic data with few samples, the selected SNP subset should represent the information of all SNPs as fully as possible while reducing the dimensionality of the data. To this end, an improved method named GN-FCM is proposed on the basis of the Fuzzy C-Means (FCM) algorithm. An SNP weight factor is introduced to quantify how much the importance of SNP sites differs, and a neighborhood regularization term for key SNPs is added to the loss function of fuzzy clustering to mine the correlation between highly important SNPs and the other SNPs in the same neighborhood. Experimental results show that GN-FCM converges well. Compared with the DW-FCM algorithm, the SNP subsets it constructs improve classification accuracy with Support Vector Machine (SVM), Decision Tree (DT) and Naive Bayes (NB) classifiers by 5.73%, 3.40% and 3.79% respectively, and improve the F1 score by 4.01%, 3.20% and 2.22% respectively.
Authors: ZHANG Bo (张波); ZHOU Conghua (周从华); ZHANG Fuquan (张付全); ZHANG Ting (张婷); JIANG Yueming (蒋跃明). Affiliations: School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, Jiangsu 212013, China; Wuxi Mental Health Center, Wuxi, Jiangsu 214151, China; Wuxi Maternity and Child Health Care Hospital, Wuxi, Jiangsu 214002, China; Wuxi No.5 People's Hospital, Wuxi, Jiangsu 214073, China
Source: Computer Engineering (《计算机工程》), indexed in CAS, CSCD and the Peking University Core Journals list, 2019, No. 8, pp. 66-74 (9 pages)
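Funding: Social Development Project of the Jiangsu Province Key Research and Development Program (BE2016630, BE2017628); Research Project of the Wuxi Municipal Health and Family Planning Commission (Z201603)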
Keywords: Single Nucleotide Polymorphism (SNP) selection; fuzzy clustering; feature selection; Support Vector Machine (SVM); Decision Tree (DT); Naive Bayes (NB) classification
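
The abstract builds on the Fuzzy C-Means algorithm, so a minimal sketch of the standard FCM baseline may help readers place the contribution. This is not GN-FCM: the SNP weight factor and the neighborhood regularization term are not reproduced here, because this record does not give their formulas. The function name `fuzzy_c_means`, its parameters, and the toy genotype matrix (SNPs as rows, samples as columns, coded 0/1/2) are assumptions made for illustration only.

```python
import random


def fuzzy_c_means(points, n_clusters, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Standard FCM (hypothetical helper, not the paper's GN-FCM).

    points: list of equal-length lists of floats.
    Returns (centers, memberships); each membership row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-12 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(max_iter):
        # Center update: mean of the points weighted by u_ik ** m.
        centers = []
        for k in range(n_clusters):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total
                 for d in range(dim)]
            )
        # Membership update from squared Euclidean distances
        # (floored to avoid division by zero when a point sits on a center).
        shift = 0.0
        for i in range(n):
            d2 = [
                max(sum((points[i][d] - c[d]) ** 2 for d in range(dim)), 1e-12)
                for c in centers
            ]
            inv = [v ** (-1.0 / (m - 1.0)) for v in d2]
            total = sum(inv)
            new_row = [v / total for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if shift < tol:  # memberships have stopped changing
            break
    return centers, u


# Toy data (assumed): six SNPs genotyped in four samples, coded 0/1/2.
snps = [
    [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1],
    [2, 2, 1, 2], [2, 1, 2, 2], [2, 2, 2, 1],
]
centers, memberships = fuzzy_c_means(snps, n_clusters=2)
# Hard label per SNP: the cluster with the largest membership.
labels = [max(range(2), key=lambda k: row[k]) for row in memberships]
```

In an SNP-selection setting, one plausible next step would be to keep, from each cluster, the SNP with the highest membership as that cluster's representative; whether the paper selects representatives this way is not stated in the abstract.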
