
Feature selection from proteomic mass spectrometric data using F-score and partial least square-discriminant analysis (article in English). Cited by: 2
Abstract: A feature selection and sample classification method based on F-score and partial least squares discriminant analysis (PLS-DA) is proposed and applied to proteomic mass spectrometric (MS) data analysis and potential biomarker discovery. The method has three main steps: (1) the raw spectra are preprocessed with the LIMPIC algorithm; (2) the F-score of each variable is calculated and all variables are sorted by F-score in descending order; and (3) the optimal variable subset is determined by forward selection, using PLS-DA cross-validation as the selection criterion. The method was applied to a colorectal cancer dataset, and 8 characteristic m/z values were selected from the original 16331 m/z variables as potential biomarkers. Using the selected features to classify the samples of an independent test set gave a sensitivity of 95.24% and a specificity of 100%. The results indicate that the proposed method is suitable for feature selection and sample classification in proteomic MS data.
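Steps (2) and (3) of the abstract can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation. The abstract does not give the F-score formula, so the sketch assumes the common two-class definition (between-class separation of the means divided by the sum of the within-class sample variances). Forward selection is read here as adding variables in rank order and keeping the subset size with the lowest cross-validation error. A leave-one-out nearest-centroid classifier stands in for PLS-DA so that the example needs only the standard library. All function names and the toy data are hypothetical.

```python
from statistics import mean, variance


def f_score(pos, neg):
    """Assumed two-class F-score of one variable (not taken from the paper)."""
    overall = mean(pos + neg)
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    within = variance(pos) + variance(neg)  # sample variances, n - 1 denominator
    return between / within if within > 0 else 0.0


def rank_by_f_score(X, y):
    """Return (column indices sorted by descending F-score, list of scores)."""
    n_vars = len(X[0])
    scores = []
    for j in range(n_vars):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(f_score(pos, neg))
    order = sorted(range(n_vars), key=lambda j: scores[j], reverse=True)
    return order, scores


def loo_centroid_error(X, y, cols):
    """Leave-one-out error of a nearest-centroid classifier (stand-in for PLS-DA)."""
    errors = 0
    for i in range(len(X)):
        train = [(X[k], y[k]) for k in range(len(X)) if k != i]
        dists = {}
        for label in (0, 1):
            rows = [r for r, l in train if l == label]
            centroid = [mean(r[j] for r in rows) for j in cols]
            dists[label] = sum((X[i][j] - c) ** 2 for j, c in zip(cols, centroid))
        predicted = min(dists, key=dists.get)
        errors += predicted != y[i]
    return errors / len(X)


def forward_select(order, cv_error, max_features=20):
    """Grow the subset along the ranking; keep the smallest prefix with lowest error."""
    best_k, best_err = 0, float("inf")
    for k in range(1, min(max_features, len(order)) + 1):
        err = cv_error(order[:k])
        if err < best_err:  # strict '<' prefers the smaller subset on ties
            best_k, best_err = k, err
    return order[:best_k], best_err


# Toy data: 6 samples x 3 variables; label 1 = disease, 0 = healthy control.
X = [
    [5.0, 1.0, 0.3],
    [5.2, 0.9, 0.8],
    [4.9, 1.1, 0.1],
    [1.0, 1.0, 0.7],
    [1.1, 0.8, 0.2],
    [0.9, 1.2, 0.6],
]
y = [1, 1, 1, 0, 0, 0]

order, scores = rank_by_f_score(X, y)
selected, best_err = forward_select(order, lambda cols: loo_centroid_error(X, y, cols))
print("ranking:", order)
print("selected:", selected, "cross-validation error:", best_err)
```

To follow the abstract more closely, the `cv_error` callback could wrap a real PLS-DA model (for example, a PLS regression on 0/1 class labels with a decision threshold) evaluated by cross-validation. The ranking and selection code would stay the same.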
Source: Computers and Applied Chemistry (《计算机与应用化学》), 2012, No. 12, pp. 1467-1470 (4 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.
Keywords: feature selection; mass spectra; F-score; partial least square-discriminant analysis
