Protein Mass Spectrometry Data Analysis Based on the T-Test and Support Vector Machine (cited by: 1)
Abstract: Pattern recognition on protein mass spectrometry data has become a new approach to cancer diagnosis, but such data are high-dimensional while the number of samples is small, which makes the analysis very challenging. The raw data are first preprocessed by baseline correction and normalization, and their dimensionality is reduced by binning. On this basis, a T-test is proposed for selecting features for the analysis of protein mass spectrometry data. In the experiment, an ovarian mass spectrometry dataset is classified, with 10-fold cross-validation used to choose the training and test samples and a support vector machine (SVM) used as the classifier. The results show that the proposed method selects a small feature subset and achieves a high recognition rate: sensitivity, specificity, and overall recognition rate reach 100%, 96.7%, and 98.8%, respectively.
Source: Journal of Huaiyin Teachers College (Natural Science Edition) (《淮阴师范学院学报(自然科学版)》), CAS, 2011, No. 5, pp. 409-413 (5 pages)
Keywords: protein mass spectrometry; binning; T-test; support vector machine