Abstract
Using clinical SELDI-TOF mass spectrometry data on breast cancer provided by Zhejiang Province Tumor Hospital, we grouped samples by clinical TNM stage and explored whether differences in tumor size and lymph node involvement are reflected in the protein mass spectra. The spectra were first preprocessed to reduce noise. Features were then selected, and redundant features removed, using affinity propagation clustering and null-space LDA, after which SVM-RFE was applied to select candidate biomarkers. Finally, classification tests and statistical analyses were carried out to check whether the selected biomarkers capture differences between sample groups. In the comparison experiments on four groups of TNM-staged samples, 35 differential biomarkers were obtained in each experiment; these gave good classification results, and some of them had P < 0.05. The results indicate that differences in tumor size and lymph node involvement can be reflected at the protein level.
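The SVM-RFE step named in the abstract ranks features by the squared weights of a linear SVM and recursively discards the weakest one. The block below is a minimal sketch of that idea, not the authors' implementation: it assumes a linear SVM without intercept (features centered beforehand), trained with Pegasos-style full-batch subgradient descent, and removes one feature per round. The function names `train_linear_svm` and `svm_rfe`, the default `lam` and `epochs`, and the toy data are hypothetical illustrations.

```python
"""Minimal SVM-RFE sketch in pure Python (hypothetical helpers, toy data)."""
from typing import List, Sequence, Tuple


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, epochs: int = 200) -> List[float]:
    """Hypothetical helper: linear SVM without intercept (assumption).

    Minimises lam/2 * ||w||^2 + mean hinge loss using Pegasos-style
    full-batch subgradient steps with step size 1 / (lam * t).
    Labels y must be +1 / -1.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for t in range(1, epochs + 1):
        eta = 1.0 / (lam * t)
        grad = [0.0] * d  # sum of y_i * x_i over samples violating the margin
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            if margin < 1.0:
                for j in range(d):
                    grad[j] += yi * xi[j]
        shrink = 1.0 - eta * lam  # equals 1 - 1/t: effect of the L2 penalty
        w = [shrink * wj + eta * gj / n for wj, gj in zip(w, grad)]
    return w


def svm_rfe(X: Sequence[Sequence[float]], y: Sequence[int], n_keep: int,
            lam: float = 0.01, epochs: int = 200) -> Tuple[List[int], List[int]]:
    """Recursively drop the feature with the smallest squared SVM weight.

    Returns (kept feature indices, eliminated indices in removal order).
    """
    remaining = list(range(len(X[0])))
    eliminated: List[int] = []
    while len(remaining) > n_keep:
        # Retrain on the surviving features only.
        sub = [[row[j] for j in remaining] for row in X]
        w = train_linear_svm(sub, y, lam, epochs)
        # Position (within `remaining`) of the weakest feature; ties -> first.
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        eliminated.append(remaining.pop(worst))
    return remaining, eliminated


if __name__ == "__main__":
    # Toy data (not from the paper): feature 0 separates the classes strongly,
    # feature 3 weakly, feature 1 is constant, feature 2 is label-independent.
    X = [[2.0, 0.0, 1.0, 0.5], [2.0, 0.0, -1.0, 0.5],
         [-2.0, 0.0, 1.0, -0.5], [-2.0, 0.0, -1.0, -0.5]]
    y = [1, 1, -1, -1]
    kept, removed = svm_rfe(X, y, n_keep=2)
    print(kept, removed)  # [0, 3] [1, 2]
```

With real spectra containing thousands of m/z points, removing a single feature per round is slow, and eliminating a fraction of the remaining features per round is a common alternative. Setting `n_keep=35` would mirror the number of biomarkers the abstract reports per comparison, although the abstract does not state the SVM-RFE settings actually used.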
Source
Journal of Biomedical Engineering Research (生物医学工程研究)
Peking University Core Journal (北大核心)
2015, Issue 1, pp. 7-10 (4 pages)
Funding
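National Natural Science Foundation of China (61271063)
National Key Basic Research Program of China (2013CB329502)
Zhejiang Provincial Natural Science Foundation of China (LQ14F010011)
Zhejiang Provincial Department of Science and Technology projects (2005C30001, 2012C13019-1, 2013C332-05)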
Keywords
Breast cancer
Protein mass spectrometry
TNM staging
Feature selection
Biomarkers