
Machine learning-based feature gene screening of pancreatic cancer: a preliminary study
Abstract Background and Aims: Pancreatic cancer is difficult to treat, and more than 90% of patients die within one year of diagnosis. Differentially expressed genes (DEGs) between pancreatic cancer tissue and normal tissue may be closely associated with the occurrence and progression of the disease. This study used machine learning methods to screen DEGs in pancreatic cancer, with the aim of providing a basis for studying its pathogenesis.
Methods: Pancreatic cancer gene expression profiles were selected from the public GEO database. The linear model package limma was used for normalization and for calculating differential expression between groups of microarrays. DEGs were obtained in R and then further screened with a correlation-based feature selection method. Based on the resulting hub DEGs, the AdaBoost and Bagging algorithms were each used to build a pancreatic cancer prediction model. GO functional analysis and KEGG pathway enrichment analysis of the hub DEGs were performed with DAVID, protein-protein interaction (PPI) network analysis with STRING and Cytoscape, and survival analysis of the prognosis-related hub DEGs with GEPIA.
Results: Feature screening yielded 18 key DEGs. A prediction model built with the AdaBoost algorithm on the feature subset of these 18 DEGs reached a prediction accuracy of 92.3%. GO and KEGG analysis of the DEGs indicated an indirect role for CDK1, CCNA2 and CCNB1 in the formation and development of pancreatic cancer. Survival analysis showed that the expression of CDK1 (P=0.0008), CCNB1 (P=0.012), CSK2 (P=0.023) and CKS1B (P=0.0013) was correlated with overall survival (OS): the higher the expression, the shorter the OS.
Conclusion: Machine learning methods can screen feature genes of pancreatic cancer effectively, which is of some significance for the diagnosis and treatment of pancreatic cancer and for related drug development.
Authors WEI Wei (魏伟); OU Zhenglin (欧政林); DOU Xiaolin (窦晓淋); ZHANG Shuai (张帅); TANG Ling (唐翎) (Department of General Surgery, Changsha 410008, China; Department of Pharmacy, Changsha 410008, China; National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha 410008, China)
Source China Journal of General Surgery (《中国普通外科杂志》; indexed in CAS, CSCD, and the Peking University Core Journals list), 2022, Issue 9, pp. 1203-1209 (7 pages)
Funding Natural Science Foundation of Hunan Province (2019JJ40489).
Keywords Pancreatic Neoplasms; Gene Expression Profiling; Machine Learning; Computational Biology
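
The modelling step named in the abstract, in which AdaBoost and Bagging classifiers are trained on a subset of 18 selected genes, can be illustrated with a minimal sketch. It assumes scikit-learn and uses a synthetic matrix in place of real expression data. The sample count, the estimator settings, and the 5-fold cross-validation are illustrative assumptions and are not taken from the study, so the printed accuracies say nothing about the reported 92.3%.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for an expression matrix: rows are samples and columns
# are 18 "selected genes". This is NOT the study's data.
X, y = make_classification(
    n_samples=120,      # assumed sample count, for illustration only
    n_features=18,      # matches the size of the feature subset in the abstract
    n_informative=8,
    n_redundant=2,
    random_state=0,
)

# Both ensembles use scikit-learn's default base learner (a decision tree);
# the number of estimators is an assumption.
models = {
    "AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=0),
    "Bagging": BaggingClassifier(n_estimators=50, random_state=0),
}

# Mean accuracy over stratified 5-fold cross-validation for each model.
scores = {}
for name, model in models.items():
    fold_accuracy = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    scores[name] = fold_accuracy.mean()
    print(f"{name}: mean 5-fold accuracy = {scores[name]:.3f}")
```

With real data, `X` would hold the expression values of the selected hub DEGs and `y` the tumour/normal labels. Cross-validation is used here only to give each model a comparable accuracy estimate.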