Staging Study of Hepatocellular Carcinoma Based on Network Analysis and Random Forest Method (cited by: 1)
Abstract Hepatocellular carcinoma (HCC) is an invasive malignant tumor. Although diagnostic techniques and treatments for HCC have improved considerably, early diagnosis remains a major challenge. In this paper, we use gene network analysis to identify core genes associated with clinical staging, with the aim of informing the detection of early-stage HCC patients and improving HCC diagnosis and treatment. First, we selected gene expression data from 219 early-stage postoperative HCC patients in the GEO database, performed differential expression analysis, and randomly divided the data into a training set and a test set. On the training set, weighted gene co-expression network analysis (WGCNA) clustered the genes into five modules, and we performed functional enrichment and pathway enrichment analysis for each module. The blue module was related to biological processes such as cell proliferation, cell division, the cell cycle, and DNA replication initiation, replication, and repair, and to pathways including the cell cycle, the p53 signaling pathway, HTLV-I infection, and hepatitis B. These processes and pathways are closely related to the occurrence and development of HCC. We therefore used the enriched genes of this module for protein-protein interaction (PPI) network analysis and selected the 10 core genes with the highest connectivity: BUB1B, CCNA2, CCNB1, CCNB2, CDC20, MAD2L1, MCM4, PCNA, RFC4, and TOP2A. A random forest was then trained on the core genes by supervised learning to build a classification model for BCLC staging, and the model was applied to the test set. The method helped considerably in classifying early-stage BCLC patients, with an accuracy of 95.52%, but its classification of intermediate- and late-stage patients was less satisfactory. This study improves understanding of the pathogenesis and staging of HCC and suggests a new direction for HCC targeted therapy.
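The core-gene selection step described in the abstract (keeping the genes with the highest connectivity in the PPI network) can be illustrated with a minimal sketch. The abstract does not say which connectivity measure or software was used, so the sketch assumes plain node degree, that is, the number of distinct interaction partners. The function name `top_hub_genes`, the gene labels `GENE_A` to `GENE_H`, and the edge list are all hypothetical placeholders and are not data from the study.

```python
def top_hub_genes(edges, k=10):
    """Rank genes by degree (number of distinct interaction partners).

    `edges` is an iterable of (gene, gene) pairs from an undirected network.
    Assumption: "connectivity" means node degree; the study may have used
    a different measure.
    """
    neighbours = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        # Sets make repeated or reversed edges count only once.
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    degree = {gene: len(partners) for gene, partners in neighbours.items()}
    # Highest degree first; ties are broken by gene name for a stable order.
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Toy, made-up edge list with placeholder gene names (not real PPI data).
toy_edges = [
    ("GENE_A", "GENE_B"), ("GENE_A", "GENE_D"), ("GENE_A", "GENE_C"),
    ("GENE_B", "GENE_E"), ("GENE_B", "GENE_C"), ("GENE_F", "GENE_G"),
    ("GENE_F", "GENE_H"), ("GENE_B", "GENE_A"),  # reversed duplicate edge
]

print(top_hub_genes(toy_edges, k=3))
```

With a real network, the edge list would come from a PPI resource, and `k=10` would give a candidate list comparable in size to the 10 core genes reported above. Other centrality measures could give a different ranking.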
Author: Li Xin (李鑫)
Affiliation: North China Electric Power University (华北电力大学)
Source: Statistics and Application (《统计学与应用》), 2019, No. 1, pp. 95-107 (13 pages)
