
Research on Predicting Non-synonymous Single Nucleotide Polymorphisms Based on Graph Neural Networks
Abstract: Non-synonymous single nucleotide polymorphisms (nsSNPs) are an important topic in the study of heredity and variation; more than 6,000 diseases caused by nsSNPs have been identified. Accurately predicting nsSNPs is therefore important for understanding their functional mechanisms and for disease treatment. To address this problem, the paper proposes a model named SGNN, which combines a graph neural network with a convolutional neural network to perform the nsSNP prediction task with high performance. In SGNN, sample lengths are first normalized by extracting a residue environment of appropriate length, which reduces redundant information and noise. Next, the ProtTrans model is used to extract PT features from each sample's residue environment, and the set of samples that belong to the same protein and share the same mutation site is converted into graph-structured data through graph data modeling. During training, the GraphSAGE algorithm updates the graph, and node classification combined with a convolutional neural network predicts the pathogenicity of each sample. The MMP and PredictSNP datasets are used as benchmarks, and SGNN is compared with the latest existing methods. SGNN reaches an accuracy (ACC) of 85.2% on the MMP dataset and 83.3% on the PredictSNP dataset, improvements of 3.2 and 3.6 percentage points, respectively, over the latest methods. The experimental results indicate that SGNN has better predictive performance on the nsSNP prediction task.
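The data-preparation steps named in the abstract (a fixed-length residue environment, grouping samples by protein and mutation site into a graph, and a GraphSAGE-style neighbor aggregation) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the padding symbol `X`, the fully connected grouping rule, and the toy feature vectors are all assumptions made for illustration. The ProtTrans embeddings, the learned GraphSAGE weights, and the CNN classifier are omitted.

```python
from collections import defaultdict

PAD = "X"  # assumed padding symbol for window positions beyond the sequence ends


def residue_environment(sequence, site, half_width):
    """Return a fixed-length window centred on the 0-based mutation site (hypothetical helper)."""
    if not 0 <= site < len(sequence):
        raise ValueError("mutation site lies outside the sequence")
    window = []
    for i in range(site - half_width, site + half_width + 1):
        window.append(sequence[i] if 0 <= i < len(sequence) else PAD)
    return "".join(window)


def build_edges(samples):
    """Connect every pair of samples that share the same (protein, site) key.

    Assumption: samples in one group are fully connected; the abstract does
    not state the exact edge rule.
    """
    groups = defaultdict(list)
    for idx, (protein_id, site) in enumerate(samples):
        groups[(protein_id, site)].append(idx)
    edges = set()
    for members in groups.values():
        for a in members:
            for b in members:
                if a < b:
                    edges.add((a, b))
    return sorted(edges)


def sage_mean_step(features, edges):
    """One GraphSAGE-style mean aggregation without learned weights.

    Each node's output is its own feature vector concatenated with the mean
    of its neighbours' vectors (zeros if it has no neighbours).
    """
    neighbours = defaultdict(list)
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    dim = len(features[0])
    updated = []
    for node, own in enumerate(features):
        nbrs = neighbours[node]
        if nbrs:
            mean = [sum(features[n][d] for n in nbrs) / len(nbrs) for d in range(dim)]
        else:
            mean = [0.0] * dim
        updated.append(list(own) + mean)
    return updated


# Toy data: three variants at site 4 of protein "P1", one variant of "P2".
samples = [("P1", 4), ("P1", 4), ("P2", 7), ("P1", 4)]
edges = build_edges(samples)
# Stand-in 2-dimensional vectors in place of real ProtTrans (PT) features.
features = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [1.0, 1.0]]
updated = sage_mean_step(features, edges)
```

In a full pipeline, the aggregated vectors would be passed through learned layers and a classifier; here they only show how information from variants at the same site is pooled into each node's representation.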
Authors: HOU Xiao-ping (侯孝平); ZHANG Ming (张明) (School of Computer, Jiangsu University of Science and Technology, Zhenjiang 212100, China)
Source: Computer Technology and Development (《计算机技术与发展》), 2024, No. 11, pp. 207-213 (7 pages)
Funding: Jiangsu Province Industry-University-Research Cooperation Project (BY20231174).
Keywords: non-synonymous single nucleotide polymorphism; ProtTrans; graph neural network; convolutional neural network; deep learning
