Abstract
Non-synonymous single nucleotide polymorphisms (nsSNPs) are an important topic in the study of heredity and variation, and more than 6,000 diseases caused by nsSNPs have been identified. Accurate prediction of nsSNPs is therefore of great significance for understanding their functional mechanisms and for disease treatment. To address this problem, this paper proposes a model named SGNN, which combines a graph neural network with a convolutional neural network to perform the nsSNP prediction task with high performance. In SGNN, sample lengths are first normalized by cropping a residue environment of appropriate length, which reduces redundant information and noise. Next, the ProtTrans model is used to extract PT features from each sample's residue environment, and the set of samples that belong to the same protein and share the same mutation site is converted into graph-structured data through graph data modeling. During training, the GraphSAGE algorithm updates the graph, and node classification combined with a convolutional neural network predicts the pathogenicity of each sample. The MMP and PredictSNP datasets are used as benchmarks, and SGNN is compared with the latest existing methods. SGNN achieves an accuracy (ACC) of 85.2% on the MMP dataset and 83.3% on the PredictSNP dataset, improvements of 3.2 and 3.6 percentage points, respectively, over the latest methods. The experimental results show that SGNN offers better predictive performance on the nsSNP prediction task.
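The preprocessing and graph steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the 'X' padding character, the fully connected wiring within each (protein, site) group, and the parameter-free mean aggregation are all assumptions made for illustration. Plain lists of floats stand in for the ProtTrans (PT) features, and the learned weights, nonlinearity, and convolutional classifier of a real model are omitted.

```python
from collections import defaultdict
from itertools import combinations


def residue_window(sequence, site, half_width):
    """Crop a fixed-length residue environment centred on a 0-based mutation site.

    Positions that fall outside the sequence are padded with 'X'
    (the padding symbol is an assumption, not taken from the paper).
    """
    left = site - half_width
    right = site + half_width + 1
    pad_left = "X" * max(0, -left)
    pad_right = "X" * max(0, right - len(sequence))
    core = sequence[max(0, left):min(len(sequence), right)]
    return pad_left + core + pad_right


def build_edges(samples):
    """Link samples that share the same protein and the same mutation site.

    Each sample is a dict with hypothetical keys 'protein' and 'site'.
    Every group is wired as a fully connected subgraph (an assumption).
    """
    groups = defaultdict(list)
    for index, sample in enumerate(samples):
        groups[(sample["protein"], sample["site"])].append(index)
    edges = set()
    for members in groups.values():
        edges.update(combinations(members, 2))
    return sorted(edges)


def sage_mean_step(features, edges):
    """One simplified GraphSAGE-style update with a mean aggregator.

    Each node's vector is concatenated with the mean of its neighbours'
    vectors. Isolated nodes receive a zero vector as the neighbour part.
    """
    neighbours = defaultdict(list)
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    updated = []
    for node, vector in enumerate(features):
        linked = neighbours.get(node, [])
        if linked:
            mean = [
                sum(features[j][k] for j in linked) / len(linked)
                for k in range(len(vector))
            ]
        else:
            mean = [0.0] * len(vector)
        updated.append(list(vector) + mean)
    return updated
```

In a full pipeline, the windowed sequences would be passed to a protein language model to obtain embeddings, and the updated node vectors would feed a trained classifier that outputs a pathogenic or neutral label per node.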
Authors
侯孝平
张明
HOU Xiao-ping; ZHANG Ming (School of Computer, Jiangsu University of Science and Technology, Zhenjiang 212100, China)
Source
《计算机技术与发展》
2024, Issue 11, pp. 207-213 (7 pages)
Computer Technology and Development
Funding
Jiangsu Province Industry-University-Research Cooperation Project (BY20231174).