
Prediction of protein subcellular localization based on deep learning

Cited by: 3
Abstract: To address the problem that traditional machine learning algorithms still require features to be represented manually, a protein subcellular localization algorithm based on a Stacked Denoising AutoEncoder (SDAE) deep network is proposed. First, the improved Pseudo-Amino Acid Composition (PseAAC), Pseudo Position-Specific Scoring Matrix (PsePSSM), and Conjoint Triad (CT) methods are each used to extract features from the protein sequence, and the three resulting feature vectors are fused into a new feature representation of the sequence. Second, the fused feature vector is fed into the SDAE deep network, which automatically learns a more effective feature representation. Third, a Softmax regression classifier predicts the subcellular location, and leave-one-out cross validation is performed on the Viral proteins and Plant proteins datasets. Finally, the results of the proposed algorithm are compared with those of existing algorithms such as mGOASVM (multi-label protein subcellular localization based on Gene Ontology and Support Vector Machine) and HybridGO-Loc (mining Hybrid features on Gene Ontology for predicting subcellular Localization of multi-location proteins). Experimental results show that the proposed algorithm achieves 98.24% accuracy on the Viral proteins dataset, 9.35 percentage points higher than mGOASVM, and 97.63% accuracy on the Plant proteins dataset, 10.21 and 4.07 percentage points higher than mGOASVM and HybridGO-Loc, respectively. These results indicate that the proposed algorithm can effectively improve the accuracy of protein subcellular localization prediction.
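The feature-extraction and fusion step described in the abstract can be illustrated with a minimal sketch. It covers only the Conjoint Triad (CT) encoding and a concatenation-based fusion. Several assumptions are made here: the seven-class residue grouping is the one commonly used for conjoint triad encoding, not a detail given in the abstract; the frequency normalization and the treatment of fusion as plain concatenation are likewise assumptions; and the PseAAC and PsePSSM vectors are hypothetical placeholders, since the abstract does not give their dimensions or parameters.

```python
from itertools import product

# Seven residue classes commonly used for conjoint triad encoding.
# ASSUMPTION: this grouping is not stated in the abstract.
CT_CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
RESIDUE_TO_CLASS = {aa: i for i, group in enumerate(CT_CLASSES) for aa in group}

# Every ordered triple of classes gets one slot: 7^3 = 343 features.
TRIADS = list(product(range(7), repeat=3))
TRIAD_INDEX = {triad: i for i, triad in enumerate(TRIADS)}


def conjoint_triad(sequence):
    """Return a 343-dimensional vector of triad frequencies for one sequence."""
    # Non-standard residues (e.g. X) are dropped before windowing,
    # a simplification made for this sketch.
    classes = [RESIDUE_TO_CLASS[aa] for aa in sequence.upper()
               if aa in RESIDUE_TO_CLASS]
    counts = [0.0] * len(TRIADS)
    # Slide a window of three consecutive residues along the sequence.
    for i in range(len(classes) - 2):
        counts[TRIAD_INDEX[tuple(classes[i:i + 3])]] += 1.0
    total = sum(counts)
    # Normalize counts to frequencies; sequences shorter than 3 stay all-zero.
    return [c / total for c in counts] if total else counts


def fuse(*feature_vectors):
    """Fuse feature vectors by concatenation (ASSUMED fusion scheme)."""
    fused = []
    for vec in feature_vectors:
        fused.extend(vec)
    return fused


ct = conjoint_triad("MKVLAAGIDE")  # toy sequence, not from either dataset
placeholder_pseaac = [0.0] * 5     # hypothetical stand-in for a PseAAC vector
placeholder_psepssm = [0.0] * 4    # hypothetical stand-in for a PsePSSM vector
fused = fuse(placeholder_pseaac, placeholder_psepssm, ct)
print(len(ct), len(fused))  # prints: 343 352
```

In a pipeline like the one the abstract outlines, a fused vector of this kind would be the input to the SDAE, whose learned representation is then passed to the Softmax classifier.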
Authors: WANG Yihao (王艺皓), DING Hongwei (丁洪伟), LI Bo (李波), BAO Liyong (保利勇), ZHANG Yingjie (张颖婕). School of Information Science and Engineering, Yunnan University, Kunming, Yunnan 650500, China
Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2020, No. 11, pp. 3393-3399 (7 pages)
Funding: National Natural Science Foundation of China (61461053, 61461054).
Keywords: deep learning; feature fusion; protein localization; Stacked Denoising AutoEncoder (SDAE); leave-one-out cross validation