Abstract
To address the high computational cost of long sequences in protein solubility prediction and the tendency of traditional models to ignore differences between data, a multi-input deep learning model, FESOL, is proposed. FAVOR+, an attention mechanism with linear complexity, is used to efficiently extract feature information from long protein sequences. An enhanced loss function combining cross-entropy and cosine similarity is designed so that the model attends to the differences between its input data. Comparative experiments against several advanced prediction methods on an independent test set show that FESOL outperforms the other methods on multiple evaluation metrics, validating the model's effectiveness for protein solubility prediction.
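The abstract names FAVOR+ as the linear-complexity attention mechanism. As a rough illustration of why it scales linearly with sequence length, the sketch below implements the general positive-random-feature idea behind FAVOR+ in plain Python. It is not the FESOL implementation: the function names, `num_features`, and `seed` are hypothetical, plain Gaussian projections are used in place of orthogonal ones, and the numerical stabilisation used in practical implementations is omitted.

```python
import math
import random


def positive_random_features(x, omegas):
    """Map vector x to positive features whose dot products approximate exp(q . k)."""
    sq_norm = sum(v * v for v in x)
    m = len(omegas)
    return [
        math.exp(sum(w_i * x_i for w_i, x_i in zip(w, x)) - sq_norm / 2) / math.sqrt(m)
        for w in omegas
    ]


def favor_attention(queries, keys, values, num_features=64, seed=0):
    """Approximate softmax attention without forming the L x L score matrix.

    Assumption: simplified sketch (non-orthogonal Gaussian projections,
    no numerical stabiliser), for illustration only.
    """
    rng = random.Random(seed)
    d = len(queries[0])
    d_v = len(values[0])
    scale = d ** -0.25  # scaling q and k this way gives q . k / sqrt(d)
    omegas = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(num_features)]

    q_feat = [positive_random_features([v * scale for v in q], omegas) for q in queries]
    k_feat = [positive_random_features([v * scale for v in k], omegas) for k in keys]

    # Key/value summaries are accumulated once, in time linear in sequence length.
    kv = [[0.0] * d_v for _ in range(num_features)]
    k_sum = [0.0] * num_features
    for kf, v in zip(k_feat, values):
        for j, kf_j in enumerate(kf):
            k_sum[j] += kf_j
            for c in range(d_v):
                kv[j][c] += kf_j * v[c]

    # Each query reuses the summaries, so no pairwise query-key scores are stored.
    out = []
    for qf in q_feat:
        norm = sum(a * b for a, b in zip(qf, k_sum))
        out.append(
            [sum(qf[j] * kv[j][c] for j in range(num_features)) / norm for c in range(d_v)]
        )
    return out
```

Because every feature is positive, each output row is a convex combination of the value vectors, as in exact softmax attention. The cost grows with the sequence length times `num_features` rather than with the square of the sequence length.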
Authors
YANG Zi-hang (杨子航)
WANG Shun-fang (王顺芳)
School of Information Science and Engineering, Yunnan University, Kunming 650504, China
Source
Computer Engineering and Design (《计算机工程与设计》)
Peking University Core Journal (北大核心)
2024, Issue 2, pp. 414-419 (6 pages)
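Funding
National Natural Science Foundation of China (62062067)
Open Project Fund of the Yunnan Key Laboratory of Intelligent Systems and Computing (ISC22Z01)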
Keywords
protein solubility prediction
attention mechanism
loss function
deep learning
feature fusion
long sequence
neural network