Abstract
Most existing studies on DNA-protein binding site prediction, in China and abroad, compute on either protein sequence information alone or protein structure information alone, and combining the two has so far given poor prediction results. This paper proposes a method for predicting DNA-protein binding sites that starts from the protein's position-specific scoring matrix (PSSM) as the sequence feature and adds several structural features of protein residues that combine well with it: solvent accessible surface area, relative solvent accessibility, depth index, and protrusion index. Random under-sampling is used to address the class imbalance of the training set, and a support vector machine is then used to make the predictions. Experimental results show that the proposed method achieves good prediction performance.
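The random under-sampling step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `random_undersample`, the toy feature vectors, and the assumption that binding residues (label 1) are the minority class are all hypothetical and chosen only for illustration.

```python
import random


def random_undersample(features, labels, seed=0):
    """Balance a binary dataset by randomly discarding majority-class samples.

    Hypothetical helper: labels are assumed to be 1 (binding residue)
    or 0 (non-binding residue).
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    # Whichever class is smaller is kept in full.
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    # Draw, without replacement, as many majority samples as there are minority samples.
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [features[i] for i in kept], [labels[i] for i in kept]


# Toy data: 10 residues with placeholder 2-dimensional feature vectors
# (stand-ins for PSSM and structural features), 2 of them labeled as binding.
toy_features = [[float(i), float(i) * 0.5] for i in range(10)]
toy_labels = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]

balanced_X, balanced_y = random_undersample(toy_features, toy_labels, seed=42)
```

The balanced lists could then be passed to a support vector machine classifier, for example `sklearn.svm.SVC` from scikit-learn; the kernel and parameters used in the paper are not stated in this abstract.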
Source
Computer and Modernization (《计算机与现代化》)
2016, No. 1, pp. 20-25 (6 pages)
Keywords
position-specific scoring matrix
accessible surface area
relative solvent accessibility
depth index and protrusion index
random under-sampling
support vector machine