Abstract
Accurately identifying protein-guanosine diphosphate (GDP) binding sites is important for both protein function analysis and drug design. Predicting protein-GDP binding residues is a typical imbalanced learning problem: directly applying traditional machine learning methods is unsuitable because the learned model is severely biased toward the majority class. To address this problem, the paper builds on position specific scoring matrix features obtained through sparse representation, proposes a weighted under-sampling method to balance the samples, and uses a support vector machine for prediction. Experimental results show that the proposed method achieves higher prediction performance.
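The balancing step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the paper assigns sample weights, so the `weights` argument here is a hypothetical input (any positive per-sample score), and the function name `weighted_undersample` is invented for illustration. The sketch draws a weighted random subset of the majority class, without replacement, until both classes have the same size; it is not the authors' implementation.

```python
import random


def weighted_undersample(features, labels, weights, seed=0):
    """Balance a binary dataset by weighted under-sampling of the majority class.

    features: list of feature vectors (e.g. PSSM-derived, an assumption here)
    labels:   list of 0/1 class labels (1 = binding residue, 0 = non-binding)
    weights:  list of positive floats; larger means more likely to be kept.
              How the paper computes these is not given in the abstract.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)

    # Weighted sampling without replacement (Efraimidis-Spirakis keys):
    # each majority sample gets key u ** (1 / w); the largest keys are kept.
    keyed = [(rng.random() ** (1.0 / weights[i]), i) for i in majority]
    keyed.sort(reverse=True)
    kept = [i for _, i in keyed[:len(minority)]]

    # Return the balanced subset in the original sample order.
    chosen = sorted(minority + kept)
    return [features[i] for i in chosen], [labels[i] for i in chosen]


# Toy data: 3 binding residues versus 9 non-binding residues.
X = [[float(i)] for i in range(12)]
y = [1] * 3 + [0] * 9
w = [1.0] * 12  # uniform weights, purely for demonstration
X_bal, y_bal = weighted_undersample(X, y, w, seed=1)
```

The balanced output could then be passed to a support vector machine implementation, for example `sklearn.svm.SVC` from scikit-learn, which is one possible choice and not necessarily the one used in the paper.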
Source
《计算机工程与应用》
CSCD
Peking University Core Journal
2016, Issue 13, pp. 55-59, 75 (6 pages)
Computer Engineering and Applications
Funding
National Natural Science Foundation of China (No. 61373062)
Keywords
protein-GDP binding prediction
position specific scoring matrix
sparse representation
weighted under-sampling
support vector machine