Abstract
In protein-vitamin binding site prediction, the minority-class and majority-class samples are severely imbalanced, so traditional machine learning methods are poorly suited to the problem. To address this, protein-vitamin binding sites are predicted by combining multiple random under-sampling with a support vector machine (SVM) ensemble, using an improved AdaBoost ensemble method called MAdaBoost. Several ensemble strategies are compared experimentally, and the MAdaBoost ensemble performs best. The experimental results show that random under-sampling combined with an SVM ensemble effectively improves the accuracy of protein-vitamin binding site prediction.
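As a rough illustration of the general idea in the abstract (multiple random under-sampling feeding an ensemble), the following is a minimal sketch and not the authors' implementation. Everything in it is an assumption made for illustration: the toy 2-D data, the function names, the number of ensemble members, and the plain score averaging. To keep the sketch dependency-free, the base learner is a simple nearest-centroid scorer standing in for the SVM, and the MAdaBoost weighting scheme is not reproduced because the abstract does not specify it.

```python
import random
from statistics import mean


def undersample(majority, minority, rng):
    """Draw a random majority subset the same size as the minority class."""
    return rng.sample(majority, len(minority)), list(minority)


class CentroidClassifier:
    """Stand-in base learner (NOT an SVM): compares distances to class centroids."""

    def fit(self, negatives, positives):
        # Per-feature mean of each class.
        self.neg = [mean(col) for col in zip(*negatives)]
        self.pos = [mean(col) for col in zip(*positives)]
        return self

    def score(self, x):
        d_neg = sum((a - b) ** 2 for a, b in zip(x, self.neg))
        d_pos = sum((a - b) ** 2 for a, b in zip(x, self.pos))
        # A positive score means x is closer to the positive-class centroid.
        return d_neg - d_pos


def train_ensemble(majority, minority, n_models=5, seed=0):
    """Train one base learner per balanced, randomly under-sampled subset."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        neg, pos = undersample(majority, minority, rng)
        models.append(CentroidClassifier().fit(neg, pos))
    return models


def predict(models, x):
    """Average the member scores; return 1 (positive) if the mean is above zero."""
    return 1 if mean(m.score(x) for m in models) > 0 else 0


# Hypothetical toy data: 200 majority-class points near (0, 0),
# 20 minority-class points near (3, 3).
data_rng = random.Random(42)
majority = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
minority = [(data_rng.gauss(3, 1), data_rng.gauss(3, 1)) for _ in range(20)]

models = train_ensemble(majority, minority)
print(predict(models, (3.0, 3.0)), predict(models, (0.0, 0.0)))
```

Each ensemble member sees every minority sample but a different random slice of the majority class, so the ensemble as a whole uses far more of the majority data than any single balanced classifier would. Swapping the stand-in for a real SVM, or replacing the plain average with a boosting-style weighted vote, would change only `CentroidClassifier` and `predict`.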
Source
《现代电子技术》
Peking University Core Journal (北大核心)
2015, Issue 9, pp. 90-95 (6 pages)
Modern Electronics Technique
Funding
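Natural Science Foundation of Jiangsu Province, General Program: Research on Feature Extraction and Dynamic Learning Models for Protein Biocomputing (BK20141403)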