Abstract
How to construct base classifiers with high diversity is a central problem in ensemble learning. To this end, an iterative cyclic selection method is proposed: an optimal feature subset is extracted under the criterion of maximizing normalized mutual information, and a base classifier is trained on that subset; the performance of the resulting base classifier is then evaluated using the number of misclassified samples as the diversity measure. If the stopping condition is satisfied, the iteration halts; otherwise it continues until termination. Finally, the recognition results of the selected base classifiers are fused by weighted voting. The effectiveness of the algorithm was verified by simulation: with support vector machines as the base classifiers, experiments were conducted on public UCI data sets and compared against a single SVM (Single-SVM), the classical Bagging ensemble (Bagging-SVM), and the feature-Bagging ensemble (AB-SVM). The experimental results show that the method achieves higher classification accuracy.
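The three ingredients the abstract names (a mutual-information criterion for feature selection, a misclassified-sample count for evaluating each base classifier, and weighted-vote fusion) can be sketched in outline. This is a minimal illustration assuming discrete features and hard class labels; the function names are illustrative, not the authors' code, and the paper's full iterative selection loop around these pieces is omitted:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) between two discrete sequences --
    the kind of criterion maximized when ranking candidate features."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        p_joint = c / n
        mi += p_joint * math.log(p_joint / ((px[x] / n) * (py[y] / n)))
    return mi

def error_count(preds, labels):
    """Number of misclassified samples -- the quantity the method uses
    to judge a newly trained base classifier."""
    return sum(p != y for p, y in zip(preds, labels))

def weighted_vote(predictions, weights):
    """Fuse the base classifiers' predicted labels for one sample
    by weighted voting: the label with the largest total weight wins."""
    score = {}
    for label, w in zip(predictions, weights):
        score[label] = score.get(label, 0.0) + w
    return max(score, key=score.get)

# Toy usage: three base classifiers vote on one sample.
fused = weighted_vote([1, 0, 1], [0.5, 0.9, 0.2])  # label 0 wins (0.9 > 0.7)
```

In the paper the base classifiers are SVMs trained on MI-selected feature subsets; any classifier producing hard labels would plug into the voting step the same way.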
Source
Computer Science (《计算机科学》)
CSCD
PKU Core Journal (北大核心)
2013, No. 6, pp. 225-228 (4 pages)
Funding
Supported by the National Natural Science Foundation of China (60975026, 61273275)
Keywords
Ensemble learning, Ensemble feature selection, Mutual information, Diversity