Abstract
Random forest is a classic classification algorithm that is widely used and achieves high classification accuracy. Its final performance, however, depends on two factors: the classification performance of each individual decision tree and the difference between each pair of decision trees. When some decision trees make similar misclassifications, using their outputs in the final vote degrades the model's classification performance. To address this problem, this paper introduces the confusion matrix into the similarity measure for decision trees. The measure takes into account the number of trees in different categories and their correct and incorrect classifications, so that weakly similar decision trees can be selected. Decision trees with poor classification ability are then removed, leaving a set of classifiers with strong classification ability as the final random forest model. Experimental results on three types of datasets show that the proposed method achieves a higher average classification accuracy than the original algorithm, with greater stability.
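The selection procedure outlined in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's similarity formula, so the cosine similarity between flattened confusion matrices used here is an assumption, as are the function names, the greedy selection order, and the thresholds `sim_threshold` and `min_accuracy`. Each tree is represented only by its predicted labels on a validation set.

```python
from math import sqrt


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy(cm):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total if total else 0.0


def similarity(cm_a, cm_b):
    """Assumed measure: cosine similarity of the flattened matrices.
    This is not taken from the paper; it stands in for its measure."""
    a = [v for row in cm_a for v in row]
    b = [v for row in cm_b for v in row]
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_trees(y_true, predictions, n_classes,
                 sim_threshold=0.95, min_accuracy=0.6):
    """Return indices of trees that are accurate enough and not too
    similar to any tree already kept (hypothetical greedy scheme)."""
    cms = [confusion_matrix(y_true, p, n_classes) for p in predictions]
    accs = [accuracy(cm) for cm in cms]
    # Visit trees from most to least accurate; ties keep index order.
    order = sorted(range(len(predictions)), key=lambda i: accs[i],
                   reverse=True)
    kept = []
    for i in order:
        if accs[i] < min_accuracy:
            continue  # drop trees with poor classification ability
        if all(similarity(cms[i], cms[j]) < sim_threshold for j in kept):
            kept.append(i)  # keep only weakly similar trees
    return kept


if __name__ == "__main__":
    y_true = [0, 0, 1, 1, 2, 2]
    predictions = [
        [0, 0, 1, 1, 2, 1],  # tree 0
        [0, 0, 1, 1, 2, 1],  # tree 1: same errors as tree 0
        [0, 1, 1, 1, 2, 2],  # tree 2: a different error
        [1, 1, 0, 0, 2, 2],  # tree 3: low accuracy
    ]
    print(select_trees(y_true, predictions, n_classes=3))
```

In this toy input, tree 1 repeats the errors of tree 0 and tree 3 falls below the accuracy threshold, so only trees 0 and 2 would vote in the final forest. Other similarity measures or selection orders could be substituted without changing the overall structure.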
Source
Computer Science and Application (《计算机科学与应用》)
2020, No. 9, pp. 1541-1548 (8 pages)
Keywords
Integrated Classifier
Random Forest
Confusion Matrix