Abstract
Traditional models for imbalanced data classification suffer from low accuracy, poor stability, and weak generalization. To address these problems, a multi-granularity ensemble classification algorithm based on sequential three-way decisions, MGE-S3WD, is proposed. A binary relation is used to partition the granular layers dynamically. Thresholds are computed from the cost matrix and a multi-level granular structure is constructed, in which the data at each granular layer are divided into a positive region, a boundary region, and a negative region. The partitions at each layer are then recombined pairwise (positive with negative, positive with boundary, and negative with boundary) to form new data subsets, and a base classifier is built on each subset to achieve ensemble classification of imbalanced data. Simulation results show that the algorithm effectively reduces the imbalance ratio of the data subsets and increases the diversity of the base classifiers in the ensemble. Under the two evaluation metrics G-mean and F-measure1, its classification performance is better, or partially better, than that of other ensemble classification algorithms. The algorithm effectively improves the classification accuracy and stability of the classification model and offers a new research direction for ensemble learning on imbalanced data sets.
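The threshold and recombination steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the standard decision-theoretic thresholds used in three-way decision models, and it takes each sample's conditional probability of belonging to the target class as a given input rather than deriving it from a binary relation at a granular layer. The function names, the cost values, and the probabilities are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def thresholds(cost):
    """Compute (alpha, beta) from a 3x2 cost matrix.

    cost[action][state]: actions are P (accept), B (defer), N (reject);
    states are 'pos' (sample is in the target class) and 'neg' (it is not).
    Assumption: the standard decision-theoretic three-way formulas apply.
    """
    l_pp, l_pn = cost["P"]["pos"], cost["P"]["neg"]
    l_bp, l_bn = cost["B"]["pos"], cost["B"]["neg"]
    l_np, l_nn = cost["N"]["pos"], cost["N"]["neg"]
    alpha = (l_pn - l_bn) / ((l_pn - l_bn) + (l_bp - l_pp))
    beta = (l_bn - l_nn) / ((l_bn - l_nn) + (l_np - l_bp))
    return alpha, beta


def three_way_partition(probs, alpha, beta):
    """Assign sample indices to the POS, BND, and NEG regions."""
    regions = {"POS": [], "BND": [], "NEG": []}
    for idx, p in enumerate(probs):
        if p >= alpha:
            regions["POS"].append(idx)
        elif p <= beta:
            regions["NEG"].append(idx)
        else:
            regions["BND"].append(idx)
    return regions


def recombine(regions):
    """Form the three pairwise subsets: POS+NEG, POS+BND, NEG+BND."""
    return {
        (a, b): regions[a] + regions[b]
        for a, b in combinations(("POS", "NEG", "BND"), 2)
    }


# Hypothetical cost matrix and conditional probabilities for one granular layer.
cost = {
    "P": {"pos": 0.0, "neg": 8.0},
    "B": {"pos": 2.0, "neg": 2.0},
    "N": {"pos": 6.0, "neg": 0.0},
}
probs = [0.9, 0.8, 0.5, 0.4, 0.2, 0.1, 0.75]

alpha, beta = thresholds(cost)
regions = three_way_partition(probs, alpha, beta)
subsets = recombine(regions)
```

In a full pipeline, one base classifier would be trained on the samples indexed by each entry of `subsets`, and the process would repeat at every granular layer. How the base classifiers' outputs are combined is not specified in the abstract, so it is left out here.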
Authors
陈丽芳
代琪
赵佳亮
CHEN Li-fang; DAI Qi; ZHAO Jia-liang (College of Science, North China University of Science and Technology, Tangshan 063210, China)
Source
《计算机工程与科学》
CSCD
PKU Core Journal
2021, Issue 5, pp. 917-925 (9 pages)
Computer Engineering & Science
Funding
Natural Science Foundation of Hebei Province (F2014209086).
Keywords
sequential three-way decision
multi-granularity
cost-sensitive
imbalanced data
ensemble learning