Abstract
Traditional multi-label learning algorithms generally do not account for label imbalance and therefore ignore its effect on classification. However, statistics show that the commonly used multi-label datasets all exhibit label imbalance, and that minority-class labels are often the more important ones. This paper therefore proposes an imbalanced multi-label learning algorithm based on classification interval enhancement (MLCIE). It reconstructs the classification interval of each label to improve the classifier's learning efficiency on minority-label samples and the quality of the sample labels, thereby reducing the impact of multi-label imbalance on classification accuracy. First, the uncertainty coefficient of each label is computed from its label density and conditional entropy. Then, a classification interval enhancement matrix is constructed so that the density information unique to each label is integrated into the original label matrix, yielding a balanced label space. Finally, an extreme learning machine is used as the linear classifier. The proposed algorithm is compared with seven other multi-label learning algorithms on 11 standard multi-label datasets, and the results show that it is effective to some extent in addressing label imbalance.
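The three stages named in the abstract can be illustrated with a minimal sketch. It is not the paper's formulation: the abstract does not give the formulas, so the enhancement rule below (scaling +1/-1 targets by label density) is a hypothetical stand-in, the conditional-entropy-based uncertainty coefficient is omitted, and the names `label_density`, `enhance_label_matrix`, and `SimpleELM` are invented for illustration. The classifier is a generic regularized extreme learning machine with a random sigmoid hidden layer.

```python
import numpy as np


def label_density(Y):
    """Fraction of positive samples per label; Y is (n_samples, n_labels) in {0, 1}."""
    return Y.mean(axis=0)


def enhance_label_matrix(Y):
    """Hypothetical enhancement rule (an assumption, not the paper's matrix).

    Positives of a label with density d become 1 + (1 - d), negatives become
    -(1 + d), so rare labels get a larger positive target than frequent ones.
    """
    d = label_density(Y)
    pos = 1.0 + (1.0 - d)
    neg = -(1.0 + d)
    return np.where(Y == 1, pos, neg)


class SimpleELM:
    """Generic regularized ELM: random hidden layer, closed-form output weights."""

    def __init__(self, n_hidden=50, C=1.0, seed=0):
        self.n_hidden = n_hidden
        self.C = C
        self.seed = seed

    def _hidden(self, X):
        # Sigmoid activation of the fixed random projection.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, T):
        rng = np.random.default_rng(self.seed)
        self.W = rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = rng.standard_normal(self.n_hidden)
        H = self._hidden(X)
        # Ridge solution: beta = (H^T H + I / C)^{-1} H^T T
        A = H.T @ H + np.eye(self.n_hidden) / self.C
        self.beta = np.linalg.solve(A, H.T @ T)
        return self

    def predict(self, X):
        # A label is predicted as relevant when its real-valued output is positive.
        return (self._hidden(X) @ self.beta > 0).astype(int)
```

In use, one would call `SimpleELM().fit(X, enhance_label_matrix(Y))` and threshold the outputs at zero via `predict`; with the asymmetric targets above, the regression is pulled toward the positive side for rare labels, which is the general intuition behind enlarging the margin for minority labels.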
Authors
程玉胜
曹天成
CHENG Yusheng; CAO Tiancheng (University Key Laboratory of Intelligent Perception and Computing of Anhui Province (Anqing Normal University), Anqing 246133, China; Innovation Team of Anqing Normal University, Anqing 246133, China)
Source
《数据采集与处理》
CSCD
Peking University Core Journal
2021, Issue 3, pp. 519-528 (10 pages)
Journal of Data Acquisition and Processing
Keywords
multi-label learning
label imbalance
classification interval
label density
extreme learning machine