Abstract
Active learning aims to reduce expert labeling through human-machine interaction, while cost-sensitive active learning seeks to balance labeling cost against misclassification cost. Based on the Three-Way Decision (3WD) methodology and the Label Uniform Distribution (LUD) model, a Cost-sensitive Active learning through the Farthest distance sum Sampling (CAFS) algorithm is proposed. First, a farthest total distance sampling strategy is designed to query the labels of representative samples. Second, the LUD model and a cost function are used to calculate the expected number of samples to query. Finally, k-Means clustering is employed to split blocks in which different labels have been obtained. CAFS applies the 3WD methodology to iterate label query, instance prediction, and block splitting until all instances are processed, and the whole learning process is controlled by a cost minimization objective. In comparisons on 9 public datasets, CAFS achieves a lower average cost than 11 mainstream algorithms.
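The sampling step can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the function names are hypothetical, distance is taken to be Euclidean, the first query (when nothing is labeled yet) falls back to the instance with the largest total distance to the rest of the block, and a fixed `budget` stands in for the expected sampling number that CAFS derives from the LUD model and the cost function.

```python
import math

def total_distance(points, i, others):
    """Sum of Euclidean distances from instance i to the instances in `others`."""
    return sum(math.dist(points[i], points[j]) for j in others)

def farthest_distance_sum_sample(points, queried, candidates):
    """Pick the candidate whose summed distance to the queried instances is largest.

    Assumption: when nothing has been queried yet, distances are summed
    over the candidates themselves.
    """
    reference = queried if queried else candidates
    return max(candidates, key=lambda i: total_distance(points, i, reference))

def query_block(points, block, oracle, budget):
    """Query up to `budget` labels in one block; returns {index: label}.

    `budget` is a placeholder for the expected sampling number; `oracle`
    is a hypothetical callable standing in for the human expert.
    """
    queried, labels = [], {}
    candidates = list(block)
    while candidates and len(queried) < budget:
        i = farthest_distance_sum_sample(points, queried, candidates)
        labels[i] = oracle(i)
        queried.append(i)
        candidates.remove(i)
    return labels

if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (5.0, 0.0)]
    got = query_block(pts, [0, 1, 2, 3], lambda i: "a" if pts[i][0] < 3 else "b", 2)
    # Two different labels in one block would trigger a split in CAFS.
    print(got, "-> split" if len(set(got.values())) > 1 else "-> predict")
```

In this reading, a block whose queried labels all agree can have its remaining instances predicted with that label, while a block with mixed labels would be split (by k-Means, per the abstract) and each part processed again.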
Authors
任杰
闵帆
汪敏
REN Jie; MIN Fan; WANG Min (School of Computer Science, Southwest Petroleum University, Chengdu, Sichuan 610500, China; School of Electrical Engineering and Information, Southwest Petroleum University, Chengdu, Sichuan 610500, China)
Source
《计算机应用》
CSCD
PKU Core (北大核心)
2019, Issue 9, pp. 2499-2504 (6 pages)
Journal of Computer Applications
Funding
Sichuan Province Youth Science and Technology Innovation Team Program (2019JDTD0017)
Sichuan Province Applied Basic Research Project (2019JDTD0017)
Keywords
active learning
k-Means clustering
label uniform distribution
Three-Way Decision (3WD)