Self-training algorithm based on dynamic threshold and difference test
Abstract: When a self-training algorithm iteratively trains a classifier, it is difficult to select high-confidence samples effectively, and errors from mislabeled samples accumulate. To address these problems, this paper proposes a self-training algorithm based on a dynamic threshold and a difference test. First, the local outlier factor of each sample is introduced; it is used to remove outliers from the labeled samples and to sort and mark the unlabeled samples by category, and the unlabeled samples are then processed in batches according to those marks so that the model can select high-confidence unlabeled samples more easily. Second, a dynamic membership threshold function is designed from the number of newly added pseudo-labeled samples and the change in contrast membership, improving the quality of the selected high-confidence samples. Finally, a dense distance is defined to measure the difference between samples: the sums of dense distances between a pseudo-labeled sample and samples of the same class and of different classes are computed separately to identify pseudo-labeled samples with high uncertainty, which are returned to the unlabeled sample set for the next round of training, alleviating the accumulation of errors from mislabeled samples. Experimental results show that the algorithm achieves satisfactory results on all 12 UCI benchmark datasets.
Authors: LYU Jia (吕佳); QIU Hongbo (邱鸿波); XIAO Feng (肖锋) (College of Computer and Information Sciences, Chongqing Normal University, Chongqing 401331, China; Chongqing Digital Agriculture Service Engineering Technology Research Center, Chongqing 401331, China)
Source: CAAI Transactions on Intelligent Systems (《智能系统学报》), CSCD, Peking University Core (北大核心), 2024, No. 4, pp. 839-852 (14 pages)
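Funding: Major Program of the National Natural Science Foundation of China (11991024); Chongqing Municipal Education Commission Science and Technology Innovation Project for "Construction of the Chengdu-Chongqing Economic Circle" (KJCX2020024); Chongqing University Innovation Research Group Funding Project (CXQT20015).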
Keywords: self-training algorithm; mislabeled samples; high-confidence samples; dynamic threshold; difference test; local outlier factor; contrast membership; dense distance
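
The abstract outlines a pipeline without giving formulas, so the following is only a rough sketch of its general shape, not the authors' method. It assumes the standard local outlier factor definition (simplified to exactly k neighbours, ignoring ties), a plain k-nearest-neighbour vote as a stand-in for the membership degree, and a hypothetical step-down rule in place of the paper's dynamic threshold function. The batching of unlabeled samples and the dense-distance difference test are not reproduced. All function and parameter names (`local_outlier_factor`, `knn_membership`, `self_train`, `lof_cut`, `start`, `floor`, `step`) are illustrative.

```python
import math


def local_outlier_factor(points, k=3):
    """Standard LOF per point (about 1 for inliers, much larger for outliers).

    Simplification: uses exactly k neighbours and ignores distance ties.
    """
    n = len(points)
    neigh = []
    for i in range(n):
        dists = sorted(
            (math.dist(points[i], p), j) for j, p in enumerate(points) if j != i
        )
        neigh.append(dists[:k])
    k_dist = [nb[-1][0] for nb in neigh]  # distance to the k-th neighbour
    lrd = []  # local reachability density
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neigh[i]]
        lrd.append(len(reach) / max(sum(reach), 1e-12))
    return [
        sum(lrd[j] for _, j in neigh[i]) / (len(neigh[i]) * lrd[i])
        for i in range(n)
    ]


def knn_membership(labeled, x, k=3):
    """Return (label, vote share) from the k nearest labeled samples.

    The vote share is a stand-in for the membership degree, not the paper's.
    """
    nearest = sorted(labeled, key=lambda s: math.dist(s[0], x))[:k]
    votes = {}
    for _, y in nearest:
        votes[y] = votes.get(y, 0) + 1
    label = max(votes, key=votes.get)
    return label, votes[label] / len(nearest)


def self_train(labeled, unlabeled, k=3, lof_cut=1.5, start=0.9, floor=0.6, step=0.1):
    """Self-training with LOF filtering and a placeholder threshold schedule."""
    # Step 1: drop labeled samples whose LOF marks them as outliers.
    scores = local_outlier_factor([x for x, _ in labeled], k)
    train = [s for s, sc in zip(labeled, scores) if sc <= lof_cut]
    pool = list(unlabeled)
    threshold = start
    while pool:
        preds = [(x, *knn_membership(train, x, k)) for x in pool]
        picked = [(x, y) for x, y, conf in preds if conf >= threshold]
        if picked:
            # Step 2: accept confident pseudo-labels and retrain on them.
            train.extend(picked)
            pool = [x for x, _, conf in preds if conf < threshold]
        elif threshold > floor:
            # Hypothetical rule: relax the threshold when nothing qualifies.
            threshold = max(floor, threshold - step)
        else:
            break  # nothing confident enough even at the floor
    return train, pool


labeled = [((0.0, 0.0), 0), ((0.0, 1.0), 0), ((1.0, 0.0), 0), ((1.0, 1.0), 0),
           ((10.0, 10.0), 1), ((10.0, 11.0), 1), ((11.0, 10.0), 1), ((11.0, 11.0), 1),
           ((5.0, 30.0), 0)]  # the last sample is a deliberate outlier
unlabeled = [(0.5, 0.5), (10.5, 10.5), (2.0, 2.0), (9.0, 9.0), (5.4, 5.4)]
train, pool = self_train(labeled, unlabeled)
```

In this toy run the far-away labeled sample is filtered out before training, the samples near either cluster are pseudo-labeled in the first round, and the ambiguous midpoint is accepted only after the threshold has been relaxed. A faithful implementation would replace the step-down rule and add the difference test that sends uncertain pseudo-labeled samples back to the unlabeled set.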