Label-noise learning via mixture proportion estimation (基于混合比例估计的标签噪声学习方法)
Abstract: With the rise of artificial intelligence in recent years and the accompanying growth in hardware computing power, deep learning has become the new paradigm for artificial intelligence algorithms. Deep learning, however, depends on large amounts of accurately labeled data, and in realistic multi-class classification scenarios labeling costs and privacy protection often make such data difficult to obtain. Crowdsourcing and web crawling have recently become widely used as inexpensive ways to collect labeled data, but they inevitably introduce incorrect labels, i.e., label noise. Because deep neural networks have a high capacity to fit data, label noise causes them to overfit, which severely limits their generalization. Existing work on label noise mostly relies, explicitly or implicitly, on anchor points, i.e., instances that almost certainly belong to a particular class. Anchor points are difficult to obtain in practice, which makes these methods inapplicable. To address this problem, this paper transforms the multi-class label-noise learning problem into a mixture proportion estimation (MPE) problem and builds an anchor-free, statistically consistent learning algorithm. The main contributions are: (i) the existing Regrouping-MPE (R-MPE) method, which is suitable only for two-component MPE scenarios, is generalized for the first time into a multi-component oriented R-MPE (MR-MPE) method that does not rely on the common irreducible assumption; and (ii) the anchor point assumption in multi-class label-noise learning is proved theoretically to be equivalent to the irreducible assumption in MPE problems, and an anchor-free, statistically consistent label-noise learning algorithm is constructed on the basis of the proposed MR-MPE method. Comparative experiments against existing algorithms on both synthetic and real-world noisy datasets show that the proposed algorithm achieves the best performance on multiple datasets. A further robustness test with anchor points removed verifies that the algorithm does not depend on anchor points.
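For readers unfamiliar with the two assumptions the abstract links, the following is a minimal sketch of the standard textbook formulations, not the paper's own derivation. The symbols F, G, H, κ*, T and C are assumptions introduced here for illustration and may differ from the notation used in the article; the label-noise model shown is the common class-conditional (instance-independent) one.

```latex
% Two-component MPE (standard formulation; F, G, H, \kappa^{*} are assumed notation).
% Given samples from F and from H, estimate the mixture proportion \kappa^{*}:
\[
  F = (1-\kappa^{*})\,G + \kappa^{*} H, \qquad \kappa^{*} \in [0,1].
\]
% Irreducible assumption: G is irreducible with respect to H if there is no
% \gamma \in (0,1] and distribution G' such that G = \gamma H + (1-\gamma)\,G'.

% Multi-class label noise with C classes and transition matrix T,
% where T_{ij} = P(\tilde{Y}=j \mid Y=i) is assumed independent of x:
\[
  P(\tilde{Y}=j \mid X=x) \;=\; \sum_{i=1}^{C} T_{ij}\, P(Y=i \mid X=x).
\]
% Anchor point assumption: an anchor point x^{i} of class i satisfies
% P(Y=i \mid X=x^{i}) = 1, so the noisy posterior at x^{i} reads off row i of T:
\[
  P(\tilde{Y}=j \mid X=x^{i}) \;=\; T_{ij}.
\]
```

Under this reading, each noisy posterior is a mixture of clean class posteriors whose proportions are the entries of T, which is why estimating T can be posed as a multi-component mixture proportion estimation problem.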
Authors: Qinghua ZHENG (郑庆华); Shuzhi CAO (曹书植); Jianfei RUAN (阮建飞); Rui ZHAO (赵锐); Bo DONG (董博) (School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China; School of Continuing Education, Xi'an Jiaotong University, Xi'an 710049, China; Ministry of Education Key Lab for Intelligent Networks and Network Security, Xi'an 710049, China; Shaanxi Province Key Lab of Satellite and Terrestrial Network Technology Research and Development, Xi'an 710049, China)
Source: Scientia Sinica (Informationis) (《中国科学:信息科学》), indexed in CSCD and the Peking University Core Journals list, 2024, Issue 3, pp. 603-622 (20 pages)
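Funding: Supported by the Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (Grant No. 2020AAA0108800), the National Natural Science Foundation of China (Grant Nos. 62037001, 61721002, 62002282), the Ministry of Education Innovative Research Team Project (Grant No. IRT-17R86), the Xi'an Jiaotong University Undergraduate Teaching Reform Research Project (Grant No. 20JX04Y), and the Xi'an Jiaotong University-Servyou Group (税友集团) Tax Big Data Collaborative Innovation Project.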
Keywords: mixture proportion estimation; multi-class classification; label-noise learning; anchor point; irreducible assumption; statistical consistency