Abstract
In partial label learning, the true label of an instance is hidden in a set of candidate labels. Existing partial label learning algorithms measure the similarity between instances using feature vectors only and make no use of the candidate label set information. This paper proposes a Candidate Label-Aware Partial Label Learning (CLAPLL) method, which effectively incorporates candidate label set information when measuring the similarity between instances during the graph construction phase. First, the similarity between the candidate label sets of instances is computed based on the Jaccard distance and linear reconstruction. Then, a similarity graph is constructed by combining the instance similarity with the label set similarity, and an existing graph-based partial label learning algorithm is applied for learning and prediction. Experimental results on 3 synthetic datasets and 6 real-world datasets show that, compared with the baseline algorithms, the proposed method improves disambiguation accuracy by 0.3%~16.5% and classification accuracy by 0.2%~2.8%.
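The graph construction idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: the label set similarity uses only the Jaccard index (one minus the Jaccard distance) and omits the linear reconstruction step, the feature similarity is a Gaussian kernel, and the two are mixed with a hypothetical weight `alpha`. The function names are likewise hypothetical and are not taken from the paper.

```python
import numpy as np


def jaccard_similarity(a, b):
    """Jaccard index |a & b| / |a | b| of two candidate label sets.

    The Jaccard distance mentioned in the abstract is 1 minus this value.
    """
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def feature_similarity(X, sigma=1.0):
    """Gaussian-kernel similarity between feature vectors (assumed choice)."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def combined_similarity(X, candidate_sets, alpha=0.5, sigma=1.0):
    """Mix feature and label set similarity into one graph weight matrix.

    alpha is a hypothetical mixing weight; the paper's actual combination
    rule is not stated in the abstract.
    """
    n = len(candidate_sets)
    S_label = np.array([
        [jaccard_similarity(candidate_sets[i], candidate_sets[j])
         for j in range(n)]
        for i in range(n)
    ])
    S_feat = feature_similarity(X, sigma)
    W = alpha * S_feat + (1.0 - alpha) * S_label
    np.fill_diagonal(W, 0.0)  # no self-loops in the similarity graph
    return W


# Toy data: instances 0 and 1 are close in feature space and share a
# candidate label; instance 2 is far away with a disjoint candidate set.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
candidate_sets = [{0, 1}, {1, 2}, {3}]
W = combined_similarity(X, candidate_sets)
```

In this toy example the edge between instances 0 and 1 receives a much larger weight than the edge between instances 0 and 2, because both their features and their candidate label sets agree. The resulting matrix `W` could then be passed to a graph-based partial label learner.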
Authors
陈鸿昶
谢天
高超
李邵梅
黄瑞阳
CHEN Hongchang; XIE Tian; GAO Chao; LI Shaomei; HUANG Ruiyang (National Digital Switching System Engineering & Technological R&D Center, Zhengzhou 450002, China)
Source
《电子与信息学报》
EI
CSCD
Peking University Core Journal
2019, Issue 10, pp. 2516-2524 (9 pages)
Journal of Electronics & Information Technology
Funding
National Natural Science Foundation of China (61601513)
Keywords
Partial label learning
Weakly supervised learning
Disambiguation
Jaccard distance
Linear reconstruction