Journal Article

An Improved Co-training Algorithm Based on the Conditional Value of Examples (Cited by: 4)

Conditional Value-based Co-training
Abstract: Co-training is one of the major semi-supervised learning methods. It iteratively trains two classifiers under two different views; in each round, each classifier labels examples from the unlabeled set and adds them to the training set of the other. Standard co-training selects the newly added examples by the classifier's posterior probability output, a strategy that ignores how valuable an example is to the current classifier. To address this, an improved co-training-style algorithm is proposed, termed CVCOT (conditional value-based co-training), which selects candidate training examples by their conditional value. The conditional value of each unlabeled example is defined and computed during the co-training process, and the classifier under each view uses it to choose the examples that augment the other view's training set. This strategy not only keeps the pseudo-labels reliable, but also adds the more informative, higher-value examples to the training sets first, so the classifier under either view is effectively refined. Experiments on UCI data sets and on a web page classification task show that CVCOT achieves better classification performance and learning efficiency.
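The abstract describes the selection loop only at a high level. Below is a minimal, hedged sketch in Python of what a CVCOT-style round could look like, assuming Gaussian naive Bayes base learners and a stand-in conditional-value score (the selecting classifier's label confidence weighted by the peer classifier's predictive entropy). The paper's actual definition of conditional value is not reproduced here, and the function and parameter names (cvcot, conditional_value, per_round) are illustrative only.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB


def conditional_value(selector, peer, X_sel, X_peer):
    """Assumed stand-in for the paper's conditional value: the selecting
    classifier's label confidence weighted by the peer classifier's
    predictive entropy, so confidently labeled examples that the peer is
    still unsure about score highest."""
    conf = selector.predict_proba(X_sel).max(axis=1)
    p = peer.predict_proba(X_peer)
    entropy = -(p * np.log(p + 1e-12)).sum(axis=1)
    return conf * entropy


def cvcot(X1, X2, y, X1_u, X2_u, rounds=10, per_round=5):
    """Minimal CVCOT-style co-training sketch over two feature views."""
    # Each view keeps its own growing training set, as in standard co-training.
    L1_X, L1_y = np.asarray(X1), np.asarray(y)   # training set for h1 (view 1)
    L2_X, L2_y = np.asarray(X2), np.asarray(y)   # training set for h2 (view 2)
    X1_u, X2_u = np.asarray(X1_u), np.asarray(X2_u)
    h1, h2 = GaussianNB(), GaussianNB()

    for _ in range(rounds):
        if len(X1_u) == 0:
            break
        h1.fit(L1_X, L1_y)
        h2.fit(L2_X, L2_y)

        # h1 selects high-value examples and pseudo-labels them for h2.
        v1 = conditional_value(h1, h2, X1_u, X2_u)
        pick1 = np.argsort(-v1)[:per_round]
        L2_X = np.vstack([L2_X, X2_u[pick1]])
        L2_y = np.concatenate([L2_y, h1.predict(X1_u[pick1])])

        # h2 does the same for h1, ranking the same (not yet shrunk) pool.
        v2 = conditional_value(h2, h1, X2_u, X1_u)
        pick2 = np.argsort(-v2)[:per_round]
        L1_X = np.vstack([L1_X, X1_u[pick2]])
        L1_y = np.concatenate([L1_y, h2.predict(X2_u[pick2])])

        # Remove the consumed examples from the unlabeled pool.
        used = np.union1d(pick1, pick2)
        keep = np.setdiff1d(np.arange(len(X1_u)), used)
        X1_u, X2_u = X1_u[keep], X2_u[keep]

    return h1.fit(L1_X, L1_y), h2.fit(L2_X, L2_y)
```

The only change from standard co-training in this sketch is the selection step: candidates are ranked by the value score rather than by the raw posterior probability alone, which is what the CVCOT strategy is about.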
Source: Acta Automatica Sinica (自动化学报), 2013, Issue 10, pp. 1665-1673 (9 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.
Funding: Supported by the National Natural Science Foundation of China (61173087, 61073128) and the Natural Science Foundation of Heilongjiang Province (F201021).
Keywords: machine learning, semi-supervised learning, co-training, informative example, conditional value



