Fake-iterative Algorithm for Co-training Semi-supervised Learning (original Chinese title: 协同训练半监督学习二次伪迭代算法)

Abstract: In semi-supervised learning, the classifiers themselves introduce noisy labels during training, which degrades their performance and lowers classification accuracy. This paper proposes a self-regulating, twice fake-iterative algorithm. The algorithm retains the three-classifier design of the Tri-training algorithm and, under certain conditions, introduces a small amount of manual work so that hard-to-classify samples do not stall training. A self-regulation mechanism reduces the noisy data that enters during classification and limits the addition of data that contributes nothing to classifier performance, while a twice fake-iterative training process raises the utilization and contribution of unlabeled samples. Experimental results show that the algorithm effectively improves classifier performance and the utilization and contribution of unlabeled samples, and that classification accuracy improves as well.
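The three-classifier idea that the abstract inherits from Tri-training can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify the self-regulation rule or the twice fake-iterative procedure, so neither is shown, and the error-rate acceptance test of standard Tri-training is omitted as well. The names `NearestCentroid1D` and `pseudo_labels_for`, the toy data, and the use of fixed subsets instead of bootstrap samples are all assumptions made for illustration. Treating the points on which the two peer classifiers disagree as candidates for manual labeling is likewise only one possible reading of the "manual work" step.

```python
from collections import defaultdict


class NearestCentroid1D:
    """Toy stand-in classifier: predicts the class with the closest mean feature."""

    def fit(self, samples):
        sums, counts = defaultdict(float), defaultdict(int)
        for x, y in samples:
            sums[y] += x
            counts[y] += 1
        self.centroids = {y: sums[y] / counts[y] for y in counts}
        return self

    def predict(self, x):
        return min(self.centroids, key=lambda y: abs(x - self.centroids[y]))


def pseudo_labels_for(i, classifiers, unlabeled):
    """Split unlabeled points by whether the two classifiers other than i agree."""
    j, k = [idx for idx in range(3) if idx != i]
    agreed, disputed = [], []
    for x in unlabeled:
        label_j, label_k = classifiers[j].predict(x), classifiers[k].predict(x)
        if label_j == label_k:
            agreed.append((x, label_j))   # pseudo-label offered to classifier i
        else:
            disputed.append(x)            # assumption: candidate for manual labeling
    return agreed, disputed


# Hypothetical toy data: one feature, two classes.
labeled = [(0.0, "a"), (1.0, "a"), (2.0, "a"), (8.0, "b"), (9.0, "b"), (10.0, "b")]
unlabeled = [0.5, 4.8, 5.2, 9.5]

# Fixed subsets instead of bootstrap samples keep the sketch deterministic.
subsets = [[s for n, s in enumerate(labeled) if n % 3 != i] for i in range(3)]
classifiers = [NearestCentroid1D().fit(subset) for subset in subsets]

# One round: retrain each classifier on its subset plus the agreed pseudo-labels.
retrained = []
for i in range(3):
    agreed, disputed = pseudo_labels_for(i, classifiers, unlabeled)
    retrained.append(NearestCentroid1D().fit(subsets[i] + agreed))
```

In this toy run the points near the class boundary (4.8 and 5.2) are the ones on which peer classifiers disagree, which is the kind of hard-to-classify sample the abstract proposes to hand to a human rather than let it stall training.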

Authors: Huang Shuangming (黄霜明), Xie Licong (谢丽聪)

Affiliation: College of Mathematics and Computer Science, Fuzhou University

Source: Journal of Guangxi Normal University: Natural Science Edition (《广西师范大学学报（自然科学版）》), 2011, No. 3, pp. 110-114 (5 pages). Indexed in: CAS, PKU Core (北大核心).

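Funding: Open Project Fund of the Institute of Software, Chinese Academy of Sciences (SYSKF0701); National Natural Science Foundation of China (61070062)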

Keywords: co-training; twice fake-iterative; self-regulation mechanism; contribution value; manual work

Classification code: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]

References (11)

1. Liang Jiye, Gao Jiawei, Chang Yu. Research progress in semi-supervised learning [J]. Journal of Shanxi University (Natural Science Edition), 2009, 32(4): 528-534. (in Chinese)
2. Fu Yan, Zhou Junlin. Research on blind source separation based on unsupervised learning [J]. Journal of University of Electronic Science and Technology of China, 2004, 33(1): 63-66. (in Chinese)
3. Zhou Zhihua. The co-training paradigm in semi-supervised learning [M]//Machine Learning and Its Applications. Beijing: Tsinghua University Press, 2007: 259-275. (in Chinese)
4. Du Ming, Zhou Erzhong. Research on the application of machine learning in pattern recognition [J]. Science and Technology Information, 2009(9): 37-38. (in Chinese)
5. NIGAM K, McCALLUM A K, THRUN S, et al. Text classification from labeled and unlabeled documents using EM [J]. Machine Learning, 2000, 39(2/3): 103-134.
6. JOACHIMS T. Transductive inference for text classification using support vector machines [C]//Proceedings of the 16th International Conference on Machine Learning. New York: Morgan Kaufmann, 1999: 200-209.
7. BLUM A, LAFFERTY J, RWEBANGIRA M R, et al. Semi-supervised learning using randomized mincuts [C]//Proceedings of the 21st International Conference on Machine Learning. New York: ACM, 2004: 97-104.
8. BLUM A, MITCHELL T. Combining labeled and unlabeled data with co-training [C]//Proceedings of the 11th Annual Conference on Computational Learning Theory. New York: ACM, 1998: 92-100.
9. GOLDMAN S, ZHOU Yan. Enhancing supervised learning with unlabeled data [C]//Proceedings of the 17th International Conference on Machine Learning. New York: Morgan Kaufmann, 2000: 327-334.
10. ZHOU Zhi-hua, LI Ming. Tri-training: exploiting unlabeled data using three classifiers [J]. IEEE Transactions on Knowledge and Data Engineering, 2005, 17(11): 1529-1541.
