
SemiBoost-CR Classification Model Using Confidence-Based Resampling (cited by: 5)

Advanced SemiBoost-CR Categorization Model Utilizing Confidence-Based Resampling
Abstract This paper proposes SemiBoost-CR, a classification model built on confidence-based resampling that combines semi-supervised learning with ensemble learning. A confidence score is computed for each unlabeled example from its labeled and unlabeled nearest neighbors. Resampling by confidence selects not only a proportion of the unlabeled examples with high confidence but also a proportion of those with low confidence, and the two groups are added to the labeled training set under different strategies. The high-confidence unlabeled examples are introduced to improve the accuracy of the base classifiers, while the low-confidence ones are introduced to further increase the diversity among the base classifiers. Comparative experiments show that SemiBoost-CR effectively improves the performance of a Naive Bayesian text classifier.
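The selection step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not reproduce the paper's confidence formula, so the neighbor-agreement score below is only a stand-in assumption: it scores each unlabeled example by the share of its k nearest neighbors (labeled, or unlabeled with their current predicted labels) that agree with its own predicted label. The function names `confidence_scores` and `resample_by_confidence`, and the parameters `k`, `high_ratio`, and `low_ratio`, are hypothetical and not taken from the paper. The sketch only picks which examples to add; the different strategies used to add each group are not specified in the abstract and are not modeled.

```python
import math

def confidence_scores(labeled, unlabeled, k=3):
    """Assumed stand-in score, not the paper's formula.

    labeled:   list of (point, label)
    unlabeled: list of (point, predicted_label)
    Returns one score in [0, 1] per unlabeled example: the share of its
    k nearest neighbors (from both sets) whose label matches its prediction.
    Assumes k >= 1 and at least one other example exists.
    """
    scores = []
    for i, (x, y_hat) in enumerate(unlabeled):
        # Distances to every labeled example.
        neighbors = [(math.dist(x, p), y) for p, y in labeled]
        # Distances to every other unlabeled example (skip x itself).
        neighbors += [(math.dist(x, p), y)
                      for j, (p, y) in enumerate(unlabeled) if j != i]
        neighbors.sort(key=lambda t: t[0])
        nearest = neighbors[:k]
        agree = sum(1 for _, y in nearest if y == y_hat)
        scores.append(agree / len(nearest))
    return scores

def resample_by_confidence(scores, high_ratio=0.2, low_ratio=0.1):
    """Return (high_idx, low_idx): indices of the most and least confident
    unlabeled examples, taken as fixed proportions of the unlabeled pool."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_high = int(len(order) * high_ratio)
    n_low = int(len(order) * low_ratio)
    high = order[:n_high]
    low = order[len(order) - n_low:] if n_low else []
    # Keep the two groups disjoint when the ratios overlap.
    low = [i for i in low if i not in high]
    return high, low
```

In a boosting loop, the high-confidence indices would feed examples that are likely to be labeled correctly, while the low-confidence indices would supply examples that push successive base classifiers apart, which is the accuracy and diversity trade-off the abstract describes.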
Source Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), CSCD, 2011, No. 11, pp. 1048-1056 (9 pages)
Funding National Natural Science Foundation of China (Nos. 61073133, 61175053); Specialized Research Fund for the Doctoral Program of Higher Education (No. 20070151009)
Keywords boosting; semi-supervised categorization; Naive Bayesian; confidence; resampling
