Nearest Neighbor Algorithm for Positive and Unlabeled Learning with Uncertainty (article in English) Cited by: 2

Abstract: This paper studies the classification of uncertain data in the positive and unlabeled (PU) learning scenario. It proposes a novel algorithm, NNPU (nearest neighbor algorithm for positive and unlabeled learning), which has two variants, NNPUa and NNPUu. Experimental results on benchmark UCI datasets show that NNPUu, which takes the full uncertainty information in the data into account, classifies unseen examples better than NNPUa, which considers only the mean of the uncertain values. Furthermore, NNPU outperforms existing algorithms such as NN-d, OCC (one-class classifier), and aPUNB when classifying precise data.
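
The abstract does not give the distance definitions used by NNPUa and NNPUu, so the following is only a rough illustration of why a mean-only view of uncertain data can discard information that a full-distribution view retains. It assumes each uncertain object is represented by a small set of samples, and the function names `mean_distance` and `expected_distance` are hypothetical; none of this is taken from the paper.

```python
import numpy as np


def mean_distance(a, b):
    """Euclidean distance between the per-feature means of two uncertain objects.

    Illustrative assumption: each object is an array of shape (n_samples, n_features).
    """
    return float(np.linalg.norm(a.mean(axis=0) - b.mean(axis=0)))


def expected_distance(a, b):
    """Average Euclidean distance over all pairs of samples drawn from a and b.

    Illustrative assumption: this stands in for "using the whole uncertainty
    information"; it is not the distance defined in the paper.
    """
    diffs = a[:, None, :] - b[None, :, :]  # shape (n_a, n_b, n_features)
    return float(np.linalg.norm(diffs, axis=2).mean())


# Two uncertain objects described by samples of a single feature.
x = np.array([[0.0], [2.0]])  # mean 1.0, wide spread
y = np.array([[1.0], [1.0]])  # mean 1.0, no spread

d_mean = mean_distance(x, y)      # the means coincide
d_full = expected_distance(x, y)  # the spread of x still contributes
```

In this toy case the two objects are indistinguishable by their means alone, while the sample-based distance still separates them, which is the kind of difference the abstract attributes to using the full uncertainty information.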

Authors: Pan Shirui (潘世瑞), Zhang Yang (张阳), Li Xue (李雪), Wang Yong (王勇)

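Affiliations: College of Information Engineering, Northwest A&F University; State Key Laboratory for Novel Software Technology, Nanjing University; Department of Computer and Electronic Engineering, The University of Queensland; School of Computer Science, Northwestern Polytechnical University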

Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), CSCD, 2010, No. 9, pp. 769-779 (11 pages)

Funding: The National Natural Science Foundation of China under Grant No. 60873196; the Fundamental Research Funds for the Central Universities under Grant No. QN2009092

Keywords: uncertain data; positive and unlabeled learning; nearest neighbor algorithm

Classification code: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]

References (21; first 10 listed)

[1] Ren J, Lee S D, Chen X, et al. Naive Bayes classification of uncertain data[C]//Proceedings of IEEE International Conference on Data Mining, 2009. Cited by: 1
[2] Tsang S, Kao B, Yip K Y, et al. Decision trees for uncertain data[C]//Proceedings of IEEE International Conference on Data Engineering, 2009: 441-444. Cited by: 1
[3] Liu B, Dai Y, Li X, et al. Building text classifiers using positive and unlabeled examples[C]//Proceedings of IEEE International Conference on Data Mining, 2003: 179-186. Cited by: 1
[4] Fung G P C, Yu J X, Lu H, et al. Text classification without negative examples revisit[J]. IEEE Transactions on Knowledge and Data Engineering, 2006, 18(1): 6-20. Cited by: 1
[5] Calvo B, Larrañaga P, Lozano J. Learning Bayesian classifiers from positive and unlabeled examples[J]. Pattern Recognition Letters, 2007, 28(16): 2375-2384. Cited by: 1
[6] Tax D, Duin R. Data description in subspaces[C]//Proceedings of International Conference on Pattern Recognition, 2000: 672-675. Cited by: 1
[7] Tax D M J, Duin R P W. Support vector domain description[J]. Pattern Recognition Letters, 1999, 20: 1191-1199. Cited by: 1
[8] Schölkopf B, Platt J, Shawe-Taylor J, et al. Estimating the support of a high-dimensional distribution[J]. Neural Computation, 2001, 13(7): 1443-1471. Cited by: 1
[9] Hempstalk K, Frank E, Witten I. One-class classification by combining density and class probability estimation[C]//Proceedings of European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2008: 505-519. Cited by: 1
[10] He J, Zhang Y, Li X, et al. Naive Bayes classifier for positive and unlabeled learning with uncertainty[C]//Proceedings of SIAM International Conference on Data Mining, 2010. Cited by: 1

Co-cited documents (17; first 10 listed)

[1] Han Hui, Mao Feng, Wang Wenyuan. Recent advances in decision tree algorithms for data mining[J]. Application Research of Computers, 2004, 21(12): 5-8. Cited by: 47
[2] Dietterich T G. Ensemble methods in machine learning[M]//Multiple Classifier Systems. Berlin: Springer, 2000: 1-15. Cited by: 1
[3] Banfield R E, Hall L O, Bowyer K W, et al. A comparison of decision tree ensemble creation techniques[J]. IEEE Trans on Pattern Analysis and Machine Intelligence, 2007, 29(1): 173-180. Cited by: 1
[4] Webb G I, Boughton J R, Zheng Fei, et al. Learning by extrapolation from marginal to full-multivariate probability distributions: decreasingly naive Bayesian classification[J]. Machine Learning, 2012, 86(2): 233-272. Cited by: 1
[5] Denis F, Gilleron R, Letouzey F. Learning from positive and unlabeled examples[J]. Theoretical Computer Science, 2005, 348(1): 70-83. Cited by: 1
[6] Yu H, Han J, Chang K. PEBL: positive example based learning for Web page classification using SVM[C]//Proc of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM Press, 2002: 239-248. Cited by: 1
[7] Liu Bing, Lee W S, Yu P S, et al. Partially supervised classification of text documents[C]//Proc of the 19th International Conference on Machine Learning. 2002: 387-394. Cited by: 1
[8] Denis F, Laurent A, Gilleron R, et al. Text classification and co-training from positive and unlabeled examples[C]//Proc of ICML Workshop: the Continuum from Labeled to Unlabeled Data. 2003: 80-87. Cited by: 1
[9] He Jiazhen, Zhang Yang, Li Xue, et al. Bayesian classifiers for positive unlabeled learning[C]//Proc of the 12th International Conference on Web-Age Information Management. 2011: 81-93. Cited by: 1
[10] Webb G I, Boughton J R, Wang Z H. Not so naive Bayes: aggregating one-dependence estimators[J]. Machine Learning, 2005, 58(1): 5-24. Cited by: 1

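Citing documents (2)

[1] Zhang Jinlei, Li Mei, Zhang Yang, Liang Chunquan, Wang Yong. P-AnDT: a positive and unlabeled learning algorithm based on averaged n-dependence decision trees[J]. Application Research of Computers, 2016, 33(7): 1941-1944. Cited by: 2
[2] Yang Jianlin, Liu Yang. Research on PU learning based on associative classification algorithms[J]. Data Analysis and Knowledge Discovery, 2017, 1(11): 12-18. Cited by: 1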

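Second-level citing documents (3)

[1] Zhang Chunsheng, Tu Ya, Li Yan. Study on the relationship between drugs and indications in Mongolian medicine prescriptions based on decision trees[J]. Chinese Journal of Basic Medicine in Traditional Chinese Medicine, 2018, 24(9): 1299-1302. Cited by: 7
[2] Gao Bingtao, Zhai Zhengang, Liu Bin. Research on biomedical named entity recognition algorithms in the PU scenario[J]. Technology of IoT & AI (智能物联技术), 2019, 51(1): 22-28. Cited by: 1
[3] Yao Liangliang. An automatic classification method for Chinese library texts based on association rules[J]. Science & Technology Information (科技资讯), 2020, 18(14): 171.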
