Abstract
This paper studies the classification of uncertain data in the positive and unlabeled (PU) learning scenario. It proposes a novel algorithm, NNPU (nearest neighbor algorithm for positive and unlabeled learning), with two variants, NNPUa and NNPUu. Experimental results on benchmark UCI datasets show that NNPUu, which takes the full uncertainty information in the data into account, classifies unseen examples better than NNPUa, which considers only the average value of the uncertain information. Furthermore, when classifying precise data, NNPU outperforms the existing algorithms NN-d, OCC (one-class classifier), and aPUNB.
Source
Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》)
CSCD
2010, No. 9, pp. 769-779 (11 pages)
Funding
The National Natural Science Foundation of China under Grant No. 60873196
The Fundamental Research Funds for the Central Universities under Grant No. QN2009092
Keywords
uncertain data
positive and unlabeled learning
nearest neighbor algorithm