Abstract
Distance measurement is one of the important factors affecting the classification accuracy of the k-nearest neighbor (KNN) algorithm. This paper presents an improved k-nearest neighbor algorithm that fuses neighborhood information. First, the concept of an instance neighborhood is defined, and two criteria are proposed according to the influence of the neighborhood. Then, the influence of the instance neighborhood is taken into account when computing the distance between test instances and training instances. The algorithm characterizes the distance between instances more precisely and, to some extent, improves the stability of KNN. The method was tested on UCI benchmark datasets; the results show that its performance is better than or comparable to that of other related classifiers, and that it is relatively robust under noise disturbance.
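The abstract does not give the neighborhood definition, the two criteria, or the distance formula, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes (hypothetically) that the neighborhood of a training instance is its `m` nearest training instances, and that the adjusted distance is a weighted blend of the direct distance and the mean distance from the query to that neighborhood. The names `neighborhoods`, `neighborhood_knn_predict`, `m`, and `alpha` are illustrative and do not come from the paper.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def neighborhoods(train_X, m):
    """ASSUMPTION: the neighborhood of a training instance is the set of
    its m nearest *other* training instances (returned as index lists)."""
    result = []
    for i, xi in enumerate(train_X):
        others = [j for j in range(len(train_X)) if j != i]
        others.sort(key=lambda j: euclidean(xi, train_X[j]))
        result.append(others[:m])
    return result


def neighborhood_knn_predict(train_X, train_y, query, k=3, m=2, alpha=0.5):
    """Majority-vote KNN using a neighborhood-adjusted distance.

    ASSUMPTION: adjusted distance =
        (1 - alpha) * d(query, x_i) + alpha * mean d(query, neighbors of x_i).
    With alpha = 0 this reduces to ordinary KNN.
    """
    hoods = neighborhoods(train_X, m)
    scored = []
    for i, xi in enumerate(train_X):
        direct = euclidean(query, xi)
        if hoods[i]:
            hood = sum(euclidean(query, train_X[j]) for j in hoods[i]) / len(hoods[i])
        else:
            hood = direct  # no neighbors available: fall back to direct distance
        scored.append(((1 - alpha) * direct + alpha * hood, train_y[i]))
    scored.sort(key=lambda t: t[0])
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Toy data: two well-separated clusters.
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b"]
pred_near_origin = neighborhood_knn_predict(train_X, train_y, (0.5, 0.5))
pred_far = neighborhood_knn_predict(train_X, train_y, (5.5, 5.5))
```

Under this assumed blend, a training instance that lies close to the query but whose neighborhood lies far from it (for example, an isolated noisy point) is pushed down the ranking, which is one plausible way a neighborhood term could dampen the effect of noise.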
Source
《智能系统学报》
CSCD
Peking University Core Journal
2014, Issue 2, pp. 240-243 (4 pages)
CAAI Transactions on Intelligent Systems
Funding
National Natural Science Foundation of China (61303131, 61379021)
Natural Science Foundation of Fujian Province (2013J01028, 2012D141)
Fujian Province Class A Science and Technology Project (JA12220)
Keywords
k-nearest neighbor
neighborhood information
classification learning
distance measurement
noise disturbance