Abstract
In research on optimization algorithms, KNN imputation of missing data is severely degraded by noise in the original data. To address this problem, this paper proposes a new missing-data imputation algorithm, ENN-KNN (Eliminate Neighbor Noise k-Nearest Neighbor), based on the neighbor relationships among the nearest neighbors of the record to be imputed. By comparing the true degree of nearness of each nearest neighbor of that record, the algorithm can effectively identify potential noise nearest neighbors. It then imputes the missing value using only the non-noise nearest neighbors, which eliminates the influence of noise nearest neighbors on the imputation result. Simulation results on four UCI data sets show that the imputation accuracy of ENN-KNN is, overall, better than that of KNN.
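The idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not define how the "degree of nearness" is measured, so the criterion below is an assumption made only for illustration: a neighbor's degree is the number of the other selected neighbors that list it among their own k nearest complete records. The function names (`dist`, `k_nearest`, `enn_knn_impute`), the `min_degree` threshold, and the sample data are all hypothetical and are not taken from the paper.

```python
import math

def dist(a, b, cols):
    """Euclidean distance between rows a and b over the given column indices."""
    return math.sqrt(sum((a[c] - b[c]) ** 2 for c in cols))

def k_nearest(target, rows, k, cols, skip=None):
    """Indices of the k rows closest to target, ignoring row index `skip`."""
    candidates = [i for i in range(len(rows)) if i != skip]
    candidates.sort(key=lambda i: dist(target, rows[i], cols))
    return candidates[:k]

def enn_knn_impute(query, rows, k, min_degree=1):
    """Impute the single missing (None) entry of `query` from complete `rows`."""
    missing = query.index(None)
    observed = [c for c in range(len(query)) if c != missing]
    all_cols = list(range(len(query)))

    # Step 1: ordinary KNN search using only the observed columns of the query.
    neighbors = k_nearest(query, rows, k, observed)

    # Step 2 (ASSUMED criterion, not from the paper): a neighbor's degree is
    # how many of the other neighbors include it in their own k-nearest list,
    # computed over all columns of the complete rows.
    own = {n: k_nearest(rows[n], rows, k, all_cols, skip=n) for n in neighbors}
    degree = {n: sum(n in own[m] for m in neighbors if m != n) for n in neighbors}

    # Step 3: average only the neighbors not flagged as noise; fall back to
    # all neighbors if every one of them was flagged.
    kept = [n for n in neighbors if degree[n] >= min_degree] or neighbors
    value = sum(rows[n][missing] for n in kept) / len(kept)
    return value, kept

# Hypothetical data: row 3 is close to the query on x but has a noisy y value.
rows = [[1.0, 10.0], [1.1, 10.2], [0.8, 9.8], [1.05, 50.0], [5.0, 30.0], [5.2, 31.0]]
value, kept = enn_knn_impute([1.0, None], rows, k=3)
print(value, kept)
```

On this toy data, plain KNN with k = 3 would average in the noisy row and give roughly 23.4, whereas the sketch drops that row and returns roughly 10.1. In practice the columns would normally be scaled before computing distances.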
Source
Computer Simulation (《计算机仿真》)
CSCD
Peking University Core Journals (北大核心)
2014, No. 7, pp. 264-268 (5 pages)
Funding
Beijing Natural Science Foundation (7110001)
Keywords
Missing data imputation
Nearest neighbors
Noise nearest neighbor