Abstract
To address the problem of missing sample data in chemical processes, this paper studies a method for completing missing values based on an improved K-nearest neighbor (KNN) algorithm. It describes the basic idea of KNN completion and, because the algorithm can be biased in how it selects the K nearest neighbors of an incomplete sample, proposes an improved completion algorithm that effectively resolves this selection bias. The K selected neighbors are weighted according to their distances from the incomplete sample, so that the completed values are more reasonable. Simulation experiments show that the improved KNN completion algorithm completes the missing parts of samples more effectively, which increases the number of samples available for soft-sensor modeling.
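The distance-weighting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives neither the exact weighting formula nor the mechanism used to correct the neighbor-selection bias, so inverse-distance weights and a Euclidean distance over the observed features are assumed here, and the bias correction is not reproduced. The function name `knn_impute` and the parameters `k` and `eps` are hypothetical.

```python
import math


def knn_impute(samples, k=3, eps=1e-12):
    """Fill None entries with an inverse-distance-weighted mean of the
    k nearest complete samples (assumed weighting, not the paper's formula)."""
    # Only fully observed rows are used as candidate neighbors.
    complete = [row for row in samples if None not in row]
    if not complete:
        raise ValueError("at least one complete sample is required")

    filled = []
    for row in samples:
        missing = [j for j, v in enumerate(row) if v is None]
        if not missing:
            filled.append(list(row))
            continue

        observed = [j for j, v in enumerate(row) if v is not None]
        # Euclidean distance computed on the observed features only.
        dists = []
        for cand in complete:
            d = math.sqrt(sum((row[j] - cand[j]) ** 2 for j in observed))
            dists.append((d, cand))
        dists.sort(key=lambda pair: pair[0])
        neighbors = dists[:k]

        # Closer neighbors receive larger weights; eps avoids division by zero.
        weights = [1.0 / (d + eps) for d, _ in neighbors]
        total = sum(weights)

        new_row = list(row)
        for j in missing:
            weighted_sum = sum(w * cand[j] for w, (_, cand) in zip(weights, neighbors))
            new_row[j] = weighted_sum / total
        filled.append(new_row)
    return filled


if __name__ == "__main__":
    data = [[1.0, 2.0], [3.0, 6.0], [2.0, None]]
    print(knn_impute(data, k=2))
```

In the usage example the incomplete sample is equidistant from both complete samples, so the two weights are equal and the estimate reduces to their plain average. When one neighbor is much closer than the others, its value dominates the estimate.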
Source
《计算机与应用化学》
CAS
2015, No. 12, pp. 1499-1502 (4 pages)
Computers and Applied Chemistry
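Funding
National Natural Science Foundation of China (Grant No. 61273070)
Priority Academic Program Development of Jiangsu Higher Education Institutions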
Keywords
K-nearest neighbor
missing data completion
bias
weighting