Abstract
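To address the drawbacks of the kNN classifier when searching for the k nearest neighbors in massive data sets (high computational complexity, long running time, and large storage requirements), this paper proposes the design principle and implementation scheme of a classifier based on unit attribute assignment. The classifier maps each point to be classified to the unit that contains it. For a point in a to-be-identified unit, it builds the k-nearest-neighbor set within the corresponding window and makes either a class decision or a reject decision according to the kNN rule. A point in a unit whose attribute indicates that one class of samples clearly dominates is assigned directly to that class. Corresponding treatments are given for points in units that are only weakly associated with the given sample set, for points in units where no class dominates, and for rejectable decision points in to-be-identified units. The paper also discusses, and proposes countermeasures for, improving classification speed and accuracy, partitioning the units, selecting the relevant parameters, and estimating the misclassification rate. Simulation experiments and a comparison with the kNN classifier further demonstrate the effectiveness of the proposed method.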
To address the drawbacks of the k-nearest-neighbor (kNN) classifier, namely complex computation, long running time, and large storage requirements, a criterion for rejectable decision points and the unit properties of the sample space are described, and a kNN classifier based on unit property assignment is proposed. First, the proposed classifier maps a test sample to its unit and computes its k-nearest-neighbor set. Second, the decision for the test sample is obtained by the kNN method. If most samples in the unit belong to the same class, the test sample is assigned to that class; otherwise, it is rejected. Finally, methods for improving the speed and accuracy of the kNN classifier and for selecting its parameters are discussed. A simulation case from a semiconductor batch process demonstrates the effectiveness of the proposed method.
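The decision flow described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify how units are partitioned, how the window is defined, or what the reject criterion is, so everything concrete here is an assumption made for illustration: a uniform grid with cell width `h`, a dominance threshold `tau`, a window of `r` cells around the query cell, a reject rule based on a minimum vote count `min_votes`, and the helper names `cell_of`, `build_cells`, `cell_attribute`, and `classify`. It should not be read as the authors' implementation.

```python
import itertools
import math
from collections import Counter, defaultdict

REJECT = None  # returned when no class decision is made (assumed convention)


def cell_of(x, h):
    """Map a point to the integer index of the grid cell (unit) containing it."""
    return tuple(math.floor(v / h) for v in x)


def build_cells(samples, labels, h):
    """Group the training samples by the grid cell they fall into."""
    cells = defaultdict(list)
    for x, y in zip(samples, labels):
        cells[cell_of(x, h)].append((x, y))
    return cells


def cell_attribute(members, tau):
    """Return the dominant class of a cell, or None if no class reaches share tau."""
    if not members:
        return None
    label, count = Counter(y for _, y in members).most_common(1)[0]
    return label if count / len(members) >= tau else None


def sq_dist(p, q):
    """Squared Euclidean distance; sufficient for ranking neighbors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def classify(x, cells, h, k=3, tau=0.9, r=1, min_votes=2):
    """Classify x by its cell attribute, falling back to kNN inside a window."""
    c = cell_of(x, h)

    # Fast path: one class clearly dominates the cell, so no neighbor search.
    dominant = cell_attribute(cells.get(c, []), tau)
    if dominant is not None:
        return dominant

    # Window: the query cell plus all cells within r steps along each axis.
    window = []
    for offset in itertools.product(range(-r, r + 1), repeat=len(c)):
        neighbor = tuple(a + b for a, b in zip(c, offset))
        window.extend(cells.get(neighbor, []))

    # No training samples nearby: treated here as a reject (assumed handling).
    if not window:
        return REJECT

    # kNN vote restricted to the samples in the window.
    nearest = sorted(window, key=lambda s: sq_dist(s[0], x))[:k]
    label, votes = Counter(y for _, y in nearest).most_common(1)[0]
    return label if votes >= min_votes else REJECT
```

In this sketch the speed-up comes from two places: points in dominated cells are labeled without any distance computation, and the remaining points are compared only against the samples in their window rather than the full training set. The values of `h`, `tau`, `r`, and `min_votes` would need tuning for a given data set.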
Source
《辽宁工程技术大学学报(自然科学版)》
CAS
PKU Core (Peking University Core Journals index)
2017, Issue 11, pp. 1218-1223 (6 pages)
Journal of Liaoning Technical University (Natural Science)
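Funding
National Natural Science Foundation of China (61673279)
Liaoning Provincial Department of Education Fund (L2015432)
Natural Science Foundation of Liaoning Province (2015020164)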
Keywords
data mining
k nearest neighbor (kNN) classifier
big data
sample space decomposition
pattern recognition