Abstract
The Local Outlier Factor (LOF) algorithm is a widely used outlier detection algorithm, but it incurs high time and space costs on large-scale datasets. Outlier detection algorithms based on a fixed grid alleviate this problem to some extent, but their results are sensitive to the granularity of the grid partition. To address this, the paper proposes an outlier detection algorithm based on variable grid partitioning. The algorithm first dynamically constructs a grid space that roughly matches the distribution of the original dataset, based on how the data points are actually distributed in space. It then removes all data points in every grid cell whose point count exceeds a set threshold, and finally runs the LOF algorithm on the remaining data points. Experimental results show that, compared with the fixed-grid outlier detection algorithm, the proposed algorithm is markedly more efficient and also somewhat more accurate.
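The three steps in the abstract (build a data-driven grid, discard the points in over-full cells, run LOF on what remains) can be sketched as follows. This is a minimal illustration, not the paper's method: the abstract does not say how the variable grid is built, so the sketch assumes equal-frequency (quantile) cut points per dimension as a stand-in. The function names and the parameters `bins`, `threshold`, `k`, and `top` are hypothetical, and the LOF step is a plain brute-force version of the standard definition.

```python
import math
from collections import defaultdict

def quantile_edges(values, bins):
    # ASSUMPTION: equal-frequency cut points stand in for the paper's
    # (unspecified) variable grid construction.
    s = sorted(values)
    return [s[(len(s) * i) // bins] for i in range(1, bins)]

def cell_of(point, edges):
    # Cell index per dimension = number of cut points <= the coordinate.
    return tuple(sum(1 for e in dim_edges if x >= e)
                 for x, dim_edges in zip(point, edges))

def prune_dense_cells(points, bins, threshold):
    """Return indices of points lying in cells with at most `threshold` points."""
    dims = len(points[0])
    edges = [quantile_edges([p[d] for p in points], bins) for d in range(dims)]
    cells = defaultdict(list)
    for i, p in enumerate(points):
        cells[cell_of(p, edges)].append(i)
    return [i for members in cells.values()
            if len(members) <= threshold for i in members]

def lof_scores(points, k):
    """Brute-force LOF, O(n^2); needs more than k points, ignores distance ties."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    knn = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
           for i in range(n)]
    k_dist = [dist[i][knn[i][-1]] for i in range(n)]

    def lrd(i):
        # Local reachability density: inverse of the mean reachability distance.
        reach = sum(max(k_dist[j], dist[i][j]) for j in knn[i])
        return len(knn[i]) / max(reach, 1e-12)  # guard against duplicate points

    lrds = [lrd(i) for i in range(n)]
    return [sum(lrds[j] for j in knn[i]) / (len(knn[i]) * lrds[i])
            for i in range(n)]

def detect_outliers(points, bins=4, threshold=5, k=3, top=1):
    """Prune dense cells, then rank the surviving points by LOF score."""
    kept = prune_dense_cells(points, bins, threshold)
    scores = lof_scores([points[i] for i in kept], k)
    ranked = sorted(zip(scores, kept), reverse=True)
    return [idx for _, idx in ranked[:top]]
```

The pruning step is where the savings would come from: LOF's pairwise distance work is only done for the points that survive, so the denser the removed cells, the smaller the quadratic part. The result is correspondingly sensitive to `bins` and `threshold`, which would need tuning per dataset.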
Source
Journal of Jiangnan University (Natural Science Edition)
CAS
2015, No. 6, pp. 751-757 (7 pages)
Funding
Natural Science Research Project of Anhui Provincial Universities (KJ2014B24)