An Outlier Detecting Algorithm Based on the Variable Grid Division (cited by: 1)
Abstract The Local Outlier Factor (LOF) algorithm is a widely used outlier detection method, but it incurs high time and space costs on large-scale datasets. Outlier detection algorithms based on a fixed grid alleviate this problem to some extent, but their results are sensitive to the granularity of the grid division. To address this, the paper proposes an outlier detection algorithm based on variable grid division. The algorithm first uses the actual spatial distribution of the data points to dynamically construct a grid space that roughly matches the distribution of the original dataset. It then removes all data points in every grid cell whose point count exceeds a set threshold, and finally runs LOF on the remaining points. Experimental results show that, compared with the fixed-grid outlier detection algorithm, the proposed algorithm is markedly more efficient and also somewhat more accurate.
Source Journal of Jiangnan University (Natural Science Edition), CAS, 2015, No. 6, pp. 751-757 (7 pages)
Funding Natural Science Research Project of Anhui Provincial Universities (KJ2014B24)
Keywords local outlier factor; outlier detection; variable grid; large-scale dataset
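
The three steps named in the abstract (build a data-driven grid, discard the points in over-full cells, run LOF on what remains) can be sketched roughly as follows. The abstract does not say how the variable grid is constructed, so this sketch assumes one simple stand-in: along each dimension, a cell boundary is placed wherever two consecutive sorted coordinates are more than `gap` apart. The function names and the parameters `gap`, `threshold` and `k` are all hypothetical, and the LOF routine is a plain textbook version rather than the paper's implementation.

```python
import math
from bisect import bisect_right
from collections import defaultdict


def gap_edges(values, gap):
    """Assumed grid rule: cut midway between neighbouring sorted values
    that lie more than `gap` apart, so boundaries follow the data."""
    s = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(s, s[1:]) if b - a > gap]


def sparse_cell_indices(points, gap, threshold):
    """Indices of points in cells holding at most `threshold` points;
    points in denser cells are dropped as presumed inliers."""
    dims = len(points[0])
    edges = [gap_edges([p[d] for p in points], gap) for d in range(dims)]
    cells = defaultdict(list)
    for i, p in enumerate(points):
        key = tuple(bisect_right(edges[d], p[d]) for d in range(dims))
        cells[key].append(i)
    return sorted(i for members in cells.values()
                  if len(members) <= threshold for i in members)


def lof_scores(points, k):
    """Textbook LOF with exactly k neighbours per point (distance ties are
    not expanded). Assumes more than k points and no duplicate points."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    knn = [sorted((j for j in range(n) if j != i),
                  key=lambda j: dist[i][j])[:k] for i in range(n)]
    kdist = [dist[i][knn[i][-1]] for i in range(n)]
    # Local reachability density: inverse of the mean reachability distance.
    lrd = [k / sum(max(kdist[j], dist[i][j]) for j in knn[i])
           for i in range(n)]
    return [sum(lrd[j] for j in knn[i]) / (k * lrd[i]) for i in range(n)]


def detect(points, gap, threshold, k):
    """Map each surviving point's original index to its LOF score."""
    kept = sparse_cell_indices(points, gap, threshold)
    scores = lof_scores([points[i] for i in kept], k)
    return dict(zip(kept, scores))


# Toy data: a dense 5x5 lattice, a small loose group, one isolated point.
points = [(0.1 * i, 0.1 * j) for i in range(5) for j in range(5)]
points += [(3.0, 3.0), (3.2, 3.0), (3.0, 3.2), (3.2, 3.2), (6.0, 0.0)]
scores = detect(points, gap=1.0, threshold=10, k=3)
print(max(scores, key=scores.get))  # index of the highest-scoring point
```

In this toy run the 25 lattice points share one cell and are removed before LOF is computed, so the pairwise distance work covers only the five remaining points. That is the source of the efficiency gain the abstract reports, although the numbers here are illustrative only.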