期刊文献+

FINFLO:快速局部异常点检测算法 被引量:2

FINFLO:A New Fast Local Outliers Detection Algorithm
下载PDF
导出
摘要 【目的】基于局部密度的LOF算法时间复杂度高,且容易将处于簇边缘的正常对象误判成异常对象,INFLO算法引进反向k-近邻解决LOF算法这一缺陷,但是计算每个对象的局部异常因子时都会使用反向k-近邻没有必要且耗费时间。【方法】通过对两个算法的分析,本文改进了INFLO算法,提出了一种快速异常点检测算法FINFLO(faster Influenced outlierness),该算法的主要思想是:计算对象的局部因子时尽量避免考虑反向k-近邻对象,尽可能地只利用k-近邻对象。首先,计算出所有对象的反向k-近邻对象个数的均值,然后在计算对象的局部异常因子时,如果对象的反向k-近邻对象个数不小于所有对象的反向k-近邻对象个数均值,则只需要考虑对象的k-近邻对象,否则需要同时考虑k-近邻对象和反向k-近邻对象。【结论】实验结果显示,该算法能够提高离群点检测的精度,降低时间复杂度,实现有效的局部离群点的检测。 [Objective]Local density based LOF algorithm has high time complexity,and it tends to misjudge the normal objects at the edge of the cluster as exceptions.The inverse k-nearest-neighbor algorithm is introduced to solve the problem of LOF algorithm in INFLO algorithm.However,it is unnecessary and time-consuming to use the inverse k-nearest-neighbor when calculating the local outlier factor of each object.[Methods]Through the analysis of the two algorithms,this paper proposes a new fast anomaly detection algorithm,named Faster Influenced Outlierness,FINFLO.When calculating the local factors of objects,FINFLO tries to avoid considering reverse k-nearest neighbor objects,and use only k-nearest neighbor objects as much as possible.If the number of reverse k-nearest neighbor objects is not less than the mean of all reverse k-nearest neighbor objects,only k-nearest neighbor objects need to be considered,otherwise reverse k-nearest neighbor objects need to be considered.[Conclusions]Experimental results show that the algorithm can improve the accuracy of outlier detection,reduce the time complexity,and achieve effective local outlier detection.
作者 杨校林 李菁菁 李易 YANG Xiaolin;LI Jingjing;LI Yi(Computer Network Information Center,Chinese Academy of Sciences,Beijing 100190,China;University of Chinese Academy of Sciences,Beijing 100049,China)
出处 《数据与计算发展前沿》 2020年第6期82-89,共8页 Frontiers of Data & Computing
关键词 局部密度 异常因子 局部离群点 K-近邻 反向k-近邻 local density outlier factor local outlier k-nearest neighbor reverse k-nearest neighbor
  • 相关文献

参考文献7

二级参考文献50

  • 1薛安荣,鞠时光,何伟华,陈伟鹤.局部离群点挖掘算法研究[J].计算机学报,2007,30(8):1455-1463. 被引量:96
  • 2范明 等.数据挖掘概念与技术[M].北京:机械工业出版社,2001.. 被引量:120
  • 3Breunig M M,Kriagel H P,Ng R T,et al.LOF:identifying densitybased local outliers[C]//Proceedings of ACM SIGMOD International Conference on Management of Data,Dallas,Texas,USA,2000:93-104. 被引量:1
  • 4Xia Chen-yi,Hsu W,Lee M L,et al.BODER:efficient computation of boundary points[J].IEEE Transaction on Knowledge and Data Engineering,2006 (18). 被引量:1
  • 5Ertoz L,Steinbach M,Kumar V.Finding clusters of different sizes,shapes,and densities in noisy,high dimensional data[C]//2nd SIAM International Conference on Data Mining,SDM 2002 IAM Press:SDM2003,2003:1-12. 被引量:1
  • 6Hsu Chih-ming,Chen Ming-syan.Subspace clustering of high dimensional spatial data with noises[C]//Advances in Knowledge Discovery and Data Mining.8th Pacific-Asia Conference,PAKDD 2004.Heidlberg,Germany:Springer,2004:31-40. 被引量:1
  • 7Korn F,Muthukrishna S.Influence sets based on reverse nearest neighbors queries[C]//Proceedings of ACM SIGMOD,2000:201-212. 被引量:1
  • 8Han Jia-Wei,Kamber Micheline Data Mining:Concepts and Techniques (2nd Edition).San Francisco:Morgan Kaufmann Publishers,2006 被引量:1
  • 9Hawkins D.Identification of Outliers.London:Chapman and Hall,1980 被引量:1
  • 10Knorr E,Ng R.Algorithms for mining distance-based outliers in large datasets//Proceedings of the 24th VLDB Conference.New York,1998:392-403 被引量:1

共引文献231

同被引文献18

引证文献2

二级引证文献8

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部