摘要
【目的】基于局部密度的LOF算法时间复杂度高,且容易将处于簇边缘的正常对象误判成异常对象,INFLO算法引进反向k-近邻解决LOF算法这一缺陷,但是计算每个对象的局部异常因子时都会使用反向k-近邻没有必要且耗费时间。【方法】通过对两个算法的分析,本文改进了INFLO算法,提出了一种快速异常点检测算法FINFLO(faster Influenced outlierness),该算法的主要思想是:计算对象的局部因子时尽量避免考虑反向k-近邻对象,尽可能地只利用k-近邻对象。首先,计算出所有对象的反向k-近邻对象个数的均值,然后在计算对象的局部异常因子时,如果对象的反向k-近邻对象个数不小于所有对象的反向k-近邻对象个数均值,则只需要考虑对象的k-近邻对象,否则需要同时考虑k-近邻对象和反向k-近邻对象。【结论】实验结果显示,该算法能够提高离群点检测的精度,降低时间复杂度,实现有效的局部离群点的检测。
[Objective]Local density based LOF algorithm has high time complexity,and it tends to misjudge the normal objects at the edge of the cluster as exceptions.The inverse k-nearest-neighbor algorithm is introduced to solve the problem of LOF algorithm in INFLO algorithm.However,it is unnecessary and time-consuming to use the inverse k-nearest-neighbor when calculating the local outlier factor of each object.[Methods]Through the analysis of the two algorithms,this paper proposes a new fast anomaly detection algorithm,named Faster Influenced Outlierness,FINFLO.When calculating the local factors of objects,FINFLO tries to avoid considering reverse k-nearest neighbor objects,and use only k-nearest neighbor objects as much as possible.If the number of reverse k-nearest neighbor objects is not less than the mean of all reverse k-nearest neighbor objects,only k-nearest neighbor objects need to be considered,otherwise reverse k-nearest neighbor objects need to be considered.[Conclusions]Experimental results show that the algorithm can improve the accuracy of outlier detection,reduce the time complexity,and achieve effective local outlier detection.
作者
杨校林
李菁菁
李易
YANG Xiaolin;LI Jingjing;LI Yi(Computer Network Information Center,Chinese Academy of Sciences,Beijing 100190,China;University of Chinese Academy of Sciences,Beijing 100049,China)
出处
《数据与计算发展前沿》
2020年第6期82-89,共8页
Frontiers of Data & Computing
关键词
局部密度
异常因子
局部离群点
K-近邻
反向k-近邻
local density
outlier factor
local outlier
k-nearest neighbor
reverse k-nearest neighbor