Abstract
Existing algorithms for mining distance-based outliers do not scale well to large, high-dimensional data sets. To address this problem, this paper presents RBRP, a fast algorithm for detecting distance-based outliers in high-dimensional data sets. Its running time grows log-linearly with the number of data points and linearly with the number of dimensions. Experimental results show that RBRP consistently outperforms the ORCA algorithm, often by an order of magnitude. On this basis, the author presents RBRP as the best distance-based outlier detection algorithm for high-dimensional data sets.
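The abstract does not describe RBRP's internals. The Python sketch below is only a rough illustration of the general idea it builds on. It makes several assumptions that are not taken from the paper. A point's outlier score is its Euclidean distance to its k-th nearest neighbour, and the n highest-scoring points are reported. ORCA-style pruning drops a point as soon as an upper bound on its k-NN distance falls below the current top-n cutoff. Neighbours are searched in the point's own bin first, so that this approximate bound tightens quickly. The single-level binning in the hypothetical helper `make_bins` assigns each point to the nearest of a few randomly chosen seed points. It is a simplified stand-in, not RBRP's actual binning procedure. All function names are illustrative.

```python
import heapq
import math
import random


def dist(a, b):
    # Euclidean distance between two equal-length tuples
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def make_bins(data, num_bins, rng):
    # Hypothetical one-level binning: assign each point to its nearest
    # randomly chosen seed point (a simplified stand-in for clustering).
    seeds = rng.sample(range(len(data)), num_bins)
    bins = {s: [] for s in seeds}
    for i, x in enumerate(data):
        nearest = min(seeds, key=lambda s: dist(x, data[s]))
        bins[nearest].append(i)
    return bins


def top_n_outliers(data, k, n, num_bins=4, seed=0):
    """Return [(score, index)] for the n points with the largest distance
    to their k-th nearest neighbour, largest score first."""
    rng = random.Random(seed)
    bins = make_bins(data, num_bins, rng)
    top = []        # min-heap of (score, index); top[0][0] is the cutoff
    cutoff = 0.0
    for s, members in bins.items():
        # Search the point's own bin first, then the other bins ordered by
        # seed distance, so near neighbours are usually found early.
        order = sorted(bins, key=lambda t: dist(data[s], data[t]))
        for i in members:
            nn = []     # negated max-heap of the k smallest distances so far
            pruned = False
            for t in order:
                for j in bins[t]:
                    if j == i:
                        continue
                    d = dist(data[i], data[j])
                    if len(nn) < k:
                        heapq.heappush(nn, -d)
                    elif d < -nn[0]:
                        heapq.heapreplace(nn, -d)
                    # -nn[0] is an upper bound on i's true k-NN distance; once
                    # it drops below the cutoff, i cannot enter the top n.
                    if len(nn) == k and len(top) == n and -nn[0] < cutoff:
                        pruned = True
                        break
                if pruned:
                    break
            if pruned or len(nn) < k:
                continue
            score = -nn[0]  # exact k-NN distance: every point was scanned
            if len(top) < n:
                heapq.heappush(top, (score, i))
            elif score > top[0][0]:
                heapq.heapreplace(top, (score, i))
            if len(top) == n:
                cutoff = top[0][0]
    return sorted(top, reverse=True)
```

For example, with the 25 points of a 5 × 5 grid spaced 0.1 apart plus one far-away point `(10.0, 10.0)` at index 25, `top_n_outliers(data, k=3, n=1)` ranks index 25 first. The pruning only skips points that provably cannot reach the top n, so the reported scores match a brute-force scan. The speed-up comes from how early the bound tightens, which is why the neighbour search order matters.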
Author
乔天成
QIAO Tian-cheng (Taiyuan Station of the First Communication Station, PLA Army General Staff, Taiyuan 030012, China)
Source
Sci-tech Innovation and Productivity (科技创新与生产力)
2017, No. 11, pp. 67-71 (5 pages)
Keywords
data mining
algorithm
outlier
high-dimensional data sets
approximate k-nearest neighbors
clustering