
A parameter-free outlier detection algorithm based on natural neighborhood graph (Cited by: 6)

Abstract: In data mining, outlier detection algorithms based on nearest neighbors struggle with complex data. When a data set contains arbitrarily shaped clusters and regions of varying density, choosing appropriate parameters without sufficient prior knowledge is difficult. To address this problem, this paper builds on the natural neighbor method, which reflects the relationships between elements of a data set better than the k-nearest neighbor method, and proposes an outlier detection algorithm based on a weighted natural neighborhood graph. The algorithm requires no manually set parameters at any stage and accurately identifies both global and local outliers in data sets with different distribution characteristics. Outlier detection results on artificial and real data sets show that the algorithm achieves results close to those of parameterized algorithms run with their optimal parameters, is far better than parameter-sensitive algorithms under most of their parameter settings, and outperforms parameter-insensitive algorithms, giving it stronger generality and practicality.
Authors: FENG Ji (冯骥), RAN Ruisheng (冉瑞生), WEI Yan (魏延); College of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China
Source: CAAI Transactions on Intelligent Systems (《智能系统学报》), 2019, No. 5, pp. 998-1006 (9 pages). Indexed in CSCD; Peking University Core Journal.
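Funding: Humanities and Social Sciences Research Project of the Ministry of Education (18XJC880002); Science and Technology Project of Chongqing Municipal Education Commission (KJQN201800539); Natural Science Foundation of Chongqing (cstc2013jcyjA40049); Chongqing Normal University Foundation Project (17XLB003)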
Keywords: parameter-free; adaptive neighbor; nearest neighbor; weighted graph; outlier detection; outlier factor; global outlier; local outlier
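The abstract names the natural neighbor method as its starting point but gives no procedural details. The following is a minimal Python sketch of one commonly described formulation of natural-neighbor search: the neighborhood size grows round by round until every point has been reached by some other point, or until the count of unreached points stops changing (this stopping rule is an assumption). The weighted graph (edges between mutual neighbors, weighted by Euclidean distance) and the LOF-like score are hypothetical illustrations; `weighted_natural_graph` and `outlier_scores` are illustrative names, not the paper's weighted natural neighborhood graph or its outlier factor.

```python
import math
from statistics import mean


def natural_neighbor_search(points):
    """Natural-neighbor search (one common formulation, assumed here).

    Returns (lam, order): lam is the number of search rounds (the natural
    eigenvalue) and order[i] lists the other points sorted by distance to i.
    """
    n = len(points)
    # For each point, indices of all other points sorted by Euclidean distance.
    order = [
        sorted((j for j in range(n) if j != i),
               key=lambda j, i=i: math.dist(points[i], points[j]))
        for i in range(n)
    ]
    nb = [0] * n  # nb[j]: how many points have reached j so far (reverse neighbors)
    prev_zero = None
    lam = 0
    while lam < n - 1:
        lam += 1
        for i in range(n):
            # The lam-th nearest neighbor of i gains one reverse neighbor.
            nb[order[i][lam - 1]] += 1
        zero = nb.count(0)
        # Assumed stopping rule: every point has a reverse neighbor, or the
        # number of points without one did not change in this round.
        if zero == 0 or zero == prev_zero:
            break
        prev_zero = zero
    return lam, order


def weighted_natural_graph(points, lam, order):
    """Hypothetical graph: i and j are joined when each is among the other's
    lam nearest neighbors; the edge weight is their Euclidean distance."""
    knn = [set(o[:lam]) for o in order]
    return {
        i: {j: math.dist(points[i], points[j])
            for j in order[i][:lam] if i in knn[j]}
        for i in range(len(points))
    }


def outlier_scores(graph):
    """Hypothetical LOF-like score: a point's mean edge weight divided by the
    mean of its neighbors' mean edge weights; isolated points score infinity."""
    avg = {i: mean(edges.values()) for i, edges in graph.items() if edges}
    scores = {}
    for i, edges in graph.items():
        if not edges:
            scores[i] = math.inf
        else:
            # Edges are mutual, so every neighbor j also has an entry in avg.
            scores[i] = avg[i] / mean(avg[j] for j in edges)
    return scores


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
lam, order = natural_neighbor_search(pts)
graph = weighted_natural_graph(pts, lam, order)
scores = outlier_scores(graph)
print(lam, max(scores, key=scores.get))  # prints "3 4"
```

No value of k is supplied anywhere: the neighborhood size `lam` is derived from the data. In the toy set, the four square points become mutual neighbors, while the distant point (10, 10) ends up with no natural neighbors and is ranked first.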