Abstract
Grid-based data analysis methods process data in units of grids rather than object by object, which avoids point-to-point computation between data objects and greatly improves the efficiency of analysis. Traditional grid-based methods, however, treat each grid independently during analysis and ignore the coupling between grids, which reduces accuracy. This paper applies grids to outlier detection in data streams without processing them independently: it takes the coupling between grids into account and proposes GCStream-OD, a grid-coupling-based outlier detection algorithm for data streams. The algorithm uses grid coupling to express the correlation between data stream objects accurately, and uses a pruning strategy to improve efficiency. Experiments on five real data sets show that GCStream-OD achieves high outlier detection quality and efficiency.
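The abstract does not define grid coupling or the pruning rule, so the following is only a minimal sketch of the general idea, not the GCStream-OD algorithm itself. It assumes that a grid's "coupled" density is its own object count plus a weighted share of the counts in adjacent grids, and that a grid which is already dense on its own can be skipped without looking at its neighbors. The names `cell_of`, `coupled_density`, `detect_outliers`, and the parameters `width`, `weight`, and `threshold` are all hypothetical, and the sketch works on a single batch rather than a sliding window over a stream.

```python
from collections import Counter
from itertools import product
import math


def cell_of(point, width):
    """Map a point to the integer coordinates of the grid that contains it."""
    return tuple(math.floor(x / width) for x in point)


def coupled_density(counts, cell, weight):
    """Own count plus a weighted share of the counts in adjacent grids.

    This weighting is an illustrative assumption, not the paper's definition.
    """
    total = counts.get(cell, 0)
    for offset in product((-1, 0, 1), repeat=len(cell)):
        if any(offset):  # skip the all-zero offset, i.e. the grid itself
            neighbor = tuple(c + o for c, o in zip(cell, offset))
            total += weight * counts.get(neighbor, 0)
    return total


def detect_outliers(points, width=1.0, weight=0.5, threshold=2.0):
    """Flag points whose grid stays sparse even after coupling with neighbors."""
    counts = Counter(cell_of(p, width) for p in points)
    outliers = []
    for p in points:
        cell = cell_of(p, width)
        # Pruning (assumed rule): a grid that is dense on its own cannot be
        # flagged, so the neighbor lookups are skipped for its points.
        if counts[cell] >= threshold:
            continue
        if coupled_density(counts, cell, weight) < threshold:
            outliers.append(p)
    return outliers
```

Setting `weight=0` makes every grid independent, which is the behavior the abstract attributes to traditional grid-based methods. With a positive weight, a sparsely populated grid next to a dense one is no longer flagged, while an isolated grid still is.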
Authors
杨杰
张东月
周丽华
黄皓
丁海燕
YANG Jie; ZHANG Dong-yue; ZHOU Li-hua; HUANG Hao; DING Hai-yan (School of Information Science & Engineering, Yunnan University, Kunming 650504, China)
Source
《计算机工程与科学》 (Computer Engineering & Science)
CSCD
Peking University Core Journals (北大核心)
2020, No. 1, pp. 25-35 (11 pages)
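Funding
National Natural Science Foundation of China (61762090, 61966036, 61662086)
Natural Science Foundation of Yunnan Province (2016FA026)
Yunnan Province Innovative Research Team Project (2018HC019)
National Social Science Foundation of China (18XZZ005)
Yunnan Province Higher Education Science and Technology Innovation Team Project (IRTSTYN)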
Keywords
outlier detection
data stream
grid coupling