Recently, sequence anomaly detection has been widely used in many fields, where sequence data are usually multi-dimensional and arrive over data streams. Designing an anomaly detection method for multi-dimensional sequences over a data stream that meets requirements for both accuracy and speed is challenging for two reasons: (1) redundant dimensions in sequence data and a large state space weaken sequence modeling; and (2) anomaly detection cannot keep pace with the high speed of the data stream, especially when concept drift occurs, which reduces the detection rate. On the one hand, most existing sequence anomaly detection methods focus on single-dimensional sequences. On the other hand, studies of multi-dimensional sequences concentrate mainly on static databases rather than data streams. To improve anomaly detection for multi-dimensional sequences over data streams, we propose a novel unsupervised fast and accurate anomaly detection (FAAD) method comprising three algorithms. First, a method called "information calculation and minimum spanning tree cluster" is adopted to reduce redundant dimensions. Second, to speed up model construction and ensure the detection rate for sequences over the data stream, we propose a method called "random sampling and subsequence partitioning based on the index probabilistic suffix tree." Finally, a method called "anomaly buffer based on model dynamic adjustment" dramatically reduces the effects of concept drift in the data stream. FAAD is implemented on the streaming platform Storm to detect anomalies in multi-dimensional log audit data. Compared with existing anomaly detection methods, FAAD performs well in both detection rate and speed and is not affected by concept drift.
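The abstract names an index probabilistic suffix tree (PST) as FAAD's sequence model but gives no construction details. The following is a simplified, bounded-depth PST sketch in Python, written under stated assumptions:

- It counts next-symbol frequencies for every context up to `max_depth`.
- It predicts with the longest context seen in training and applies additive smoothing `alpha`.
- It scores a subsequence by its average negative log-likelihood, so higher means more anomalous.

The class `SimplePST`, the helper `flag_windows`, the fixed-length window partitioning, and the threshold are hypothetical illustrations. They are not FAAD's indexing, random sampling, dimension reduction, or anomaly buffer.

```python
import math
from collections import defaultdict


class SimplePST:
    """Simplified bounded-depth probabilistic suffix tree (hypothetical sketch).

    Stores next-symbol counts for every context of length 0..max_depth and
    predicts with the longest suffix context seen in training (no pruning).
    """

    def __init__(self, max_depth=3, alpha=0.1):
        self.max_depth = max_depth
        self.alpha = alpha  # assumed additive smoothing constant
        self.counts = defaultdict(lambda: defaultdict(int))
        self.alphabet = set()

    def fit(self, sequences):
        for seq in sequences:
            self.alphabet.update(seq)
            for i, sym in enumerate(seq):
                # Record `sym` under every suffix context of length 0..max_depth.
                for d in range(0, min(self.max_depth, i) + 1):
                    self.counts[tuple(seq[i - d:i])][sym] += 1
        return self

    def prob(self, context, sym):
        # Back off to the longest suffix of `context` observed in training.
        for d in range(min(self.max_depth, len(context)), -1, -1):
            ctx = tuple(context[len(context) - d:])
            if ctx in self.counts:
                nxt = self.counts[ctx]
                total = sum(nxt.values())
                k = len(self.alphabet) + 1  # +1 reserves mass for unseen symbols
                return (nxt.get(sym, 0) + self.alpha) / (total + self.alpha * k)
        return 1.0 / (len(self.alphabet) + 1)

    def score(self, seq):
        """Average negative log-likelihood per symbol; higher = more anomalous."""
        nll = 0.0
        for i, sym in enumerate(seq):
            nll -= math.log(self.prob(seq[max(0, i - self.max_depth):i], sym))
        return nll / max(len(seq), 1)


def flag_windows(model, stream, window, threshold):
    """Split `stream` into non-overlapping windows; return starts of anomalous ones."""
    return [
        start
        for start in range(0, len(stream), window)
        if model.score(stream[start:start + window]) > threshold
    ]


if __name__ == "__main__":
    model = SimplePST(max_depth=3).fit(["abcabcabcabc"] * 5)
    print(model.score("abcabc"), model.score("acbacb"))
    print(flag_windows(model, "abcabcacbacb", window=6, threshold=1.0))
```

Symbols here stand for discretized events, such as log audit records mapped to tokens. In a streaming setting such as Storm, the model would be trained on a sample and applied window by window. How FAAD actually samples and partitions is not specified in this abstract.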
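The purpose of mining temporal association rules is to discover interesting relationships among itemsets that carry temporal information. Because databases are frequently and dynamically updated, temporal association rule mining should also adapt to these updates. However, most existing algorithms must re-mine the entire updated database, which wastes considerable time and reduces efficiency. They also cannot use existing rules to quantitatively predict how certain items will change. This paper proposes an evolving fuzzy inference prediction model based on multidimensional temporal association rules (Evolving fuzzy inference model based on multidimensional temporal association rules, EFI-MTAR). Its main advantage is a fuzzy inference modeling algorithm based on multidimensional temporal association rules (Fuzzy inference modeling algorithm based on multidimensional temporal association rules, FI-MTAR), which enables quantitative prediction of time series. In addition, a concept drift detection strategy processes the time series data so that the model adapts to dynamic database updates. This strategy is intended to reduce the cost of rule updating and speed up rule-based prediction. Experimental results demonstrate the effectiveness and accuracy of the proposed algorithm.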
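The abstract does not specify how FI-MTAR turns mined rules into a numeric forecast. One common way to make quantitative predictions from fuzzy rules is zero-order Takagi-Sugeno inference, sketched below in Python under that assumption.

- Each hypothetical `TemporalRule` maps fuzzy conditions on several dimensions, at given time lags, to a crisp next-step value.
- `predict` returns the average of the consequents, weighted by each rule's firing strength times its confidence.

The names `tri`, `SETS`, `TemporalRule`, and `predict` are assumptions, as are the triangular membership functions and the confidence weighting. None of them is the paper's formulation.

```python
from dataclasses import dataclass


def tri(x, a, b, c):
    """Triangular membership with feet a, c and peak b (assumes a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


# Hypothetical fuzzy partition of an attribute normalized to [0, 1].
SETS = {"LOW": (-0.5, 0.0, 0.5), "MID": (0.0, 0.5, 1.0), "HIGH": (0.5, 1.0, 1.5)}


@dataclass
class TemporalRule:
    antecedent: dict   # {(dimension, lag): fuzzy-set name}; lag 0 = latest step
    consequent: float  # crisp value predicted for the target at the next step
    confidence: float  # confidence of the mined rule, used as a weight


def predict(history, rules, sets=SETS):
    """Zero-order Takagi-Sugeno inference over multidimensional temporal rules.

    history[lag][dim] is the value of `dim` observed `lag` steps ago.
    Each rule fires with strength min(memberships) * confidence; the result is
    the strength-weighted average of the consequents, or None if no rule fires.
    """
    num = den = 0.0
    for rule in rules:
        strength = min(
            tri(history[lag][dim], *sets[name])
            for (dim, lag), name in rule.antecedent.items()
        )
        weight = strength * rule.confidence
        num += weight * rule.consequent
        den += weight
    return num / den if den > 0 else None


if __name__ == "__main__":
    rules = [
        TemporalRule({("cpu", 0): "HIGH", ("mem", 1): "HIGH"}, 0.9, 0.8),
        TemporalRule({("cpu", 0): "LOW"}, 0.2, 0.6),
    ]
    history = [{"cpu": 0.8, "mem": 0.6}, {"cpu": 0.7, "mem": 0.9}]
    print(predict(history, rules))
```

Mining the rules, and the concept drift detection strategy that decides when they are updated, are outside this sketch.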
Funding: Project supported by the National Key R&D Program of China (No. 2016YFB1000101), the National Natural Science Foundation of China (Nos. 61379052 and 61502513), the Natural Science Foundation for Distinguished Young Scholars of Hunan Province, China (No. 14JJ1026), and the Specialized Research Fund for the Doctoral Program of Higher Education, China (No. 20124307110015).