提出类别属性数据流数据离群度量——加权频繁模式离群因子(weighted frequent pattern outlier factor,简称WFPOF),并在此基础上给出一种快速数据流离群点检测算法FODFP-Stream(fast outlier detection for high dimensional categoric...提出类别属性数据流数据离群度量——加权频繁模式离群因子(weighted frequent pattern outlier factor,简称WFPOF),并在此基础上给出一种快速数据流离群点检测算法FODFP-Stream(fast outlier detection for high dimensional categorical data streams based on frequent pattern).该算法通过动态发现和维护频繁模式来计算离群度,能够有效地处理高维类别属性数据流,并可进一步扩展到数值属性和混合属性数据流.对仿真数据集和真实数据集的实验检测均验证该算法具有良好的适用性和有效性.展开更多
离群点检测是数据挖掘领域的一个重要分支,当前数据流的离群点检测研究越来越受到关注.为了快速准确地检测出数据流中离群点,提出一种在线数据流离群点检测算法ODDS(outlier detection in online data stream s).它利用数据与频繁模式...离群点检测是数据挖掘领域的一个重要分支,当前数据流的离群点检测研究越来越受到关注.为了快速准确地检测出数据流中离群点,提出一种在线数据流离群点检测算法ODDS(outlier detection in online data stream s).它利用数据与频繁模式的相异程度来度量数据的离群程度,通过构建ODDS-Tree树,能动态地更新数据流中候选离群点的离群信息.实验结果验证了该算法与其他同类算法相比具有较高的效率与优良的可扩展性能.展开更多
The paper introduces the approach of outher detection based on kernel,and points out the approach is hard to be realized with the increase of sample number. In order to reduce the size of optimization,the sample selec...The paper introduces the approach of outher detection based on kernel,and points out the approach is hard to be realized with the increase of sample number. In order to reduce the size of optimization,the sample selection method based on distance is proposed Through selecting samples,the computation workload and the demand of EMS memory can be decreased greatly. In the end,the real-time demand can be met.展开更多
The distance-based outlier detection method detects the implied outliers by calculating the distance of the points in the dataset, but the computational complexity is particularly high when processing multidimensional...The distance-based outlier detection method detects the implied outliers by calculating the distance of the points in the dataset, but the computational complexity is particularly high when processing multidimensional datasets. In addition, the traditional outlier detection method does not consider the frequency of subsets occurrence, thus, the detected outliers do not fit the definition of outliers (i.e., rarely appearing). The pattern mining-based outlier detection approaches have solved this problem, but the importance of each pattern is not taken into account in outlier detection process, so the detected outliers cannot truly reflect some actual situation. Aimed at these problems, a two-phase minimal weighted rare pattern mining-based outlier detection approach, called MWRPM-Outlier, is proposed to effectively detect outliers on the weight data stream. In particular, a method called MWRPM is proposed in the pattern mining phase to fast mine the minimal weighted rare patterns, and then two deviation factors are defined in outlier detection phase to measure the abnormal degree of each transaction on the weight data stream. Experimental results show that the proposed MWRPM-Outlier approach has excellent performance in outlier detection and MWRPM approach outperforms in weighted rare pattern mining.展开更多
文摘提出类别属性数据流数据离群度量——加权频繁模式离群因子(weighted frequent pattern outlier factor,简称WFPOF),并在此基础上给出一种快速数据流离群点检测算法FODFP-Stream(fast outlier detection for high dimensional categorical data streams based on frequent pattern).该算法通过动态发现和维护频繁模式来计算离群度,能够有效地处理高维类别属性数据流,并可进一步扩展到数值属性和混合属性数据流.对仿真数据集和真实数据集的实验检测均验证该算法具有良好的适用性和有效性.
文摘离群点检测是数据挖掘领域的一个重要分支,当前数据流的离群点检测研究越来越受到关注.为了快速准确地检测出数据流中离群点,提出一种在线数据流离群点检测算法ODDS(outlier detection in online data stream s).它利用数据与频繁模式的相异程度来度量数据的离群程度,通过构建ODDS-Tree树,能动态地更新数据流中候选离群点的离群信息.实验结果验证了该算法与其他同类算法相比具有较高的效率与优良的可扩展性能.
文摘The paper introduces the approach of outher detection based on kernel,and points out the approach is hard to be realized with the increase of sample number. In order to reduce the size of optimization,the sample selection method based on distance is proposed Through selecting samples,the computation workload and the demand of EMS memory can be decreased greatly. In the end,the real-time demand can be met.
基金supported by Fundamental Research Funds for the Central Universities (No. 2018XD004)
文摘The distance-based outlier detection method detects the implied outliers by calculating the distance of the points in the dataset, but the computational complexity is particularly high when processing multidimensional datasets. In addition, the traditional outlier detection method does not consider the frequency of subsets occurrence, thus, the detected outliers do not fit the definition of outliers (i.e., rarely appearing). The pattern mining-based outlier detection approaches have solved this problem, but the importance of each pattern is not taken into account in outlier detection process, so the detected outliers cannot truly reflect some actual situation. Aimed at these problems, a two-phase minimal weighted rare pattern mining-based outlier detection approach, called MWRPM-Outlier, is proposed to effectively detect outliers on the weight data stream. In particular, a method called MWRPM is proposed in the pattern mining phase to fast mine the minimal weighted rare patterns, and then two deviation factors are defined in outlier detection phase to measure the abnormal degree of each transaction on the weight data stream. Experimental results show that the proposed MWRPM-Outlier approach has excellent performance in outlier detection and MWRPM approach outperforms in weighted rare pattern mining.