Abstract
Outlier detection is an important data mining method, used to preprocess data and to mine information from anomalous data. In recent years, the curse of dimensionality has made outlier detection in high-dimensional data very difficult. To address this problem, a semi-supervised outlier detection algorithm based on an autoencoder and ensemble learning is proposed. First, an autoencoder is used to reduce dimensionality; during encoding and decoding, the degree of abnormality of outlier data is amplified. Second, because the iForest, LOF, and K-means algorithms are sensitive to different types of outliers, they are fused within the AdaBoost boosting framework to improve detection accuracy. High-dimensional outlier datasets from the UCI Machine Learning Repository are used in the experiments. The results show that the accuracy of the model is significantly improved compared with current mainstream outlier detection algorithms.
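The fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the three detectors are combined inside AdaBoost, so everything here is an assumption: each detector is treated as a fixed weak classifier that has already produced +1 (outlier) or -1 (inlier) votes, a small labeled subset supplies the supervision, and the names `adaboost_fuse`, `fused_predict`, `votes`, and `labels` are hypothetical. The autoencoder stage is omitted and the votes are made-up toy values, not results from the paper.

```python
import math

def adaboost_fuse(votes, labels):
    """Weight fixed weak detectors AdaBoost-style (illustrative assumption).

    votes  : one list per detector, entries +1 (outlier) or -1 (inlier)
    labels : true labels (+1 / -1) for the labeled subset
    Returns a list of (detector_index, alpha) pairs.
    """
    n = len(labels)
    w = [1.0 / n] * n                      # uniform sample weights to start
    remaining = list(range(len(votes)))
    ensemble = []
    while remaining:
        # weighted error of each unused detector under the current weights
        errs = {j: sum(wi for wi, p, y in zip(w, votes[j], labels) if p != y)
                for j in remaining}
        j = min(errs, key=errs.get)
        eps = min(max(errs[j], 1e-10), 1 - 1e-10)
        if eps >= 0.5:                     # no better than chance: stop
            break
        alpha = 0.5 * math.log((1 - eps) / eps)
        ensemble.append((j, alpha))
        # raise the weight of misclassified samples, lower the rest
        w = [wi * math.exp(-alpha * y * p)
             for wi, p, y in zip(w, votes[j], labels)]
        total = sum(w)
        w = [wi / total for wi in w]
        remaining.remove(j)
    return ensemble

def fused_predict(ensemble, votes, i):
    """Sign of the alpha-weighted vote for sample i."""
    score = sum(alpha * votes[j][i] for j, alpha in ensemble)
    return 1 if score > 0 else -1

# Toy data: two outliers followed by four inliers (hypothetical values).
labels = [1, 1, -1, -1, -1, -1]
votes = [
    [1, -1, -1, -1, -1, -1],   # stand-in for iForest: misses sample 1
    [-1, 1, -1, -1, -1, -1],   # stand-in for LOF: misses sample 0
    [1, 1, 1, -1, -1, -1],     # stand-in for K-means: false alarm on sample 2
]
ensemble = adaboost_fuse(votes, labels)
preds = [fused_predict(ensemble, votes, i) for i in range(len(labels))]
```

In this toy case each detector makes one different mistake, and the weighted vote corrects all three, which is the intuition behind combining detectors that are sensitive to different outlier types.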
Authors
XIA Huo-song (夏火松)
SUN Ze-lin (孙泽林)
School of Management, Wuhan Textile University, Wuhan 430073, China
Source
Computer Engineering & Science (《计算机工程与科学》)
CSCD
Peking University Core Journals
2020, Issue 8, pp. 1440-1447 (8 pages)
Funding
National Natural Science Foundation of China (71871172, 71571139).
Keywords
outlier detection
boosting framework
semi-supervised
autoencoder