Abstract
Using a suitable feature selection algorithm to reduce the number of features in network traffic datasets is important for improving the performance of machine-learning-based network intrusion detection systems. This paper designs an evaluation scheme for feature selection algorithms whose goal is to preserve intrusion detection performance, and compares several commonly used feature selection algorithms on different datasets in terms of selection effect and time consumption. The experimental results show that nearly 50% of the feature items in datasets such as KDD CUP99 and NSL-KDD are redundant; that the least-squares regression feature selection algorithm based on L1 and L2 regularization terms (LS_L1) is the most robust; and that combining feature selection with an algorithm that mines associations among feature items further benefits the performance of network intrusion detection systems. These results offer clear guidance on choosing a suitable feature selection algorithm for network intrusion detection systems in different application scenarios.
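The abstract names a least-squares regression selector with L1 and L2 regularization (LS_L1) but does not give its formulation. As a rough illustration of the general idea only, the sketch below assumes an elastic-net style objective, (1/2n)||y - Xw||^2 + l1*||w||_1 + (l2/2)*||w||_2^2, solved by coordinate descent, and keeps the features whose weights stay nonzero. The function names, the regularization values, and the synthetic data are assumptions made for this example; they are not taken from the paper, and the data is not KDD CUP99 or NSL-KDD.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the proximal step for the L1 term)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net_weights(X, y, l1=0.1, l2=0.01, sweeps=100):
    """Coordinate descent for (1/2n)||y - Xw||^2 + l1*||w||_1 + (l2/2)*||w||_2^2."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    residual = list(y)  # equals y - Xw while w is all zeros
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(d)]
    for _ in range(sweeps):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, l1) / (col_sq[j] + l2)
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w


def select_features(weights, tol=1e-8):
    """Return the indices of features whose weight was not shrunk to zero."""
    return [j for j, wj in enumerate(weights) if abs(wj) > tol]


# Synthetic stand-in for a traffic dataset (assumption): 6 features,
# of which only features 0 and 3 actually influence the target.
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(6)] for _ in range(200)]
y = [2.0 * row[0] - 3.0 * row[3] + rng.gauss(0.0, 0.1) for row in X]

weights = elastic_net_weights(X, y)
selected = select_features(weights)
print("selected feature indices:", selected)
```

In this sketch the L1 term is what drives uninformative weights to exactly zero, while the small L2 term only stabilizes the update when features are correlated. On real intrusion detection data the features would typically need to be numerically encoded and standardized first, and `l1` would be tuned rather than fixed.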
Authors
田野
唐菀
杨喜敏
张艳
TIAN Ye; TANG Wan; YANG Ximin; ZHANG Yan (College of Computer Science, South-Central University for Nationalities, Wuhan 430074, China)
Source
Henan Science (《河南科学》)
2021, No. 3, pp. 359-365 (7 pages)
Funding
National Natural Science Foundation of China (61902437)
Natural Science Foundation of Hubei Province (2020CFB629)
Keywords
network intrusion detection
feature selection
machine learning
comparative research