Abstract
Suspected intrusion data in a target network contain a large number of high-dimensional and redundant features, and existing intrusion detection methods select features only qualitatively, which leads to a low intrusion detection rate, a high false alarm rate, and poor real-time performance. To address this, a method for selecting optimal suspected network intrusion data based on an improved genetic algorithm is proposed. A semi-supervised learning algorithm automatically labels the normalized data to obtain a larger set of suspected network intrusion data, which serves as the training data set for the intrusion detection model. A resampling algorithm randomly selects a training data subset from the training data set, the information gain ratio of each suspected intrusion data feature in that subset is calculated, and the features with the highest information gain ratios are selected to construct the effective suspected intrusion data feature set. A partial F-test then selects features further to construct the feature set to be optimized, and the improved genetic algorithm performs an optimized selection over this set to pick out the data set that best reflects the intrusion state. Experimental results show that the proposed method effectively improves detection efficiency while preserving the intrusion detection rate and keeping the false alarm rate as low as possible.
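The resampling and information gain ratio step described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names (`entropy`, `gain_ratio`, `rank_features`), the use of bootstrap sampling with replacement, and the assumption that features are already discretized are all illustrative choices that the abstract does not specify.

```python
import math
import random
from collections import Counter


def entropy(values):
    """Shannon entropy (base 2) of a sequence of discrete values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(values).values())


def gain_ratio(feature, labels):
    """Information gain of the labels given the feature, divided by
    the feature's own entropy (its split information)."""
    total = len(labels)
    groups = {}
    for f, y in zip(feature, labels):
        groups.setdefault(f, []).append(y)
    # Expected label entropy after splitting on the feature.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    split_info = entropy(feature)
    if split_info == 0:  # constant feature carries no information
        return 0.0
    return (entropy(labels) - conditional) / split_info


def rank_features(rows, labels, sample_size, seed=0):
    """Draw a random subset (assumption: with replacement) and return
    feature indices sorted by descending gain ratio on that subset."""
    rng = random.Random(seed)
    idx = [rng.randrange(len(rows)) for _ in range(sample_size)]
    sub_labels = [labels[i] for i in idx]
    scores = {j: gain_ratio([rows[i][j] for i in idx], sub_labels)
              for j in range(len(rows[0]))}
    return sorted(scores, key=scores.get, reverse=True)
```

In this sketch the top-ranked indices would form the candidate feature set that later stages (the partial F-test and the genetic algorithm) refine. Continuous, normalized features would need to be binned before `gain_ratio` is meaningful.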
Author
XIONG Yunlong (熊云龙), Guizhou University of Finance and Economics, Guiyang 550025, China
Source
Modern Electronics Technique (《现代电子技术》)
Peking University Core Journal (北大核心)
2018, Issue 22, pp. 163-165, 169 (4 pages in total)
Keywords
genetic algorithm
suspected network intrusion
re-sampling
intrusion detection
data set
optimization selection