Abstract
Correct classification of network data is important for monitoring and maintaining a network environment. Handling class imbalance and complex feature relationships is particularly important in this setting, so a machine learning classification method combining an improved SMOTE (synthetic minority over-sampling technique) with GA-XGBoost (genetic algorithm-extreme gradient boosting) is proposed. The local outlier factor is introduced into the SMOTE interpolation process to oversample the minority classes, while the majority classes are randomly undersampled, thereby rebalancing the samples. In addition, to improve model fit during training, a genetic algorithm, which benefits from evolutionary iteration, is combined with XGBoost to address XGBoost's large number of parameters and slow convergence in feature learning. Experiments use the UNSW_NB15 dataset and compare the method against machine learning algorithms such as multi-layer perceptron, K-nearest neighbor, and decision tree, as well as imbalanced-data training methods such as SMOTE+XGBoost. The results show that the method achieves good classification accuracy (97.40%) together with relatively high average recall (70.2%) and average F1-score (68.8%). Further experiments on data collected from the authors' laboratory industrial information security platform yield a classification accuracy of 99%, further verifying the effectiveness and feasibility of the method.
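The rebalancing step described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the abstract does not say how the local outlier factor (LOF) enters the interpolation, so the sketch assumes one plausible reading, namely that minority samples with a high LOF are excluded as interpolation endpoints. The function names, the threshold `lof_max`, and the neighbour count `k` are all hypothetical.

```python
import math
import random


def knn(points, i, k):
    """Return the k nearest neighbours of points[i] as (distance, index) pairs."""
    dists = sorted((math.dist(points[i], p), j)
                   for j, p in enumerate(points) if j != i)
    return dists[:k]


def lof_scores(points, k):
    """Local outlier factor of every point (values near 1 indicate inliers)."""
    neigh = [knn(points, i, k) for i in range(len(points))]
    k_dist = [n[-1][0] for n in neigh]
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for n in neigh:
        reach = [max(k_dist[j], d) for d, j in n]
        lrd.append(len(reach) / max(sum(reach), 1e-12))
    return [sum(lrd[j] for _, j in n) / (len(n) * lrd[i])
            for i, n in enumerate(neigh)]


def lof_smote(minority, n_new, k=3, lof_max=1.5, rng=None):
    """SMOTE interpolation restricted to minority points with LOF <= lof_max.

    Assumption: filtering endpoints by LOF is one possible way to combine
    LOF with SMOTE; the paper's exact rule may differ.
    """
    rng = rng or random.Random(0)
    scores = lof_scores(minority, k)
    clean = [p for p, s in zip(minority, scores) if s <= lof_max]
    if len(clean) < 2:
        raise ValueError("too few inlier minority samples to interpolate")
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(clean))
        _, j = rng.choice(knn(clean, i, k))
        gap = rng.random()  # position along the segment between the two points
        synthetic.append(tuple(a + gap * (b - a)
                               for a, b in zip(clean[i], clean[j])))
    return synthetic


def rebalance(majority, minority, target, k=3, rng=None):
    """Randomly undersample the majority and oversample the minority to `target`."""
    rng = rng or random.Random(0)
    kept = rng.sample(majority, min(target, len(majority)))
    extra = lof_smote(minority, max(target - len(minority), 0), k=k, rng=rng)
    # Original minority samples (outliers included) are kept; only the
    # synthetic samples are generated from LOF-filtered points.
    return kept, list(minority) + extra


minority = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (10.0, 10.0)]
majority = [(float(i), float(i % 3)) for i in range(20)]
kept, balanced_minority = rebalance(majority, minority, target=10)
print(len(kept), len(balanced_minority))
```

In this toy data the point (10, 10) is an isolated minority sample. Because it is filtered out before interpolation, every synthetic sample falls inside the dense minority cluster rather than on a line stretching toward the outlier.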
Authors
韩凤董
宗学军
何戡
连莲
HAN Feng-dong; ZONG Xue-jun; HE Kan; LIAN Lian (School of Information Engineering, Shenyang University of Chemical Technology, Shenyang 110142, China; Liaoning Key Laboratory of Information Security in Petrochemical Industry, Shenyang 110142, China)
Source
Science Technology and Engineering (《科学技术与工程》)
PKU Core Journal (北大核心)
2023, No. 3, pp. 1130-1137 (8 pages)
Funding
Liaoning Province "Xingliao Talents Program" (兴辽英才计划), grant XLYC2002085.