Abstract
Data imbalance degrades the accuracy of traditional machine-learning-based traffic classification methods. To address this, this paper proposes RES-LGBM, a gradient boosting tree algorithm based on ensemble learning and resampling. The algorithm uses statistical features of traffic flows. It optimizes the feature set with a backtracking search strategy and designs tree-structure parameters suited to traffic classification, which together yield an optimal model. LightGBM combined with resampling is then used to correct the data imbalance and to run classification tests. Experiments show that the algorithm improves the classification of imbalanced data and is stable and fast.
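The abstract names two building blocks, resampling to balance the classes and a backtracking search over feature subsets, but does not give their details. The following is a minimal sketch under stated assumptions. Plain random oversampling (duplicating randomly chosen minority-class samples) stands in for the unspecified resampling method. An exhaustive depth-first backtracking enumeration with a caller-supplied `score` function stands in for the feature search. The function names `random_oversample` and `backtrack_feature_search` are hypothetical, and the LightGBM classifier itself is not included.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Hypothetical helper: duplicate randomly chosen samples of each smaller
    class until every class has as many samples as the largest class.
    Assumption: plain random oversampling, not necessarily the paper's method."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        # Indices of all original samples belonging to this class
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            j = rng.choice(idx)
            X_out.append(X[j])
            y_out.append(label)
    return X_out, y_out


def backtrack_feature_search(n_features, score, max_size):
    """Hypothetical helper: depth-first backtracking over all feature subsets
    of size 1..max_size, returning (best_subset, best_score).
    `score` maps a list of feature indices to a number (higher is better).
    No pruning is done, so the cost grows combinatorially with max_size."""
    best = ([], float("-inf"))

    def visit(start, chosen):
        nonlocal best
        if chosen:
            s = score(chosen)
            if s > best[1]:
                best = (list(chosen), s)
        if len(chosen) == max_size:
            return
        for f in range(start, n_features):
            chosen.append(f)       # extend the current subset
            visit(f + 1, chosen)   # explore subsets that contain f
            chosen.pop()           # backtrack: undo the choice

    visit(0, [])
    return best


if __name__ == "__main__":
    X = [[0.1], [0.2], [0.3], [0.4], [5.0]]
    y = ["web", "web", "web", "web", "p2p"]
    Xb, yb = random_oversample(X, y)
    print(Counter(yb))

    weights = [0.9, 0.1, 0.8]  # toy per-feature usefulness scores
    subset, s = backtrack_feature_search(
        3, lambda fs: sum(weights[f] - 0.3 for f in fs), max_size=3)
    print(subset)
```

In a real pipeline, `score` would more likely be a cross-validated metric of a classifier trained on the chosen columns, and the balanced data from `random_oversample` would then be passed to a gradient boosting classifier such as LightGBM.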
Authors
GU Zhaojun (顾兆军), WU You (吴优), ZHAO Chundi (赵春迪), ZHOU Jingxian (周景贤)
Information Security Evaluation Center of Civil Aviation, Civil Aviation University of China, Tianjin 300300, China
Sino-European Institute of Aeronautical Engineering, Civil Aviation University of China, Tianjin 300300, China
College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
Peking University Core Journal (北大核心)
2020, No. 6, pp. 86-91 (6 pages)
Funding
Civil Aviation Safety Capacity Building Project (No. PESA170003, No. PESA2018082)
Fundamental Research Funds for the Central Universities, Civil Aviation University of China Special Program (No. 3122018C036)
Keywords
machine learning
ensemble learning
data imbalance
network flow
resampling