
Imbalanced network traffic classification method based on improved rotation forest algorithm (cited by: 1)
Abstract: To address the low accuracy of imbalanced network traffic classification, an improved rotation forest algorithm was proposed that combines the rotation forest algorithm with the Bootstrap sampling of the Bagging algorithm and a base-classifier selection algorithm based on ranking by classification accuracy. Firstly, the original training set was partitioned into subsets by feature, Bagging was used to sample each subset, and the coefficient matrix of principal components was computed by Principal Component Analysis (PCA). Then, the features were transformed using the original training set and the principal-component coefficient matrix to generate new training subsets; Bagging was applied again to sample these subsets, increasing the diversity of the training sets, and the training subsets were used to train C4.5 base classifiers. Finally, the base classifiers were evaluated on the testing set, ranked and filtered by overall classification accuracy, and the classifiers with higher accuracy were retained to produce a consensus classification result. Experiments were conducted on an imbalanced network traffic data set: C4.5, Bagging, rotation forest and the improved rotation forest were evaluated by precision and recall, and the time efficiency of the four algorithms was evaluated by model training time and testing time.
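The three stages of the method can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's implementation: it assumes NumPy and scikit-learn are available, and it uses scikit-learn's `DecisionTreeClassifier` with `criterion="entropy"` as a stand-in for C4.5 (scikit-learn grows CART-style binary trees, so there is no gain ratio and no C4.5 pruning). The class name `ImprovedRotationForest` and the parameters `n_estimators`, `n_subsets` and `keep_ratio` are hypothetical, because the abstract gives neither the ensemble size nor the selection threshold. The ranking data is passed separately (`X_sel`, `y_sel`); the abstract ranks on the testing set, but a validation split can be passed instead.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier


def bootstrap_indices(rng, n):
    """Bagging-style Bootstrap sample: n row indices drawn with replacement."""
    return rng.integers(0, n, size=n)


class ImprovedRotationForest:
    """Hypothetical sketch: rotation forest + a second Bootstrap sampling +
    accuracy-ranked selection of base classifiers."""

    def __init__(self, n_estimators=20, n_subsets=3, keep_ratio=0.5, seed=0):
        self.n_estimators = n_estimators  # base classifiers trained (assumed value)
        self.n_subsets = n_subsets        # disjoint feature subsets (assumed value)
        self.keep_ratio = keep_ratio      # fraction kept after ranking (assumed value)
        self.rng = np.random.default_rng(seed)

    def _rotation_matrix(self, X):
        """Step 1: split the features into subsets, Bootstrap-sample the rows
        for each subset, and place each PCA coefficient matrix in one block."""
        n, d = X.shape
        R = np.zeros((d, d))
        for cols in np.array_split(self.rng.permutation(d), self.n_subsets):
            rows = bootstrap_indices(self.rng, n)
            pca = PCA().fit(X[np.ix_(rows, cols)])
            # components_ holds principal axes as rows; store them as columns.
            R[np.ix_(cols, cols)] = pca.components_.T
        return R

    def fit(self, X_train, y_train, X_sel, y_sel):
        scored = []
        for _ in range(self.n_estimators):
            R = self._rotation_matrix(X_train)
            # Step 2: rotate the original training set, Bootstrap-sample it
            # again for extra diversity, and train one base tree on it.
            rows = bootstrap_indices(self.rng, len(X_train))
            tree = DecisionTreeClassifier(
                criterion="entropy",  # information-based splits as a C4.5 stand-in
                random_state=int(self.rng.integers(2**31)),
            )
            tree.fit(X_train[rows] @ R, y_train[rows])
            # Step 3: score each base classifier by overall accuracy.
            accuracy = np.mean(tree.predict(X_sel @ R) == y_sel)
            scored.append((accuracy, R, tree))
        # Rank by accuracy and keep only the best-performing classifiers.
        scored.sort(key=lambda item: item[0], reverse=True)
        n_keep = max(1, round(self.keep_ratio * len(scored)))
        self.members_ = scored[:n_keep]
        return self

    def predict(self, X):
        # Each retained tree votes on its own rotation of X; the majority wins.
        votes = np.stack([tree.predict(X @ R) for _, R, tree in self.members_])
        classes = np.unique(votes)
        counts = np.stack([(votes == c).sum(axis=0) for c in classes])
        return classes[counts.argmax(axis=0)]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Hypothetical synthetic imbalanced data: 900 majority vs 100 minority rows.
    X = np.vstack([rng.normal(0, 1, (900, 6)), rng.normal(2, 1, (100, 6))])
    y = np.array([0] * 900 + [1] * 100)
    order = rng.permutation(len(y))
    X, y = X[order], y[order]
    model = ImprovedRotationForest().fit(X[:600], y[:600], X[600:800], y[600:800])
    pred = model.predict(X[800:])
    minority = y[800:] == 1
    print("accuracy:", np.mean(pred == y[800:]))
    print("minority recall:", np.mean(pred[minority] == 1))
```

Because each feature subset receives its own orthonormal PCA basis, the assembled matrix `R` is orthogonal, so each tree sees a rotated but information-preserving view of the same flows; the second Bootstrap draw and the ranking step are the two additions the abstract describes on top of a plain rotation forest.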
The experimental results show that the improved rotation forest algorithm achieves a classification accuracy above 99.5% on the World Wide Web (WWW), Mail, Attack and Peer-to-Peer (P2P) protocols, and that its recall is also higher than that of rotation forest, Bagging and C4.5. The proposed algorithm can be used for network intrusion forensics, maintaining network security and improving network quality of service.
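The evaluation compares the algorithms by precision and recall for each protocol class. For reference, the standard per-class definitions are given below; the abstract does not state its formulas, so this assumes the usual one-vs-rest counting. For a protocol class c (for example, WWW), TP_c counts flows of class c labelled c, FP_c counts flows of other classes wrongly labelled c, and FN_c counts flows of class c assigned to some other class.

```latex
\mathrm{Precision}_c = \frac{TP_c}{TP_c + FP_c}, \qquad
\mathrm{Recall}_c = \frac{TP_c}{TP_c + FN_c}
```

As a purely hypothetical illustration, if 1,000 flows are labelled WWW and 995 of them really are WWW traffic, the WWW precision is 995/1000 = 99.5%.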
Author: DING Yaojun (丁要军)
Source: Journal of Computer Applications (《计算机应用》; indexed in CSCD and the Peking University core journal list), 2015, No. 12, pp. 3348-3351 (4 pages)
Funding: Key Fund Project of Gansu Institute of Political Science and Law (GZF2014XZDLW15)
Keywords: Principal Component Analysis (PCA); ensemble learning; imbalanced network traffic; rotation forest; decision tree
