
一种基于统计频率的网络流量特征选择方法 (Cited by: 3)

Feature Selection Method Based on Statistic Frequency in Network Traffic Classification
Abstract: When classifying multi-class imbalanced Internet traffic, classification models based on machine learning algorithms are biased toward the majority classes, which leads to low recall for the minority classes. To address this problem, a feature selection method based on statistical frequency is proposed. The method first computes, from the statistical frequency of the samples, a feature selection coefficient that measures the discriminating ability of each feature. It then constructs a feature selection matrix from these coefficients. Finally, it selects for each class the features that are strongly correlated with that class. In the experiments, classifying multi-class imbalanced network traffic with the features selected by this method achieved higher overall accuracy, minority-class recall, and g-mean, which shows that the method can mitigate the adverse effects of the multi-class imbalance problem.
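The abstract reports minority-class recall and g-mean alongside overall accuracy. As a minimal sketch of why those metrics matter under imbalance, the Python below computes per-class recall and g-mean, assuming the common definition of g-mean as the geometric mean of the per-class recalls (the abstract itself does not spell out the formula). The function names and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import prod


def per_class_recall(y_true, y_pred):
    """Return {class: recall} for every class that appears in y_true."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in totals}


def g_mean(y_true, y_pred):
    """Geometric mean of the per-class recalls (assumed definition)."""
    recalls = list(per_class_recall(y_true, y_pred).values())
    return prod(recalls) ** (1.0 / len(recalls))


# Made-up labels: 8 majority-class flows ("WWW") and 2 minority-class flows ("P2P").
y_true = ["WWW"] * 8 + ["P2P"] * 2
# A classifier biased toward the majority class mislabels one of the two P2P flows.
y_pred = ["WWW"] * 8 + ["WWW", "P2P"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                          # overall accuracy
print(per_class_recall(y_true, y_pred))  # recall per class
print(g_mean(y_true, y_pred))            # g-mean
```

In this toy case the overall accuracy is 0.9 while the minority-class recall is only 0.5, so the g-mean (about 0.71) exposes the weakness that accuracy hides.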
Authors: 孙兴斌 (Sun Xingbin), 芮赟 (Rui Yun)
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list, 2016, No. 11, pp. 2483-2487 (5 pages)
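Funding: National Natural Science Foundation of China, Youth Program (61302093); Major Project of the Science and Technology Commission of Shanghai Municipality (14511101505); Academy-Municipality Cooperation Special Project of the Science and Technology Commission of Shanghai Municipality (13DZ1511200); Key Deployment Project of the Chinese Academy of Sciences (KGZW-EW-103); Open Research Fund of the National Mobile Communications Research Laboratory, Southeast University (2013D07)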
Keywords: Internet traffic classification; multi-class imbalance; statistic frequency; feature selection
