
Clustering Algorithm for Binary Protocol Data Frames Combining Feature Dimensionality Reduction and Density Peaks Clustering (融合特征降维和密度峰值的二进制协议数据帧聚类算法)
Abstract: To address the missing session-flow features and the difficulty of extracting frequent patterns in binary protocols, this paper combines feature dimensionality reduction with an improved density peaks clustering algorithm to cluster binary protocol data at data-frame granularity under unsupervised conditions. A frequent-item-based feature dimensionality reduction algorithm uses the frequent items present in protocol data to construct feature vectors that represent the original data frames, thereby reducing dimensionality. A density peaks clustering algorithm based on distance index weighting selects cluster centers automatically and effectively improves the separation between cluster centers and other data frames. Tests on three data sets built from five protocols (AIS, ARP, DNS, ICMP, and SMB) show that the proposed algorithm clusters binary protocol data frames well.
Authors: YAN Xiao-yong (闫小勇); LI Qing (李青) (School of Information System Engineering, Information Engineering University, Zhengzhou 450000, China)
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2018, No. 12, pp. 2662-2668 (7 pages). Indexed in CSCD and the Peking University Core Journals list.
Keywords: protocol identification; binary protocol; feature dimensionality reduction; density peaks; frame clustering
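The abstract names two steps: representing each frame by the frequent items it contains, then clustering the resulting vectors by density peaks. The sketch below illustrates the general idea only and is not the authors' algorithm. It assumes that a "frequent item" is a byte n-gram appearing in at least a given fraction of frames, and it uses the baseline density peaks procedure (local density `rho`, distance `delta` to the nearest denser point, centers ranked by `rho * delta`). The paper's distance index weighting and its automatic choice of the number of centers are not described in the abstract, so they are not reproduced here; the number of clusters `k` is supplied by the caller. All function names, parameters, and the toy frames are hypothetical.

```python
import math
from collections import Counter


def frequent_items(frames, n=2, min_support=0.5):
    """Byte n-grams that occur in at least min_support of the frames (assumed definition)."""
    counts = Counter()
    for frame in frames:
        # Count each n-gram once per frame, so counts measure frame support.
        counts.update({frame[i:i + n] for i in range(len(frame) - n + 1)})
    threshold = min_support * len(frames)
    return sorted(g for g, c in counts.items() if c >= threshold)


def to_feature_vector(frame, items):
    """Binary vector: 1 if the frame contains the frequent item, else 0."""
    return [1 if item in frame else 0 for item in items]


def density_peaks(points, d_c, k):
    """Baseline density peaks clustering with a Gaussian kernel and k given centers."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Local density: smooth count of neighbours within roughly d_c.
    rho = [sum(math.exp(-(dist[i][j] / d_c) ** 2) for j in range(n) if j != i)
           for i in range(n)]
    order = sorted(range(n), key=lambda i: -rho[i])  # densest first
    delta = [0.0] * n
    parent = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # convention for the densest point
            continue
        # Nearest point among those ranked denser than i.
        j = min(order[:rank], key=lambda m: dist[i][m])
        delta[i], parent[i] = dist[i][j], j
    # Centers are the k points with the largest rho * delta.
    centers = sorted(range(n), key=lambda i: -rho[i] * delta[i])[:k]
    labels = [-1] * n
    for cluster_id, c in enumerate(centers):
        labels[c] = cluster_id
    # Remaining points inherit the label of their nearest denser neighbour.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers


# Toy frames (made up for illustration): two "protocols" with different headers.
frames = [b"\x01\x02\xaa", b"\x01\x02\xbb", b"\x09\x08\xcc", b"\x09\x08\xdd"]
items = frequent_items(frames, n=2, min_support=0.5)
vectors = [to_feature_vector(f, items) for f in frames]
labels, centers = density_peaks(vectors, d_c=1.0, k=2)
print(items, vectors, labels)
```

In this toy input each frame shrinks from three bytes to a two-element vector, and frames sharing a header land in the same cluster. Real captures would need the support threshold, the n-gram length, and the cutoff distance `d_c` tuned to the data.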
