Abstract
Binary protocols lack session-flow features, and their frequent patterns are difficult to extract. To address this, we propose a clustering algorithm that combines feature dimensionality reduction with an improved density peaks clustering algorithm. It clusters binary protocol data without supervision, at the granularity of individual data frames. First, a frequent-item-based feature dimensionality reduction algorithm uses the frequent items present in the protocol data to build feature vectors that represent the original data frames, which reduces their dimensionality. Second, a density peaks clustering algorithm improved with distance index weighting selects cluster centers automatically and makes the cluster centers more clearly distinguishable from the other data frames. Experiments on three data sets built from five protocols (AIS, ARP, DNS, ICMP, and SMB) show that the proposed algorithm clusters binary protocol data frames well.
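The abstract gives no formulas, so the following is only a minimal Python sketch of the two ideas under stated assumptions. A "frequent item" is assumed to be a byte value at a fixed offset. Frames are compared by Hamming distance over their binary feature vectors. Local density uses a Gaussian kernel, a common density peaks variant. Cluster centers are picked by a hypothetical largest-gap rule on gamma = rho * delta. It does not reproduce the paper's distance index weighting. The names `frequent_items`, `to_vectors`, `density_peaks`, `min_support`, and `dc` are illustrative and do not come from the paper.

```python
import math
from collections import Counter


def frequent_items(frames, min_support):
    """Return (offset, byte) pairs present in at least min_support of the frames.

    Assumption (hypothetical): an "item" is a byte value at a fixed offset.
    """
    counts = Counter()
    for frame in frames:
        counts.update(enumerate(frame))  # each (offset, value) pair counted once per frame
    threshold = min_support * len(frames)
    return sorted(item for item, c in counts.items() if c >= threshold)


def to_vectors(frames, items):
    """Binary feature vector per frame: 1 if the frame contains the item, else 0."""
    return [[int(off < len(f) and f[off] == val) for off, val in items] for f in frames]


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def density_peaks(vectors, dc):
    """Density peaks clustering with a Gaussian-kernel density (needs >= 2 points).

    Centers come from a hypothetical largest-gap rule on gamma = rho * delta,
    standing in for the paper's automatic center selection.
    """
    n = len(vectors)
    d = [[hamming(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    rho = [sum(math.exp(-(d[i][j] / dc) ** 2) for j in range(n) if j != i) for i in range(n)]

    order = sorted(range(n), key=lambda i: (-rho[i], i))  # descending density
    delta, nneigh = [0.0] * n, [-1] * n
    delta[order[0]] = max(d[order[0]])  # densest point: distance to farthest point
    for rank in range(1, n):
        i = order[rank]
        nneigh[i] = min(order[:rank], key=lambda j: d[i][j])  # nearest denser point
        delta[i] = d[i][nneigh[i]]

    gamma = [rho[i] * delta[i] for i in range(n)]
    by_gamma = sorted(range(n), key=lambda i: -gamma[i])
    gaps = [gamma[by_gamma[k]] - gamma[by_gamma[k + 1]] for k in range(n - 1)]
    cut = gaps.index(max(gaps)) + 1  # keep points above the largest drop in gamma
    centers = sorted(set(by_gamma[:cut]) | {order[0]})  # densest point is always a center

    labels = [-1] * n
    for k, c in enumerate(centers):
        labels[c] = k
    for i in order:  # denser points are labeled before the points that follow them
        if labels[i] == -1:
            labels[i] = labels[nneigh[i]]
    return labels, centers


if __name__ == "__main__":
    frames = [bytes([1, 2, 3, 4]), bytes([1, 2, 3, 5]), bytes([1, 2, 9, 4]),
              bytes([7, 8, 8, 8]), bytes([7, 8, 8, 6]), bytes([7, 0, 8, 8])]
    items = frequent_items(frames, min_support=0.3)
    labels, centers = density_peaks(to_vectors(frames, items), dc=1.0)
    print(labels)  # [0, 0, 0, 1, 1, 1]
```

On the six toy frames, the 8 frequent items replace the raw byte positions. The two frames with the highest rho * delta become centers, and every other frame inherits the label of its nearest denser neighbor.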
Authors
YAN Xiao-yong (闫小勇)
LI Qing (李青)
School of Information System Engineering, Information Engineering University, Zhengzhou 450000, China
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2018, Issue 12, pp. 2662-2668 (7 pages)
Keywords
protocol identification
binary protocol
feature dimensionality reduction
density peaks
frame clustering