Abstract
To address the difficulty of distinguishing Tor anonymous traffic that uses the Meek cloud-traffic obfuscation mechanism from ordinary TLS-based cloud service traffic, two identification methods are proposed: one based on traffic feature matching and one based on support vector machine (SVM) classification. First, connection features, static packet features, and dynamic flow features were analyzed by comparing large volumes of Tor-Meek and non-Tor-Meek traffic captured in a lab environment, which yielded seven distinctive and highly discriminative static and dynamic features of Tor-Meek traffic. A feature-matching identification method was then built on these features; it detects Tor-Meek traffic quickly, with accuracy above 90% for flows of more than 200 packets. To remain robust to the feature changes introduced by new Tor versions, the statistical properties of Meek's flow slicing were analyzed in four respects: slice length and count, length variance, length entropy, and the send/receive sequence. From these, 16 statistical features were derived and used to identify Tor-Meek traffic with an SVM classifier. Different feature combinations and algorithm parameters were evaluated experimentally for accuracy and recall in order to select the best combination. For Tor-Meek flows longer than 40 packets, with each slice 40 packets long, the SVM-based method achieved an identification accuracy above 97% and a recall above 99%, while remaining efficient. The experimental results show that feature matching enables quick identification of cloud-obfuscated Tor anonymous traffic, whereas the classifier based on slice statistics is more accurate and more stable across different versions of the Tor software.
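The slice statistics named in the abstract (length and count, length variance, length entropy, send/receive sequence) can be illustrated with a minimal sketch. The abstract does not list the paper's 16 features, so this is not the authors' feature set. The function name `slice_features`, the signed-length encoding (positive for sent, negative for received), and the five features below are assumptions chosen for illustration; only the 40-packet slice length comes from the abstract.

```python
import math
from collections import Counter
from statistics import mean, pvariance


def slice_features(packets, slice_len=40):
    """Compute simple per-slice statistics for one flow.

    packets: list of non-zero signed packet lengths in bytes.
             Assumed encoding: positive = sent, negative = received.
    slice_len: packets per slice (40 follows the abstract).
    Returns one feature dict per complete slice; a trailing
    partial slice is ignored.
    """
    features = []
    for start in range(0, len(packets) - slice_len + 1, slice_len):
        window = packets[start:start + slice_len]
        lengths = [abs(p) for p in window]

        # Shannon entropy (bits) of the packet-length distribution.
        counts = Counter(lengths)
        entropy = -sum(
            (c / slice_len) * math.log2(c / slice_len)
            for c in counts.values()
        )

        # Number of direction changes in the send/receive sequence.
        switches = sum(
            1 for a, b in zip(window, window[1:]) if (a > 0) != (b > 0)
        )

        features.append({
            "sent_count": sum(1 for p in window if p > 0),
            "mean_length": mean(lengths),
            "length_variance": pvariance(lengths),
            "length_entropy": entropy,
            "direction_switches": switches,
        })
    return features
```

Each feature dict could be flattened into a numeric vector and passed to an SVM classifier. The kernel and parameter choices the paper reports as optimal are not given in the abstract and are not reproduced here.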
Source
Advanced Engineering Sciences (《工程科学与技术》)
EI
CAS
CSCD
PKU Core Journal (北大核心)
2017, Issue 2, pp. 121-132 (12 pages)
Funding
National Natural Science Foundation of China (61402035)
Keywords
anonymous communication
Tor
traffic obfuscation
traffic identification