Abstract
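Website fingerprinting (WF) methods for identifying Tor webpage traffic usually assume that Tor traffic, or even Tor webpage traffic, has already been separated out. In a real network, however, separating Tor traffic from the raw flows and then separating Tor webpage traffic from that Tor traffic requires far more computation, and is far harder, than the WF attack itself. Under the current Internet architecture, network traffic converges at regional central nodes. Exploiting this, a bidirectional statistical feature (BSF) for distinguishing Tor traffic was proposed. BSF combines the intra-domain global view provided by the SDN structure of the central node with the node information that the Tor network publishes, and it separates Tor traffic effectively. A hidden-feature extraction method for webpage traffic based on lifted structure fingerprinting (LSF) was then proposed, and BSF and the LSF deep features were combined into a composite Tor-webpage-identification traffic feature (CTTF). Finally, to address the scarcity of Tor traffic training data, a translation-based traffic data augmentation method was proposed, which makes the distribution of the augmented traffic data as consistent as possible with that of Tor traffic captured in the real working environment. Experimental results show that CTTF improves the identification rate by about 4% compared with using only the original data features. When training data are scarce, the augmentation method yields a more pronounced improvement in classification accuracy and effectively reduces the false positive rate.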
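The abstract does not give the exact definitions of BSF or of the translation-based augmentation, so the following Python block is only a minimal sketch of the general ideas under stated assumptions, not the authors' method. It assumes a flow is a list of packets with a timestamp, a size, and a direction. It computes a hypothetical set of per-direction statistics, checks flow endpoints against a set of public Tor relay addresses (assumed to be loaded elsewhere from published relay data), and augments a +1/-1 packet-direction sequence by shifting it left or right with zero padding to a fixed length. All names (`Packet`, `bidirectional_stats`, `touches_tor_relay`, `translate`, `augment`) and the chosen statistics are illustrative assumptions.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Set


@dataclass(frozen=True)
class Packet:
    """Hypothetical packet record: time in seconds, size in bytes, direction."""
    timestamp: float
    size: int
    outgoing: bool  # True if client -> server


def bidirectional_stats(packets: Sequence[Packet]) -> Dict[str, float]:
    """Illustrative per-direction statistics (an assumed feature set, not BSF itself)."""
    feats: Dict[str, float] = {}
    for name, flag in (("out", True), ("in", False)):
        pkts = [p for p in packets if p.outgoing == flag]
        sizes = [p.size for p in pkts]
        # Inter-arrival gaps between consecutive packets in the same direction.
        gaps = [b.timestamp - a.timestamp for a, b in zip(pkts, pkts[1:])]
        feats[f"{name}_count"] = float(len(pkts))
        feats[f"{name}_bytes"] = float(sum(sizes))
        feats[f"{name}_size_mean"] = float(mean(sizes)) if sizes else 0.0
        feats[f"{name}_size_std"] = float(pstdev(sizes)) if sizes else 0.0
        feats[f"{name}_gap_mean"] = float(mean(gaps)) if gaps else 0.0
    total = feats["out_count"] + feats["in_count"]
    feats["out_ratio"] = feats["out_count"] / total if total else 0.0
    return feats


def touches_tor_relay(src_ip: str, dst_ip: str, relay_ips: Set[str]) -> bool:
    """True if either endpoint is in the (assumed pre-loaded) public relay set."""
    return src_ip in relay_ips or dst_ip in relay_ips


def translate(seq: Sequence[int], shift: int, length: int) -> List[int]:
    """Shift a direction sequence by `shift` positions (positive = right),
    then truncate or zero-pad it to `length`."""
    if shift >= 0:
        shifted = [0] * shift + list(seq)
    else:
        shifted = list(seq)[-shift:]
    shifted = shifted[:length]
    return shifted + [0] * (length - len(shifted))


def augment(seq: Sequence[int], shifts: Sequence[int], length: int) -> List[List[int]]:
    """Produce one translated copy of `seq` per shift value."""
    return [translate(seq, s, length) for s in shifts]


if __name__ == "__main__":
    flow = [Packet(0.0, 600, True), Packet(0.1, 1500, False),
            Packet(0.3, 1500, False), Packet(0.4, 600, True)]
    print(bidirectional_stats(flow)["out_ratio"])  # 0.5
    print(translate([1, -1, -1, 1, -1], 2, 6))     # [0, 0, 1, -1, -1, 1]
```

Zero padding keeps every augmented sample the same length as the classifier input. The range of shift values is a free parameter here; how the paper chooses shifts so that augmented data match the real capture distribution is not described in the abstract.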
Authors
YAN Hongping (言洪萍)
ZHOU Qiang (周强)
WANG Shihao (王世豪)
YAO Wang (姚旺)
HE Liukun (何刘坤)
WANG Liangmin (王良民)
School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212013, China
Source
Journal on Communications (《通信学报》)
EI
CSCD
Peking University Core Journal
2022, No. 3, pp. 76-87 (12 pages)
Funding
Supported by the National Natural Science Foundation of China (No. U1736216).
Keywords
traffic discovery
traffic classification
statistical feature
data augmentation