Abstract
Botnets that use a domain generation algorithm (DGA) are highly concealed, and traditional detection algorithms depend on complex feature extraction. To address these problems, a deep learning method for DGA domain name detection that does not require extracting specific features is proposed. First, every domain name is converted to a bigram string representation based on word-hashing, and a bag-of-words model maps the domain names to a high-dimensional vector space. Then, a 5-layer deep neural network is trained to classify the domain names represented as high-dimensional vectors. The deep model can discover hidden patterns and features at different levels of abstraction from the training data, most of which cannot be found by traditional statistical methods. In the experiments, 100 000 DGA domain names and 100 000 legitimate domain names are used as samples, and the method is compared with a classification algorithm based on natural language features. The experimental results show that the deep model achieves a detection accuracy of 97.23% for DGA domain names, which is 3.7% higher than that of the classification algorithm based on natural language features.
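The first step described in the abstract (bigram representation followed by a bag-of-words mapping) can be illustrated with a minimal sketch. It is not the authors' implementation: the alphabet, the `#` boundary marker, the use of raw counts, and the names `ALPHABET`, `BIGRAM_INDEX`, `bigrams`, and `bag_of_bigrams` are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import Counter
from itertools import product
import string

# Assumed alphabet: lowercase letters, digits, "-" and ".", plus "#" as an
# assumed boundary marker for the start and end of the domain name.
ALPHABET = string.ascii_lowercase + string.digits + "-." + "#"

# Fixed vocabulary: every ordered pair of alphabet characters gets one dimension.
BIGRAM_INDEX = {a + b: i for i, (a, b) in enumerate(product(ALPHABET, repeat=2))}


def bigrams(domain):
    """Split a domain name into overlapping character bigrams."""
    marked = "#" + domain.lower() + "#"
    return [marked[i:i + 2] for i in range(len(marked) - 1)]


def bag_of_bigrams(domain):
    """Map a domain name to a fixed-length vector of bigram counts."""
    vector = [0] * len(BIGRAM_INDEX)
    for gram, count in Counter(bigrams(domain)).items():
        index = BIGRAM_INDEX.get(gram)
        if index is not None:  # bigrams with characters outside ALPHABET are skipped
            vector[index] = count
    return vector


if __name__ == "__main__":
    print(bigrams("abc"))
    print(len(bag_of_bigrams("example.com")))
```

Under these assumptions each domain name becomes a vector with one dimension per possible bigram, and vectors of this kind would serve as the input to a classifier such as the 5-layer network the abstract mentions.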
Source
《东南大学学报(自然科学版)》
EI
CAS
CSCD
PKU Core Journals (北大核心)
2017, Issue A01, pp. 30-33 (4 pages)
Journal of Southeast University:Natural Science Edition
Funding
CERNET Next Generation Internet Technology Innovation Project (NGII20150412)