Abstract
Botnet-based network attacks are one of the security threats facing the Internet today. Many malware families use a domain generation algorithm (DGA) to automatically generate large numbers of pseudo-random domain names with which to reach their command-and-control (C&C) servers. This paper proposes a method based on a convolutional neural network (CNN) for detecting and classifying such pseudo-random domain names. It first briefly introduces the hazards and basic principles of botnets and the role that fake domain names play in them. After analyzing the principles of DGAs and the shortcomings of traditional DGA domain name recognition algorithms, it focuses on CNN-based recognition of fake domain names. The basic concepts of convolutional neural networks are explained through simple neural network training experiments, which simulate how different hyperparameters and activation functions affect a model's performance on classification problems. The paper then analyzes the principles of data preprocessing and the rationale behind the choice of hyperparameters, activation function, and learning rate in the model definition. The analysis of results reports the changes in the model's domain name recognition accuracy and loss function, and evaluates the model using accuracy, recall, F1 score, and ROC curves. All of these metrics indicate excellent classification performance, supporting the conclusion that CNN-based fake domain name recognition is a reliable method.
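The pipeline summarized in the abstract (pseudo-random domain generation, data preprocessing, and metric-based evaluation) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the hash-based `toy_dga`, the character alphabet, the padding length of 32, and the helper names are all hypothetical. The CNN itself is omitted; the sketch only shows the kind of integer sequence such a model might consume and how precision, recall, and F1 are computed from its predictions.

```python
import hashlib
import string
from typing import List, Tuple

# Hypothetical alphabet: lowercase letters, digits, hyphen and dot.
# Index 0 is reserved for padding (and for characters outside the alphabet).
ALPHABET = string.ascii_lowercase + string.digits + "-."
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}


def toy_dga(seed: str, count: int, length: int = 12, tld: str = ".com") -> List[str]:
    """Derive `count` pseudo-random domains deterministically from a shared seed.

    Illustrative only: real DGA families use their own seeds and arithmetic.
    A SHA-256 digest has 32 bytes, so labels are capped at 32 characters.
    """
    domains = []
    for i in range(count):
        digest = hashlib.sha256(f"{seed}:{i}".encode()).digest()
        label = "".join(chr(ord("a") + b % 26) for b in digest[:length])
        domains.append(label + tld)
    return domains


def encode_domain(domain: str, max_len: int = 32) -> List[int]:
    """Map each character to an integer index and right-pad with 0 to max_len."""
    indices = [CHAR_TO_INDEX.get(ch, 0) for ch in domain.lower()[:max_len]]
    return indices + [0] * (max_len - len(indices))


def precision_recall_f1(y_true: List[int], y_pred: List[int]) -> Tuple[float, float, float]:
    """Binary metrics, with label 1 = DGA domain and label 0 = benign domain."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    for name in toy_dga("2024-01-01", 3):
        print(name, encode_domain(name)[:16])
    print(precision_recall_f1([1, 1, 0, 0], [1, 0, 1, 0]))
```

Because the generator is deterministic, malware and its operator can derive the same domain list from a shared seed, which is why blocking individual names is ineffective and a learned classifier over the character sequence is attractive.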
Authors
DU Shuying (杜淑颖)
DU Peng (杜鹏)
DING Shifei (丁世飞)
Affiliations: School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China; School of Information Management, Xuzhou Vocational College of Bioengineering, Xuzhou 221000, China
Funding
Supported by the National Natural Science Foundation of China (61976216, 61672522)
Keywords
domain generation algorithm (DGA)
hybrid word embedding
deep learning
convolutional neural network (CNN)