Abstract
As the volume of malware keeps growing and attack techniques keep evolving, combining machine learning with malware detection has become a new direction for the field. This paper first briefly introduces static and dynamic malware detection methods, summarizes the general workflow of machine-learning-based malware detection, and reviews research progress. Using the Ember 2017 and Ember 2018 datasets, it analyzes and validates methods based on structured features, including Random Forest (RF), LightGBM, Support Vector Machine (SVM), K-means, and Convolutional Neural Network (CNN) models. Using a sample set collected in 2019, it analyzes and validates methods based on sequence (serialized) features, including several common deep learning models. The accuracy, precision, recall, and F1 score of the trained models on different test sets serve as the evaluation metrics. Based on the experimental results, the paper discusses the advantages and disadvantages of each class of methods, with particular emphasis on verifying the generalization ability of tree models. The results show that models generally degrade as samples continue to evolve, and the paper closes by pointing out directions for further research.
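The four evaluation metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `binary_metrics`, the label convention (1 = malicious, 0 = benign), and the toy label lists are all assumptions made for illustration only.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels.

    Assumed convention (not from the paper): 1 = malicious, 0 = benign.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # malware caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # malware missed

    total = len(pairs)
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels for eight samples; not data from the paper.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Computing the same four numbers on test sets drawn from different time periods is one simple way to observe the kind of model degradation the abstract describes.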
Authors
Jing Hongli (景鸿理)
Huang Na (黄娜)
Li Jianguo (李建国)
Affiliations: Beijing Topsec Science & Technology Inc., Beijing 100085, China; Beijing University of Technology, Beijing 100124, China
Source
Information Technology and Network Security (《信息技术与网络安全》)
2020, No. 11, pp. 38-44, 68 (8 pages in total)