Abstract
This paper proposes a static monitoring system for detecting malicious script code in web pages. Malicious scripts are compiled into machine code with the V8 engine, a variable-length N-Gram model is used to process the machine code, and the extracted features form the training sample set. Classification models are built by pairing these features with random forest, logistic regression, and naive Bayes classifiers respectively. The trained models are then integrated through a weighted classifier ensemble in which each classifier is assigned a different weight. Multiple classifier combinations are tested experimentally, and the optimal weight assignment is found using the training set. Compared with individual classifiers and with other ensemble methods, the trained weighted ensemble detects malicious code in web pages more accurately.
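The two ideas named in the abstract, variable-length N-Gram features and a weighted classifier ensemble, can be illustrated with a minimal sketch. It is not the authors' implementation. The function names (`byte_ngrams`, `weighted_vote`, `search_weights`), the n-gram length range, the 0.5 decision threshold, and the grid search over weights are all assumptions made for illustration. The per-classifier probabilities are taken as given inputs rather than produced by real random forest, logistic regression, or naive Bayes models.

```python
from collections import Counter


def byte_ngrams(code: bytes, n_min: int = 1, n_max: int = 3) -> Counter:
    """Count every byte n-gram with n_min <= n <= n_max.

    Assumption: "variable-length" is modelled here simply as a range of n.
    """
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(code) - n + 1):
            counts[code[i:i + n]] += 1
    return counts


def weighted_vote(probs, weights):
    """Weighted average of per-classifier P(malicious) estimates."""
    return sum(w * p for w, p in zip(weights, probs)) / sum(weights)


def accuracy(prob_rows, labels, weights, threshold=0.5):
    """Fraction of samples whose weighted vote matches the label (1 = malicious)."""
    hits = 0
    for probs, label in zip(prob_rows, labels):
        pred = 1 if weighted_vote(probs, weights) >= threshold else 0
        hits += int(pred == label)
    return hits / len(labels)


def search_weights(prob_rows, labels, steps=10):
    """Grid-search three non-negative weights that sum to 1.

    Returns (best_weights, best_accuracy) on the supplied samples.
    """
    best_w, best_acc = None, -1.0
    for a in range(steps + 1):
        for b in range(steps + 1 - a):
            c = steps - a - b
            w = (a / steps, b / steps, c / steps)
            acc = accuracy(prob_rows, labels, w)
            if acc > best_acc:
                best_w, best_acc = w, acc
    return best_w, best_acc


if __name__ == "__main__":
    # Hypothetical probabilities from three already-trained classifiers.
    rows = [(0.9, 0.1, 0.2), (0.1, 0.9, 0.8), (0.8, 0.3, 0.4), (0.2, 0.7, 0.6)]
    labels = [1, 0, 1, 0]
    print(search_weights(rows, labels))
```

In this toy data only the first classifier is reliable, so the search settles on weights that favour it. With real classifiers, the weights would normally be tuned on held-out data rather than on the samples used to fit the models.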
Source
Journal of Zhejiang University of Technology (《浙江工业大学学报》)
CAS
PKU Core (北大核心)
2017, Issue 6, pp. 604-609, 633 (7 pages)