Abstract
To improve the accuracy and efficiency of phishing webpage detection, a hybrid deep learning model based on primary and secondary features was proposed. Thirty-nine features, including two new ones, were extracted from the URL, the HTML page content, and the document object model (DOM) structure to represent the diversity of phishing webpages. The importance of each feature was measured by information gain, and on that basis the 39 features were divided into primary and secondary features. The two resulting feature vectors were fed through separate channels into a hybrid deep learning network composed of a convolutional neural network (CNN) and a bidirectional long short-term memory (BiLSTM) network for training, and the outputs of the two channels were combined by weighted fusion to produce the final classification. Experimental results show that the proposed model detects phishing webpages effectively.
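The two feature-handling steps named in the abstract, ranking features by information gain to split them into primary and secondary groups and fusing the two channel outputs by weighting, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract gives neither the number of primary features, the fusion weight, nor the decision threshold, so `num_primary`, `w=0.7`, and `threshold=0.5` are hypothetical, as are all function names. Information gain is computed here for discrete (e.g. binarized) feature values; continuous features would first need discretization. The CNN and BiLSTM channels are not reproduced; each channel is represented only by the phishing probability it is assumed to output.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """IG(Y; X) = H(Y) - sum_v P(X=v) * H(Y | X=v) for one discrete feature X."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def split_primary_secondary(rows, labels, num_primary):
    """Rank feature columns by IG; the top `num_primary` (hypothetical cut-off)
    become primary features and the rest become secondary features."""
    num_features = len(rows[0])
    gains = [information_gain([r[j] for r in rows], labels) for j in range(num_features)]
    ranked = sorted(range(num_features), key=lambda j: gains[j], reverse=True)
    return sorted(ranked[:num_primary]), sorted(ranked[num_primary:]), gains


def weighted_fusion(p_primary, p_secondary, w=0.7, threshold=0.5):
    """Weighted fusion of the two channels' phishing probabilities.
    w and threshold are assumed values; returns (fused score, 1=phishing / 0=legitimate)."""
    fused = w * p_primary + (1 - w) * p_secondary
    return fused, int(fused >= threshold)


if __name__ == "__main__":
    # Toy data: 4 pages, 3 binary features, label 1 = phishing.
    rows = [(1, 0, 1), (1, 1, 1), (0, 0, 1), (0, 1, 0)]
    labels = [1, 1, 0, 0]
    primary, secondary, gains = split_primary_secondary(rows, labels, num_primary=2)
    print(primary, secondary)  # [0, 2] [1]
    print(weighted_fusion(0.9, 0.4))  # fused score about 0.75, label 1
```

In this toy data feature 0 separates the classes perfectly (IG = 1 bit), feature 2 is partly informative, and feature 1 carries no information, so features 0 and 2 land in the primary group. Giving the primary channel the larger weight reflects the idea that the more informative features should dominate the decision; the actual weighting used in the paper is not stated in the abstract.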
Authors
FENG Jian
ZOU Lian-yang
QIAO Yu-qiang
YE Ou
College of Computer Science and Technology, Xi'an University of Science and Technology, Xi'an 710054, China
Source
Computer Engineering and Design (Peking University core journal)
2021, No. 10, pp. 2748-2754 (7 pages)
Funding
Natural Science Basic Research Program of Shaanxi Province (2020JM-533)
Keywords
phishing webpage
primary and secondary features
deep learning
dual channel
weighted fusion