Abstract
A traffic analysis algorithm is proposed that combines log integration with the in-memory computing engine Spark and fuzzy c-means clustering with a fully connected neural network (FCM-DNN). First, Spark is used to integrate session logs into structured, analyzable data. Next, the behavior data of each website are clustered to extract a multi-cluster feature set for that website. This addresses the problem that the connection features of a single session are few in dimension, similar to one another, and imbalanced. Finally, a fully connected deep neural network (DNN) is built and trained on the unified cluster features combined with the original features, and the algorithm is optimized in several respects, including the cluster grouping length and the loss function. Simulation experiments show that, for session log data with few features, the algorithm effectively improves website classification accuracy. It also compresses the logs by a factor of about 7000 while retaining students' online-behavior features, thereby reducing storage overhead.
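The clustering step can be illustrated with a small sketch. The abstract does not specify how the cluster features are constructed or "unified", so the following is a minimal pure-Python illustration under stated assumptions, not the authors' method: `fcm`, `cluster_feature_vector`, and `combine` are hypothetical names; the fuzzifier `m = 2`, sorting the cluster centers as the "unification" step, and plain concatenation with the original session features are assumptions. The Spark integration step and the DNN itself are omitted.

```python
import math
import random


def fcm(points, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Standard fuzzy c-means on a list of equal-length feature vectors.

    Returns (centers, u): centers is a list of c vectors and u[i][j] is the
    degree to which point i belongs to cluster j (each row sums to 1).
    """
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        s = sum(row)
        u.append([x / s for x in row])

    centers = []
    for _ in range(max_iter):
        # Center update: mean of the points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = max(sum(w), 1e-12)  # guard against an empty cluster
            centers.append([sum(w[i] * points[i][k] for i in range(n)) / total
                            for k in range(d)])
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        new_u = []
        for i in range(n):
            dist = [math.dist(points[i], centers[j]) for j in range(c)]
            zero = [j for j in range(c) if dist[j] == 0.0]
            if zero:
                # Point coincides with a center: assign it crisply there.
                row = [1.0 if j == zero[0] else 0.0 for j in range(c)]
            else:
                row = [1.0 / sum((dist[j] / dist[k]) ** (2.0 / (m - 1.0))
                                 for k in range(c))
                       for j in range(c)]
            new_u.append(row)
        shift = max(abs(new_u[i][j] - u[i][j])
                    for i in range(n) for j in range(c))
        u = new_u
        if shift < tol:
            break
    return centers, u


def cluster_feature_vector(centers):
    """Hypothetical 'unification': sort the centers so the cluster order does
    not depend on random initialization, then flatten them into one vector."""
    return [x for center in sorted(centers) for x in center]


def combine(session_features, website_vector):
    """Hypothetical classifier input: a session's original features followed
    by the cluster features of the website it belongs to."""
    return list(session_features) + list(website_vector)


if __name__ == "__main__":
    # Toy 2-D session features for one website (made-up values).
    sessions = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
                [5.0, 5.1], [5.1, 5.0], [4.95, 5.05]]
    centers, u = fcm(sessions, c=2)
    row = combine(sessions[0], cluster_feature_vector(centers))
    print(len(row))  # prints 6: 2 original features + 2 centers x 2 dims
```

Sorting the centers is one simple way to give every website's cluster features a fixed order, so that rows from different websites line up column by column when they are fed to a fully connected network; the paper may use a different scheme.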
Authors
李腾
郭晓东
胡宇鹏
李振
LI Teng; GUO Xiaodong; HU Yupeng; LI Zhen (Informatization Office, Shandong University, Jinan, Shandong 250100, China; School of Software, Shandong University, Jinan, Shandong 250101, China)
Source
《福州大学学报(自然科学版)》
CAS
Peking University Core Journal
2023, Issue 5, pp. 677-683 (7 pages)
Journal of Fuzhou University(Natural Science Edition)
Funding
National Natural Science Foundation of China (62276155)
Natural Science Foundation of Shandong Province (ZR2021MF040)