Efficient Feature Extraction Using Apache Spark for Network Behavior Anomaly Detection (cited by: 2)


Abstract: Extracting and analyzing network traffic features is fundamental to the design and implementation of network behavior anomaly detection methods. Traditional network traffic feature methods focus on the statistical features of traffic volume. However, this approach is not sufficient to reflect communication pattern features. A different approach is required to detect anomalous behaviors that do not exhibit traffic volume changes, such as low-intensity anomalous behaviors caused by Denial of Service/Distributed Denial of Service (DoS/DDoS) attacks, Internet worms and scanning, and botnets. We propose an efficient traffic feature extraction architecture that combines the benefits of traffic volume features and network communication pattern features. This method can detect both low-intensity anomalous network behaviors and conventional traffic volume anomalies. We implemented our approach on Spark Streaming and validated our feature set using a labelled real-world dataset collected from the Sichuan University campus network. Our results demonstrate that the traffic feature extraction approach is efficient in detecting both traffic variations and communication structure changes. In our evaluation on the MIT-DARPA dataset, the same detection approach achieves a detection precision of 82.3% with traffic volume features alone and 89.9% with communication pattern features alone. Our proposed feature set improves precision by 94%.
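To illustrate the idea of combining the two feature families, the following is a minimal sketch in plain Python, not the authors' implementation and not their actual feature set. It groups flow records into fixed time windows and computes a few traffic volume features (flow, packet, and byte counts) alongside a few communication pattern features taken from the window's host-to-host graph (host count, distinct edges, maximum out-degree). The `Flow` record layout, the `extract_features` function, the 60-second window, and the chosen features are all assumptions made for illustration. On Spark Streaming the same per-window aggregation would be expressed over windowed batches rather than an in-memory list.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Flow(NamedTuple):
    """Hypothetical flow record; the field layout is an assumption."""
    ts: float      # flow start time in seconds
    src: str       # source host
    dst: str       # destination host
    packets: int   # packets in the flow
    n_bytes: int   # bytes in the flow


def extract_features(flows: Iterable[Flow],
                     window: float = 60.0) -> Dict[int, Dict[str, int]]:
    """Compute volume and communication pattern features per time window."""
    # Assign each flow to a window index based on its start time.
    buckets = defaultdict(list)
    for f in flows:
        buckets[int(f.ts // window)].append(f)

    features = {}
    for w, group in sorted(buckets.items()):
        # Communication graph of the window: one directed edge per
        # distinct (source, destination) pair.
        edges = {(f.src, f.dst) for f in group}
        hosts = {h for edge in edges for h in edge}
        out_degree = defaultdict(int)
        for src, _dst in edges:
            out_degree[src] += 1

        features[w] = {
            # Traffic volume features
            "flows": len(group),
            "packets": sum(f.packets for f in group),
            "bytes": sum(f.n_bytes for f in group),
            # Communication pattern features
            "hosts": len(hosts),
            "edges": len(edges),
            "max_out_degree": max(out_degree.values()),
        }
    return features


# Window 0: one heavy transfer between two hosts.
# Window 1: a low-volume scan from one host to three destinations.
flows = [
    Flow(1.0, "10.0.0.1", "10.0.0.2", 100, 150000),
    Flow(5.0, "10.0.0.1", "10.0.0.2", 80, 120000),
    Flow(61.0, "10.0.0.9", "10.0.0.2", 1, 60),
    Flow(62.0, "10.0.0.9", "10.0.0.3", 1, 60),
    Flow(63.0, "10.0.0.9", "10.0.0.4", 1, 60),
]
features = extract_features(flows)
```

In this toy input the second window carries far fewer bytes than the first, so a volume-only detector would see nothing unusual, while its edge count and maximum out-degree rise. That is the kind of low-intensity behavior the abstract says volume features miss.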

Authors: Xiaoming Ye, Xingshu Chen, Dunhu Liu, Wenxian Wang, Li Yang, Gang Liang, Guolin Shao

Affiliations: School of Cybersecurity; College of Cybersecurity; School of Management; College of Computer Science

Source: Tsinghua Science and Technology (Chinese title: 清华大学学报（自然科学版）（英文版）), 2018, No. 5, pp. 561-573 (13 pages). Indexed in SCIE, EI, CAS, CSCD.

Funding: supported by the National Natural Science Foundation of China (No. 61272447); Sichuan Province Science and Technology Planning (Nos. 2016GZ0042, 16ZHSF0483, and 2017GZ0168); Key Research Project of Sichuan Provincial Department of Education (Nos. 17ZA0238 and 17ZA0200); and the Scientific Research Starting Foundation for Young Teachers of Sichuan University (No. 2015SCU11079).

Keywords: feature extraction; graph theory; network behavior anomaly detection; Apache Spark

Classification code: TP393.08 [Automation and Computer Technology: Computer Application Technology]

