Research on familial clustering of massive malware samples (original title: 面向海量病毒样本家族聚类方法的研究)

Abstract: Anti-malware vendors receive thousands of malware samples every day, so grouping these samples into families quickly and effectively is a pressing problem. A scalable clustering approach is proposed. Given the vectorized feature set of a massive collection of malware samples, a locality-sensitive hashing (LSH) index is used for a fast initial clustering, and an extended K-means algorithm is then used for a finer second-pass clustering. Experimental results show that the approach greatly improves the time efficiency of malware clustering at a limited cost in accuracy.

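Authors: Zhao Yuehua (赵跃华), Lin Juwei (林聚伟)

Affiliation: School of Computer Science and Communication Engineering, Jiangsu University

Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, 2014, No. 18, pp. 118-121 (4 pages)

Keywords: malware family; scalable clustering; locality-sensitive hashing (LSH); extended K-means

Classification: TP309.5 [Automation and Computer Technology: Computer System Architecture]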
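
The two-stage scheme summarized in the abstract (a coarse LSH pass followed by a K-means refinement) might be sketched roughly as follows. The abstract does not say which hash family is used or how K-means is extended, so this is only an illustration under assumptions: stage 1 uses random-hyperplane (sign) hashing, and stage 2 uses ordinary Lloyd-style K-means seeded with the LSH bucket centroids rather than the paper's extended variant. All function and parameter names are hypothetical.

```python
import random
from collections import defaultdict


def lsh_buckets(vectors, n_planes=4, seed=0):
    """Stage 1 (assumed): bucket vectors by the sign pattern of random projections."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    # Each random hyperplane contributes one bit of the hash key.
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]
    buckets = defaultdict(list)
    for idx, vec in enumerate(vectors):
        key = tuple(sum(p * x for p, x in zip(plane, vec)) >= 0.0 for plane in planes)
        buckets[key].append(idx)
    return list(buckets.values())


def centroid(vectors, members):
    """Mean vector of the samples whose indices are in `members`."""
    dim = len(vectors[0])
    return [sum(vectors[i][d] for i in members) / len(members) for d in range(dim)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def refine_with_kmeans(vectors, buckets, n_iter=10):
    """Stage 2 (assumed): plain Lloyd iterations seeded with LSH bucket centroids."""
    centers = [centroid(vectors, b) for b in buckets]
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: each sample goes to its nearest center.
        labels = [min(range(len(centers)), key=lambda c: sq_dist(v, centers[c]))
                  for v in vectors]
        # Update step: recompute centers; an emptied cluster keeps its old center.
        for c in range(len(centers)):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if members:
                centers[c] = centroid(vectors, members)
    return labels, centers


def cluster_samples(vectors, n_planes=4, seed=0, n_iter=10):
    """Run the coarse LSH pass, then refine the result with K-means."""
    buckets = lsh_buckets(vectors, n_planes, seed)
    return refine_with_kmeans(vectors, buckets, n_iter)
```

In this sketch the number of clusters is not chosen in advance; it falls out of how many LSH buckets are non-empty, which is one plausible reason to run the cheap hashing pass first. Calling `cluster_samples(feature_vectors)` on a list of equal-length numeric feature vectors returns one label per sample plus the final cluster centers.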
