Research on family clustering of massive malware samples (面向海量病毒样本家族聚类方法的研究)
Abstract: Anti-virus vendors receive thousands of malware samples every day, and grouping this volume of samples into families quickly and effectively is a pressing problem. This paper proposes a scalable clustering method. Given the vectorized feature set of a large collection of malware samples, the method first uses locality-sensitive hashing (LSH) indexing to perform a fast initial clustering, then applies an extended K-means algorithm to perform a second, finer-grained clustering. Experimental results show that the method greatly improves the time efficiency of malware clustering at the cost of a limited loss in accuracy.
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, 2014, Issue 18, pp. 118-121 (4 pages)
Keywords: malware family; scalable clustering; locality-sensitive hashing (LSH); extended K-means
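
The two-stage idea in the abstract (a fast LSH pass followed by a finer K-means-style pass) can be illustrated with a minimal sketch. The abstract does not specify the hash family, the feature vectors, or what the "extended" K-means changes, so everything below is an assumption made for illustration: random-hyperplane (sign) hashing stands in for the LSH index, plain Lloyd iterations seeded with one centroid per LSH bucket stand in for the extended K-means, and the function names `lsh_buckets`, `refine`, and `cluster` are hypothetical. It is not the authors' implementation.

```python
import random
from collections import defaultdict


def lsh_buckets(vectors, n_planes=4, seed=0):
    """Coarse pass (assumed hash family): bucket vectors by the sign
    pattern of their projections onto random hyperplanes."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]
    buckets = defaultdict(list)
    for idx, vec in enumerate(vectors):
        key = tuple(
            sum(p * x for p, x in zip(plane, vec)) >= 0.0 for plane in planes
        )
        buckets[key].append(idx)
    return list(buckets.values())


def centroid(vectors, members):
    """Component-wise mean of the vectors whose indices are in members."""
    dim = len(vectors[0])
    return [sum(vectors[i][d] for i in members) / len(members) for d in range(dim)]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def refine(vectors, buckets, n_iter=10):
    """Fine pass (plain Lloyd iterations, a stand-in for the paper's
    extended K-means): start from one centroid per LSH bucket."""
    centers = [centroid(vectors, b) for b in buckets]
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        # Assign every sample to its nearest current center.
        labels = [
            min(range(len(centers)), key=lambda c: sq_dist(v, centers[c]))
            for v in vectors
        ]
        groups = defaultdict(list)
        for i, lab in enumerate(labels):
            groups[lab].append(i)
        # Recompute centers; keep the old one if a cluster became empty.
        centers = [
            centroid(vectors, groups[c]) if groups[c] else centers[c]
            for c in range(len(centers))
        ]
    return labels


def cluster(vectors, n_planes=4, seed=0, n_iter=10):
    """Run the coarse LSH pass, then the fine K-means-style pass."""
    return refine(vectors, lsh_buckets(vectors, n_planes, seed), n_iter)
```

Calling `cluster(features)` on a list of equal-length numeric feature vectors returns one cluster label per sample. In this sketch the number of clusters is bounded by the number of non-empty LSH buckets, so `n_planes` indirectly controls granularity; how the paper chooses the number of families is not stated in the abstract.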