Abstract
Similarity estimation for malicious program code is an important research topic in malicious code analysis and detection. Existing methods mainly compute attributes of the code or measure its structure, but because the structure of malicious code is flexible and the code is often disguised, its similarity is difficult to measure. This paper proposes a similarity estimation algorithm for malicious program code based on improved fingerprints and LSC weighting. The algorithm first partitions the code by function scope and applies normalization preprocessing, then serializes it into strings and measures similarity using an improved fingerprint similarity. In parallel, it applies a longest common substring matching algorithm as a structural measure, and weights the two similarity results to produce a combined estimate of structural similarity. Experiments take C-language program code as an example and use the binary search algorithm to generate a malicious code test data set for validating the algorithm. Simulation results show that the algorithm estimates malicious code similarity with good accuracy.
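The pipeline described in the abstract (normalize, serialize to a string, fingerprint similarity, longest common substring similarity, weighted combination) can be illustrated with a minimal sketch. The abstract does not specify the paper's improved fingerprint, its normalization rules, or its weights, so everything below is an assumption for illustration: the function names, the k-gram CRC32 fingerprint with Jaccard overlap, the normalization that maps identifiers to `v` and numbers to `n`, the division by the longer string's length, and the equal weights `w_fp` and `w_lcs`. Function scope partitioning is omitted.

```python
import re
import zlib

# Assumed (partial) C keyword list; anything else that looks like a name is an identifier.
C_KEYWORDS = {
    "int", "char", "float", "double", "void", "long", "short", "unsigned",
    "signed", "const", "static", "struct", "if", "else", "for", "while",
    "do", "return", "break", "continue", "switch", "case", "default", "sizeof",
}

def normalize(source: str) -> str:
    """Strip comments and whitespace; map identifiers to 'v' and numbers to 'n'."""
    source = re.sub(r"/\*.*?\*/", "", source, flags=re.S)  # block comments
    source = re.sub(r"//[^\n]*", "", source)               # line comments
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", source)
    out = []
    for tok in tokens:
        if re.match(r"[A-Za-z_]", tok) and tok not in C_KEYWORDS:
            out.append("v")
        elif tok.isdigit():
            out.append("n")
        else:
            out.append(tok)
    return "".join(out)

def fingerprints(text: str, k: int = 5) -> set:
    """Hash every k-gram of the serialized string (assumed fingerprint scheme)."""
    if len(text) < k:
        return {zlib.crc32(text.encode())}
    return {zlib.crc32(text[i:i + k].encode()) for i in range(len(text) - k + 1)}

def fingerprint_similarity(a: str, b: str, k: int = 5) -> float:
    """Jaccard overlap of the two fingerprint sets."""
    fa, fb = fingerprints(a, k), fingerprints(b, k)
    return len(fa & fb) / len(fa | fb)

def lcs_similarity(a: str, b: str) -> float:
    """Length of the longest common substring divided by the longer length."""
    if not a or not b:
        return 0.0
    best = 0
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0] * (len(b) + 1)
        for j, cb in enumerate(b, 1):
            if ch == cb:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best / max(len(a), len(b))

def combined_similarity(src_a: str, src_b: str,
                        w_fp: float = 0.5, w_lcs: float = 0.5) -> float:
    """Weighted sum of fingerprint and LCS similarity (weights are hypothetical)."""
    a, b = normalize(src_a), normalize(src_b)
    return w_fp * fingerprint_similarity(a, b) + w_lcs * lcs_similarity(a, b)
```

Under these assumptions, two C fragments that differ only in identifier names, spacing, and comments normalize to the same string and score 1.0, which is the kind of disguise the abstract says makes similarity hard to measure. Splitting the score into a set-based part and a contiguous-match part means reordered code still earns fingerprint credit, while long unchanged runs are rewarded by the LCS term.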
Source
《科学技术与工程》
PKU Core Journal (北大核心)
2013, Issue 10, pp. 2871-2874, 2879 (5 pages)
Science Technology and Engineering
Funding
Supported by the National Natural Science Foundation of China (Grant No. 61142010)