Abstract
This paper proposes a method for measuring the similarity of program source code. The method divides C source code into function scopes according to its structural features, normalizes the divided code using a set of rules, computes hash values over the resulting token sequence, and measures similarity with a hash value matching algorithm. Experimental results show that the method improves the accuracy of source code similarity measurement and runs efficiently.
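The pipeline named in the abstract (function scopes, normalization, token hashing, hash matching) can be illustrated with a minimal sketch. The abstract does not give the paper's normalization rules or its matching algorithm, so everything below is an assumption made for illustration: scopes are approximated by top-level brace blocks, identifiers and numbers are replaced by the placeholders `ID` and `NUM`, hashes are taken over token k-grams, and matching is a Jaccard ratio over the hash sets. All function names are hypothetical.

```python
import hashlib
import re

# Assumed subset of C keywords kept verbatim during normalization.
C_KEYWORDS = {"int", "char", "float", "double", "void", "long", "short",
              "unsigned", "if", "else", "for", "while", "do", "switch",
              "case", "default", "return", "break", "continue", "struct"}

# Identifiers, numbers, a few two-character operators, then any single symbol.
TOKEN_RE = re.compile(
    r"[A-Za-z_]\w*|\d+(?:\.\d+)?|==|!=|<=|>=|&&|\|\||\+\+|--|[^\s\w]")


def strip_comments(src):
    """Remove /* ... */ and // comments (string literals are not handled)."""
    src = re.sub(r"/\*.*?\*/", " ", src, flags=re.S)
    return re.sub(r"//[^\n]*", " ", src)


def split_scopes(src):
    """Crude stand-in for function scopes: every top-level {...} block."""
    scopes, depth, start = [], 0, None
    for i, ch in enumerate(src):
        if ch == "{":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                scopes.append(src[start:i + 1])
    return scopes


def normalize(code):
    """Map identifiers to ID and numbers to NUM; keep keywords and symbols."""
    tokens = []
    for tok in TOKEN_RE.findall(code):
        if tok in C_KEYWORDS:
            tokens.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append("ID")
        elif tok[0].isdigit():
            tokens.append("NUM")
        else:
            tokens.append(tok)
    return tokens


def kgram_hashes(tokens, k=4):
    """Hash every window of k consecutive tokens (assumed window size)."""
    grams = [tokens[i:i + k] for i in range(max(1, len(tokens) - k + 1))]
    return {hashlib.sha256(" ".join(g).encode()).hexdigest()
            for g in grams if g}


def similarity(src_a, src_b, k=4):
    """Jaccard ratio of the two programs' k-gram hash sets, in [0, 1]."""
    def fingerprint(src):
        hashes = set()
        for scope in split_scopes(strip_comments(src)):
            hashes |= kgram_hashes(normalize(scope), k)
        return hashes

    fa, fb = fingerprint(src_a), fingerprint(src_b)
    if not fa and not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)


original = "int add(int a, int b) { /* sum */ int s = a + b; return s; }"
renamed = "int plus(int x, int y) { int total = x + y; return total; }"
print(similarity(original, renamed))
```

Because identifiers are collapsed to a single placeholder and comments are stripped, renaming variables or editing comments does not change the fingerprint in this sketch. A real implementation would need a proper C lexer, since braces inside string literals and top-level `struct` blocks would confuse the scope splitter used here.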
Source
《计算机工程》
CAS
CSCD
2012, No. 6, pp. 37-39 (3 pages)
Computer Engineering
Funding
Supported by the Fundamental Research Funds for the Central Universities (CDJZR10180008)
Keywords
function scope
code normalization
hash value matching
similarity measurement