Abstract
Plagiarism is common in student programming assignments, and similarity detection techniques can help teachers identify suspicious submissions. Most such techniques apply the same algorithm to every programming language, which makes the checks redundant and degrades their accuracy. This paper presents a hybrid similarity detection algorithm tailored to the characteristics of assembly language, combining attribute counting with structure metrics. The structure information consists of the number of program segments, the number of subroutine definitions and calls, the number of occurrences of the loop instruction, and the number of occurrences of transfer (branch) instructions; 73 frequently used keywords serve as the attribute information. After this information is extracted from the assembly programs, a chi-square test is used to judge the similarity of two programs. Experiments show that the results of the hybrid algorithm are consistent with those of manual checking and are better than the results obtained with attribute counting or structure metrics alone.
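The pipeline described in the abstract (extract structure and attribute counts, then compare two programs with a chi-square test) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption rather than the authors' implementation: the abstract does not list the 73 keywords, so `KEYWORDS` is a small hypothetical subset; MASM-style `segment`/`proc` directives are assumed; and `chi_square` computes a plain Pearson homogeneity statistic over the two count vectors, which may differ from the exact form of the test used in the paper.

```python
import re
from collections import Counter

# Hypothetical subset: the paper uses 73 frequent keywords that the abstract does not list.
KEYWORDS = ("mov", "add", "sub", "cmp", "push", "pop", "int", "lea", "inc", "dec")
JUMP = re.compile(r"^j[a-z]+$")  # jmp, je, jne, ... (assumed x86 mnemonics)


def extract_features(source: str) -> Counter:
    """Count structure and attribute features in MASM-style assembly text."""
    counts = Counter()
    for line in source.splitlines():
        tokens = line.split(";", 1)[0].lower().split()  # drop comments
        if "segment" in tokens:
            counts["#segments"] += 1
        if "proc" in tokens:
            counts["#proc_defs"] += 1
        if tokens and tokens[0].endswith(":"):  # skip a leading label
            tokens = tokens[1:]
        if not tokens:
            continue
        mnemonic = tokens[0]
        if mnemonic == "loop":
            counts["#loops"] += 1
        elif mnemonic == "call":
            counts["#calls"] += 1
        elif JUMP.match(mnemonic):
            counts["#jumps"] += 1
        elif mnemonic in KEYWORDS:
            counts[mnemonic] += 1
    return counts


def chi_square(a: Counter, b: Counter) -> tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for two count vectors."""
    categories = [k for k in set(a) | set(b) if a[k] + b[k] > 0]
    n_a, n_b = sum(a.values()), sum(b.values())
    if n_a == 0 or n_b == 0:
        raise ValueError("both programs must yield at least one feature")
    total = n_a + n_b
    stat = 0.0
    for k in categories:
        column = a[k] + b[k]
        exp_a = n_a * column / total  # expected count if both programs
        exp_b = n_b * column / total  # shared one feature distribution
        stat += (a[k] - exp_a) ** 2 / exp_a + (b[k] - exp_b) ** 2 / exp_b
    return stat, len(categories) - 1
```

A small statistic relative to the chi-square critical value for the returned degrees of freedom means the two feature distributions are statistically indistinguishable, which would flag the pair for manual review. Renaming labels leaves every count unchanged, so a copy disguised only by renaming yields a statistic of zero in this sketch.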
Source
Journal of Hebei University of Science and Technology (《河北科技大学学报》)
CAS
Peking University Core Journals (北大核心)
2011, No. 2, pp. 138-142 (5 pages)
Funding
Tianjin Research Program of Application Foundation and Advanced Technology (10JCZDJC16000)
Keywords
assembly language
similarity detection
plagiarism
attribute counting
structure metrics