Abstract
A method for generating token strings for C programs is proposed, in which a C program is represented as XML text. First, the source program is formatted. Next, a subset of key constructs that represent program structure is selected from the full C language and recognized with corresponding regular expressions. Then the structural information of the C program that is most prone to plagiarism is stored in XML text. Finally, the experimental system was tested. The results show that the method can quickly locate plagiarized code within programs, thereby improving both the speed and the accuracy of similarity comparison.
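The abstract does not say which C constructs are selected, what the regular expressions look like, or what XML layout is used. The Python sketch below only illustrates the three-step pipeline it describes (format the source, recognize key structures with regular expressions, store them as XML) under hypothetical choices. The pattern list `KEY_PATTERNS`, the helpers `format_source`, `classify` and `to_xml`, and the `<program>`/`<structure>` elements are assumptions for illustration, not the authors' design.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical "key structures": the paper's real list and regexes are not given.
# Order matters: the first pattern that matches a line wins, so control
# keywords are tested before the generic function-definition pattern.
KEY_PATTERNS = [
    ("for", re.compile(r"\bfor\s*\(")),
    ("while", re.compile(r"\bwhile\s*\(")),
    ("if", re.compile(r"\bif\s*\(")),
    ("switch", re.compile(r"\bswitch\s*\(")),
    ("return", re.compile(r"^return\b")),
    # type/qualifiers, a name, a parameter list without ';', optional '{'
    ("function", re.compile(r"^\w[\w\s*]*?\b(\w+)\s*\([^;]*\)\s*\{?$")),
]


def format_source(src):
    """Step 1 (crude normalizer): drop comments, trim lines, skip blank lines.

    Note: comment markers inside string literals are not handled.
    """
    src = re.sub(r"/\*.*?\*/", "", src, flags=re.S)  # block comments
    src = re.sub(r"//[^\n]*", "", src)               # line comments
    return [line.strip() for line in src.splitlines() if line.strip()]


def classify(line):
    """Step 2: return (kind, match) for the first key structure found, else None."""
    for kind, pattern in KEY_PATTERNS:
        m = pattern.search(line)
        if m:
            return kind, m
    return None


def to_xml(src):
    """Step 3: store the recognized structures as <structure> elements."""
    root = ET.Element("program")
    for lineno, line in enumerate(format_source(src), start=1):
        hit = classify(line)
        if hit is None:
            continue
        kind, m = hit
        node = ET.SubElement(root, "structure", type=kind, line=str(lineno))
        if kind == "function":
            node.set("name", m.group(1))  # captured function name
        node.text = line  # ElementTree escapes <, >, & on output
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = """
/* sum of the positive elements */
int sum(int *a, int n) {
    int s = 0;
    for (int i = 0; i < n; i++) {  // loop
        if (a[i] > 0)
            s += a[i];
    }
    return s;
}
"""
    print(to_xml(sample))
```

Running the script prints one `<program>` element whose `<structure>` children record the construct kind, its line number in the formatted source, and the normalized text (for the sample: a `function` named `sum`, then `for`, `if` and `return`). Matching line by line keeps the sketch short, but it misses constructs split across several lines; that limitation belongs to this illustration, not necessarily to the paper's method.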
Source
Journal of Inner Mongolia Normal University (Natural Science Edition) [《内蒙古师范大学学报(自然科学汉文版)》, Chinese-language edition]
Indexed in: CAS
2011, No. 3, pp. 320-324 (5 pages)
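Funding
National Natural Science Foundation of China (Grant No. 60940027)
Inner Mongolia Normal University Graduate Research Innovation Fund (Grant No. CXJJS10052)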