Automatic Naming of Source Code Based on Information Retrieval
Abstract: Automatic source code naming predicts a meaningful name for a given method body that reflects what the code does, which makes code easier to read and understand and improves software development efficiency. Traditional naming approaches use only a single kind of information, such as the lexical or syntactic information of the code, whereas deep learning-based approaches usually ignore similar code in the corpus; both limitations reduce naming accuracy. To address these problems, this paper proposes an information retrieval-based approach to automatic source code naming. First, a pre-trained model and the Bidirectional Encoder Representations from Transformers (BERT)-whitening method extract effective features of the input code and of the code in the corpus, and the semantic similarity between them is computed from the Euclidean distance. Second, the corpus code with the highest semantic similarity to the input code is selected to form a candidate library, and the lexical and syntactic similarities between the input code and the candidate code are computed with the Jaccard index and the Longest Common Subsequence (LCS), respectively. Finally, a weighted sum of the lexical and syntactic similarities identifies the code fragment in the candidate library that is most similar to the input code, and that fragment's method name is reused as the method name of the input code. Experimental results on the public Java-small dataset show that the F1 value of the proposed approach is 6.93 and 1.22 percentage points higher than that of naming approaches based on the Vector Space Model (VSM) and on the deep learning model Code2Vec, respectively, indicating better predictive performance.
Authors: LI Xue; WANG Yawen; ZHANG Qianjin (State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China)
Source: Computer Engineering (计算机工程; indexed in CAS, CSCD, and the Peking University Core Journals list), 2024, Issue 6, pp. 304-310 (7 pages)
Funding: National Natural Science Foundation of China (U1736110).
Keywords: automatic naming; information retrieval; deep learning; BERT-whitening method; semantic similarity
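
The re-ranking stage described in the abstract (Jaccard index for lexical similarity, LCS for syntactic similarity, and a weighted sum to pick the best candidate) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not say how code is tokenized, what sequence the LCS is computed over, how the LCS is normalized, or what weights are used, so the sketch assumes a sequence of AST node-type names, normalization by the longer sequence, and a hypothetical weight `w_lex = 0.5`. The function names and the dictionary layout are also illustrative.

```python
def jaccard(a, b):
    """Jaccard index of two token collections, compared as sets."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0  # two empty sets are treated as identical
    return len(sa & sb) / len(sa | sb)


def lcs_length(a, b):
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a, b):
    """LCS length divided by the longer sequence length (assumed normalization)."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def best_match(query, candidates, w_lex=0.5):
    """Return (method_name, score) for the candidate with the highest weighted sum.

    `query` and each candidate are dicts with "tokens" (lexical tokens) and
    "syntax" (assumed: AST node-type names); candidates also carry "name".
    """
    best = None
    for cand in candidates:
        score = (w_lex * jaccard(query["tokens"], cand["tokens"])
                 + (1.0 - w_lex) * lcs_similarity(query["syntax"], cand["syntax"]))
        if best is None or score > best[1]:
            best = (cand["name"], score)
    return best


# Toy candidate library; in the paper's pipeline this would be the corpus
# code that ranked highest on semantic similarity.
query = {
    "tokens": ["return", "a", "+", "b"],
    "syntax": ["MethodDeclaration", "ReturnStatement", "BinaryExpr"],
}
candidates = [
    {"name": "add",
     "tokens": ["return", "x", "+", "y"],
     "syntax": ["MethodDeclaration", "ReturnStatement", "BinaryExpr"]},
    {"name": "isEmpty",
     "tokens": ["return", "size", "==", "0"],
     "syntax": ["MethodDeclaration", "IfStatement", "ReturnStatement"]},
]
predicted_name, score = best_match(query, candidates)
print(predicted_name)
```

In this toy run the `add` candidate wins because its syntactic sequence matches the query exactly, even though only two lexical tokens overlap. Raising `w_lex` would shift the ranking toward candidates that share more tokens.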