Abstract
To address the semantic inconsistency of migrated code produced by source code migration models, this paper adds a statement-level attention mechanism on top of the token-level attention mechanism, proposes HPGN (hierarchical pointer-generator network), a source code migration model based on hierarchical attention, and designs a state feeding mechanism. During migration, the statement-level attention mechanism aligns the features of source code statements with those of migrated code statements, the token-level attention mechanism extracts tokens from the aligned statements, and the state feeding mechanism passes features between adjacent migrated code statements, which improves the semantic consistency of the migrated code. Experiments on a dataset built from real projects show that HPGN improves the overall score by 3.4 points over the best comparison model while using fewer model parameters. In addition, ablation experiments validate the effectiveness of the state feeding mechanism and of HPGN's hierarchical architecture.
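The two-level attention described in the abstract can be illustrated with a minimal sketch. This is not the HPGN implementation: the mean-pooled statement features, the dot-product scoring, and the function name `hierarchical_attention` are all assumptions made for illustration, and the state feeding mechanism is left out.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def hierarchical_attention(query, statements):
    """Hypothetical two-level attention (illustrative only).

    query:      decoder-side feature vector.
    statements: list of statements, each a list of token vectors.
    Returns (statement_weights, token_weights), where
    token_weights[i][j] is the joint weight of token j in statement i.
    """
    # Assumption: a statement feature is the mean of its token vectors.
    stmt_feats = [
        [sum(col) / len(tokens) for col in zip(*tokens)]
        for tokens in statements
    ]
    # Statement-level attention: align the query with whole statements.
    stmt_w = softmax([dot(query, f) for f in stmt_feats])
    # Token-level attention: distribute each statement's weight over its tokens.
    token_w = []
    for weight, tokens in zip(stmt_w, statements):
        local = softmax([dot(query, t) for t in tokens])
        token_w.append([weight * p for p in local])
    return stmt_w, token_w


query = [1.0, 0.0]
statements = [
    [[1.0, 0.0], [0.5, 0.0]],  # statement aligned with the query
    [[0.0, 1.0], [0.0, 0.5]],  # statement orthogonal to the query
]
stmt_w, token_w = hierarchical_attention(query, statements)
print(stmt_w)
print(token_w)
```

Because each token weight is the product of its statement weight and its within-statement weight, the joint weights over all tokens sum to 1, and tokens in poorly aligned statements are suppressed as a group.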
Authors
李征
徐明瑞
吴永豪
刘勇
陈翔
武淑美
刘恒源
Li Zheng; Xu Mingrui; Wu Yonghao; Liu Yong; Chen Xiang; Wu Shumei; Liu Hengyuan (College of Information Science & Technology, Beijing University of Chemical Technology, Beijing 100029, China; School of Information Science & Technology, Nantong University, Nantong, Jiangsu 226019, China)
Source
《计算机应用研究》
CSCD
PKU Core Journal (北大核心)
2023, No. 10, pp. 3082-3090 (9 pages)
Application Research of Computers
Funding
Supported by the National Natural Science Foundation of China (61872026, 61902015).
Keywords
code migration
code statement
machine translation
attention mechanism