Abstract
To extract features from the unstructured information in bug reports more effectively, this paper proposes D_BBAS (Doc2vec and BERT BiLSTM-attention similarity). The method trains feature extraction models on a large-scale bug report repository and generates two sets of text representations, one for bug summaries and one for bug descriptions, that reflect deep semantic information. These two distributed representation sets are then used to compute the similarity of each pair of bug reports, which yields two new similarity features. The two new features are combined with traditional features derived from structured information and used together for duplicate bug report detection. The method is validated on the bug report repositories of the well-known open-source projects Eclipse, NetBeans and Open Office, which contain more than 500,000 bug reports. The experimental results show that, compared with representative methods, D_BBAS improves the F1 value by 1.7% on average, which demonstrates its effectiveness.
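The feature construction described in the abstract can be illustrated with a minimal sketch. It assumes that each report already has a summary vector and a description vector produced by some trained model, and that pair similarity is measured with cosine similarity; the abstract does not name the similarity measure, so that choice, along with the names `cosine_similarity`, `pair_features`, `summary_vec` and `description_vec`, is hypothetical and not taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pair_features(report_a, report_b, structured_features):
    """Append two text-similarity features to the structured features of a report pair.

    Assumption: each report is a dict holding precomputed 'summary_vec' and
    'description_vec' embeddings (hypothetical field names).
    """
    summary_sim = cosine_similarity(report_a["summary_vec"], report_b["summary_vec"])
    description_sim = cosine_similarity(
        report_a["description_vec"], report_b["description_vec"]
    )
    return list(structured_features) + [summary_sim, description_sim]


# Toy vectors for illustration only; real embeddings would come from trained models.
report_a = {"summary_vec": [1.0, 0.0], "description_vec": [1.0, 1.0]}
report_b = {"summary_vec": [1.0, 0.0], "description_vec": [1.0, -1.0]}
features = pair_features(report_a, report_b, structured_features=[1.0, 0.0])
```

The resulting feature vector would then be passed to whatever classifier decides whether the pair is a duplicate; that classifier is outside the scope of this sketch.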
Authors
Zeng Fang; Xie Qi; Cui Mengtian (Southwest Minzu University, Chengdu 610041, China)
Source
Application Research of Computers (《计算机应用研究》)
Indexed in: CSCD; Peking University Core Journals
2022, Issue 12, pp. 3736-3742 (7 pages)
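Funding
High-end Foreign Experts Recruitment Program of the Ministry of Science and Technology (G2021186002L)
Sichuan Science and Technology Program (2022JDGD0011)
Fundamental Research Funds for the Central Universities, Southwest Minzu University (2021NYYXS44)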