
Just-in-Time Software Defect Prediction Model and Its Interpretability Research
Cited by: 1
Abstract Just-in-time (JIT) software defect prediction is an important means of assuring software safety and quality together, and it is receiving growing attention in software engineering. Existing datasets, however, suffer from redundant features and low feature correlation, which substantially degrades the classification performance and stability of JIT defect prediction models. Analyzing how defect data features influence a model is also particularly important, yet interpretability studies of software defect prediction models remain scarce. To address these problems, this paper draws on a large-scale empirical study of 227,417 code-level changes from six open-source projects and builds a JIT defect prediction model that combines SHAP, SMOTEENN, and XGBoost (SHAP-SEBoost). First, the SHAP (SHapley Additive exPlanation) model explainer is used to analyze the features of the initial dataset, and features are selected and combined according to the results. Next, SMOTEENN balances the positive and negative samples of the class-imbalanced defect data, and the ensemble learning algorithm XGBoost is used to build the prediction model. Finally, SHAP is used to analyze the interpretability of the resulting model. Experimental results show that SHAP-SEBoost effectively improves classification performance: compared with baseline models and strong recent models, AUC improves by 11.6% on average and F1 by 33.5% on average.
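The abstract relies on SHAP, which attributes a model's output to individual features using Shapley values. As a rough illustration of that underlying idea only, the sketch below computes exact Shapley values for a tiny hand-written scoring function in plain Python. It does not use the SHAP library, SMOTEENN, or XGBoost, and it is not the paper's implementation. The feature names, weights, baseline values, and the `risk_score` function are all hypothetical and chosen purely for demonstration.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley value of each feature for a set function value_fn(frozenset)."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of f when added to coalition s.
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi


# Hypothetical change-level features (illustrative, not taken from the paper).
CHANGE = {"lines_added": 120.0, "files_touched": 4.0, "dev_experience": 10.0}
BASELINE = {"lines_added": 20.0, "files_touched": 1.0, "dev_experience": 30.0}


def risk_score(x):
    """Toy stand-in for a trained model: linear terms plus one interaction."""
    return (0.01 * x["lines_added"]
            + 0.2 * x["files_touched"]
            - 0.02 * x["dev_experience"]
            + 0.001 * x["lines_added"] * x["files_touched"])


def value_fn(present):
    # Features in `present` take the change's values; the rest take the baseline.
    x = {name: (CHANGE[name] if name in present else BASELINE[name])
         for name in CHANGE}
    return risk_score(x)


phi = shapley_values(value_fn, list(CHANGE))
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.3f}")
```

The attributions sum to the difference between the score for the change and the score for the baseline, which is the additivity property that gives SHAP its name. Exact enumeration grows exponentially with the number of features, so practical tools rely on model-specific or sampling-based approximations instead.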
Authors CHEN Li-qiong (陈丽琼); WANG Can (王璨); SONG Shi-long (宋士龙) (Department of Computer Science and Information Engineering, Shanghai Institute of Technology, Shanghai 201418, China)
Source Journal of Chinese Computer Systems (《小型微型计算机系统》), CSCD, PKU Core, 2022, No. 4, pp. 865-871 (7 pages)
Funding Supported by the National Natural Science Foundation of China (61702334).
Keywords just-in-time software defect prediction; model interpretability; feature engineering; ensemble learning
