Abstract
Software defect prediction is critical to ensuring software quality, but imbalanced data seriously degrades the performance of defect prediction models. Combining ensemble algorithms with imbalanced learning methods can effectively address this problem. Based on six ensemble learning algorithms and sixteen imbalanced learning methods, several new ensemble prediction methods are proposed and evaluated with M×N-way cross-validation on fourteen highly imbalanced datasets, with three commonly used machine learning algorithms selected as baselines. Prediction performance is evaluated using AUC, G-mean, recall and F1. The experimental results show that the average metric value of the proposed prediction methods is 1.5% higher than that of the three conventional machine learning algorithms.
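The evaluation protocol named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the values of M and N, the resampling details, or the exact metric definitions, so the code assumes the conventional definitions (G-mean as the square root of recall times specificity) and reads M×N cross-validation as M repetitions of stratified N-fold splitting. The function names `imbalance_metrics` and `m_by_n_folds` are hypothetical, and AUC is left out because it requires predicted scores rather than hard labels.

```python
import math
import random


def imbalance_metrics(y_true, y_pred):
    """Recall, F1 and G-mean for binary labels (1 = defective, 0 = clean)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # Assumed definition: geometric mean of the two per-class recalls.
    g_mean = math.sqrt(recall * specificity)
    return {"recall": recall, "f1": f1, "g_mean": g_mean}


def m_by_n_folds(labels, m, n, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for M repeats of stratified N-fold CV."""
    rng = random.Random(seed)
    for rep in range(m):
        folds = [[] for _ in range(n)]
        # Deal each class round-robin so every fold keeps the class ratio.
        for cls in sorted(set(labels)):
            idx = [i for i, y in enumerate(labels) if y == cls]
            rng.shuffle(idx)
            for k, i in enumerate(idx):
                folds[k % n].append(i)
        for f in range(n):
            test = sorted(folds[f])
            train = sorted(i for g in range(n) if g != f for i in folds[g])
            yield rep, f, train, test


if __name__ == "__main__":
    # Toy imbalanced labels: 4 defective modules out of 20.
    labels = [1] * 4 + [0] * 16
    for rep, fold, train, test in m_by_n_folds(labels, m=2, n=2):
        print(rep, fold, len(train), len(test))
    print(imbalance_metrics([1, 1, 0, 0], [1, 0, 0, 1]))
```

Stratifying the folds matters on highly imbalanced data, since a plain random split can leave a test fold with few or no defective samples, which makes recall and G-mean meaningless for that fold.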
Authors
魏比贤
刘晓燕
WEI Bi-xian; LIU Xiao-yan (Faculty of Information Engineering and Automation, Kunming University of Science and Technology)
Source
《化工自动化及仪表》
CAS
2023, No. 4, pp. 549-556 (8 pages)
Control and Instruments in Chemical Industry
Keywords
software defect prediction
ensemble algorithm
machine learning
imbalanced data