
Semi-supervised Ensemble Learning Approach for Software Defect Prediction (times cited: 7)
Abstract: Software defect prediction faces two problems. Large numbers of labeled samples are difficult to obtain, and the classes in software defect data are imbalanced. To address both, this paper proposes a software defect prediction model based on semi-supervised ensemble learning, called Tri_Adaboost. First, under-sampling and semi-supervised learning are used to expand the labeled set: a randomly selected portion of the unlabeled samples is pre-labeled, which alleviates the shortage of labeled samples. Second, SMOTE is applied to the expanded labeled set, and an AdaBoost ensemble is then used for prediction on the resulting labeled sample set. The model is validated on the NASA MDP datasets and on a null pointer dereference defect dataset generated from open-source projects. Compared with four basic machine learning classification methods, Tri_Adaboost achieves higher F-measure and AUC values.
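The abstract names SMOTE as the oversampling step that is applied after the labeled set has been expanded. Below is a minimal sketch of standard SMOTE interpolation using only the standard library. It is not the authors' code. The function name `smote`, its parameters, and the choice of drawing base samples at random (rather than cycling through every minority sample) are assumptions made for illustration.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic minority-class samples with basic SMOTE.

    minority: list of feature vectors (lists of floats) from the minority
    (e.g. defective) class. n_synthetic: number of new samples to create.
    k: number of nearest minority neighbours considered per sample.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # Precompute the k nearest minority neighbours of every minority sample
    # (brute force, Euclidean distance).
    neighbours = []
    for i, x in enumerate(minority):
        dists = sorted(
            (math.dist(x, y), j) for j, y in enumerate(minority) if j != i
        )
        neighbours.append([j for _, j in dists[:k]])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))   # pick a minority sample at random
        j = rng.choice(neighbours[i])      # pick one of its k neighbours
        gap = rng.random()                 # random position on the segment
        x, y = minority[i], minority[j]
        # The new point lies on the line segment between x and its neighbour y.
        synthetic.append([xi + gap * (yi - xi) for xi, yi in zip(x, y)])
    return synthetic
```

For example, `smote([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]], 5, k=2)` returns five new points. Each one lies between two of the given minority samples. These points would be appended to the labeled set with the minority label before the classifier is trained.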
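The final step of the abstract applies an AdaBoost ensemble to the resampled labeled set. Below is a sketch of textbook discrete AdaBoost with single-feature decision stumps, written with the standard library only. The paper's abstract does not state the base learner or the number of boosting rounds. The stump learner, `n_rounds`, the {-1, +1} label encoding (e.g. +1 for defective), and all function names are therefore assumptions.

```python
import math


def _best_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best stump.

    A stump predicts `polarity` if x[feature] > threshold, else -polarity.
    Labels y are in {-1, +1}; w are sample weights summing to 1.
    """
    best = None
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        # Candidate thresholds: below the minimum, then midpoints of neighbours.
        thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
        for thr in thresholds:
            for pol in (1, -1):
                err = sum(
                    wi for row, yi, wi in zip(X, y, w)
                    if (pol if row[f] > thr else -pol) != yi
                )
                if best is None or err < best[0]:
                    best = (err, f, thr, pol)
    return best


def adaboost_fit(X, y, n_rounds=10):
    """Discrete AdaBoost with decision stumps; y must contain -1/+1 labels."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(n_rounds):
        err, f, thr, pol = _best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep log() and division finite
        if err >= 0.5:
            break  # weak learner is no better than chance; stop boosting
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, f, thr, pol))
        # Misclassified samples are multiplied by exp(alpha) and correct ones
        # by exp(-alpha), then all weights are renormalised to sum to 1.
        w = [
            wi * math.exp(-alpha * yi * (pol if row[f] > thr else -pol))
            for row, yi, wi in zip(X, y, w)
        ]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def adaboost_score(model, x):
    """Weighted vote of the stumps; larger means more likely +1."""
    return sum(alpha * (pol if x[f] > thr else -pol) for alpha, f, thr, pol in model)


def adaboost_predict(model, x):
    return 1 if adaboost_score(model, x) >= 0 else -1
```

For example, after `model = adaboost_fit([[1.0], [2.0], [3.0], [4.0]], [-1, -1, 1, 1], n_rounds=5)`, the call `adaboost_predict(model, [3.5])` returns `1`. Because it is a real-valued vote, `adaboost_score` can also serve as the ranking score when computing AUC.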
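The comparison in the abstract is reported in F-measure and AUC. Below is a stdlib-only sketch of both metrics for a binary defective/clean labeling. It assumes that label `1` marks the defective (positive) class, and the function names are illustrative. AUC is computed in its rank form: the probability that a randomly chosen defective module receives a higher score than a randomly chosen clean one, with ties counted as half.

```python
def f_measure(y_true, y_pred, positive=1):
    """F1 score (harmonic mean of precision and recall) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # also avoids dividing by zero when nothing is predicted positive
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores, positive=1):
    """ROC AUC via pairwise comparison (Mann-Whitney U form); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For instance, `f_measure([1, 0, 1, 0], [1, 0, 0, 0])` gives 2/3 (precision 1.0, recall 0.5), and `auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])` gives 0.75. The pairwise AUC loop is quadratic in sample count. That is fine for a sketch, but a sort-based rank computation scales better on large datasets.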
Authors: ZHANG Xiao (张肖), WANG Li-ming (王黎明); School of Information Engineering, Zhengzhou University, Zhengzhou 450001, China
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), CSCD-indexed, Peking University core journal, 2018, No. 10, pp. 2138-2145 (8 pages)
Keywords: software defect prediction; class imbalance; semi-supervised learning; AdaBoost