Abstract
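A software defect prediction method based on kernel dictionary learning is proposed. Historical software defect data have a complex structure and are class-imbalanced. The method therefore first uses a kernel method to map the historical defect data into a high-dimensional feature space that can represent the distribution of the original data. A kernel dictionary is then learned in this kernel space and used to determine the class (defective or non-defective) of each software module, which yields the defect prediction. To address the class-imbalance problem during kernel dictionary learning, a kernel dictionary basis selection strategy is adopted to construct a class-balanced kernel dictionary. Comparative experiments on the NASA datasets show that the kernel dictionary learning method achieves high F-measure and AUC values, effectively addresses the class-imbalance problem in defect prediction, and delivers good prediction performance.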
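The steps above (kernel mapping, a class-balanced dictionary, and a class decision made in kernel space) can be illustrated with a small pure-Python sketch. It is not the authors' algorithm. The RBF kernel, random undersampling as the basis selection strategy, ridge (ℓ2) coding in place of the learned dictionary's coding step, and every name (`rbf`, `balanced_dictionary`, `residual`, `predict`, `gamma`, `lam`) are assumptions made for illustration. Each class keeps the same number of atoms, and a module is assigned to the class whose atoms reconstruct it in feature space with the smallest residual.

```python
import math
import random
from typing import Dict, List, Sequence

Vector = Sequence[float]


def rbf(x: Vector, y: Vector, gamma: float = 0.5) -> float:
    # Assumed Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2).
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    # Gaussian elimination with partial pivoting for a small dense system.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def balanced_dictionary(X: List[Vector], y: List[int], seed: int = 0) -> Dict[int, List[Vector]]:
    # Hypothetical basis selection: keep as many defective (label 1) modules as
    # non-defective (label 0) ones by randomly undersampling the larger class.
    rng = random.Random(seed)
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    k = min(len(pos), len(neg))
    return {1: rng.sample(pos, k), 0: rng.sample(neg, k)}


def residual(x: Vector, atoms: List[Vector], gamma: float, lam: float) -> float:
    # Ridge coding in feature space: a = (K + lam*I)^-1 k_x, then
    # ||phi(x) - Phi a||^2 = k(x, x) - 2 a^T k_x + a^T K a (kernel evaluations only).
    n = len(atoms)
    K = [[rbf(p, q, gamma) for q in atoms] for p in atoms]
    kx = [rbf(p, x, gamma) for p in atoms]
    reg = [[K[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    a = solve(reg, kx)
    Ka = [sum(K[i][j] * a[j] for j in range(n)) for i in range(n)]
    return (rbf(x, x, gamma)
            - 2 * sum(ai * ki for ai, ki in zip(a, kx))
            + sum(ai * kai for ai, kai in zip(a, Ka)))


def predict(x: Vector, dictionary: Dict[int, List[Vector]],
            gamma: float = 0.5, lam: float = 0.1) -> int:
    # Pick the class whose sub-dictionary gives the smallest reconstruction residual.
    return min(dictionary, key=lambda c: residual(x, dictionary[c], gamma, lam))
```

For example, with two defective modules near (5, 5) and four non-defective modules near the origin, `balanced_dictionary` keeps two atoms per class, and `predict([5.0, 5.1], d)` returns 1. Because the residual needs only kernel values, the high-dimensional map φ is never computed explicitly.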
We propose a kernel dictionary learning approach for software defect classification and prediction. The historical defect data used in software defect prediction have a complicated structure and are markedly class-imbalanced, which can degrade the decisions of classifiers. The kernel trick maps the historical defect data to a higher-dimensional feature space in which they can be well represented. Using metrics mined from open-source software, we obtain a kernel dictionary learning classifier that predicts software defects efficiently. To address the class-imbalance problem in software defect prediction, we build a class-balanced kernel dictionary containing the same number of defective and non-defective modules. We evaluate all compared methods on the widely used NASA datasets, and the experimental results show that kernel dictionary learning outperforms several representative state-of-the-art defect prediction methods.
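The evaluation relies on F-measure and AUC because, on class-imbalanced defect data, plain accuracy can look high even when a classifier never flags a defective module. A minimal sketch of both metrics follows. It assumes F-measure is the usual F1 score on the defective class (label 1) and computes AUC with the pairwise rank (Mann-Whitney) formulation; the paper may define or compute them differently, and the function names are illustrative.

```python
from typing import Sequence


def f_measure(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    # F1 on the defective class (label 1): harmonic mean of precision and recall.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    # Probability that a random defective module scores higher than a random
    # non-defective one; ties count as one half.
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one module of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

With `y_true = [1, 0, 0, 1, 0, 0]`, `y_pred = [1, 0, 1, 0, 0, 0]` and scores `[0.9, 0.2, 0.6, 0.4, 0.1, 0.3]`, `f_measure` returns 0.5 and `auc` returns 0.875. A classifier that labels every module non-defective would reach about 67% accuracy on this toy set but an F-measure of 0.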
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
CSCD
Peking University core journal
2017, No. 7, pp. 1501-1505 (5 pages)
Funding
Supported by the National Natural Science Foundation of China (61272273)
Keywords
software defect prediction
kernel dictionary learning
class-imbalance problem