Abstract
Software defect prediction datasets are the foundation for building prediction models and carrying out defect prediction, yet they face two problems. First, because data are difficult to collect at the source, few evaluation datasets are available. Second, because data differ across domains, the publicly released datasets vary widely and their metrics are often inapplicable, so they are rarely used in engineering practice. Drawing on real software testing data from China's aerospace sector, this paper systematically describes a method for designing spacecraft software metrics and the process of constructing a spacecraft software defect prediction dataset. Based on the characteristics of spacecraft software, a hybrid measurement method that combines code metrics with quality metrics is proposed, so that the relevant properties of spacecraft software can be characterized and measured comprehensively from different perspectives. In addition, to address the high labor and storage costs of collecting, processing, and analyzing large-scale data, a standardized dataset construction method is proposed that combines data cleaning under version division with module-level preprocessing. An application demonstration on the SPACE dataset built with this method shows that the method can be effectively used to construct high-quality, domain-specific software defect prediction datasets, and that good prediction results can be obtained with the AutoWeka model.
Authors
郑小萌
高猛
滕俊元
ZHENG Xiao-meng; GAO Meng; TENG Jun-yuan (Beijing Sunwise Information Technology Ltd., Beijing 100190, China; Beijing Institute of Control Engineering, Beijing 100190, China)
Source
《计算机科学》
CSCD
PKU Core Journal
2021, Issue S01, pp. 575-580 (6 pages)
Computer Science
Funding
Equipment Pre-Research Field Fund Project (61400020407).