Abstract
Spectrum-based fault localization (SBFL) techniques have been widely studied as a way to help developers quickly locate faults and thereby reduce the cost of software testing and debugging. However, a test suite may contain a special kind of test case that executes the faulty statement yet still produces the expected output. Such a test case is called a coincidentally correct (CC) test case, and it degrades the performance of SBFL. To mitigate this negative impact and improve SBFL performance, this paper proposes CC test cases Identification via Machine Learning (CCIML). CCIML combines features derived from SBFL suspiciousness formulas with static program features to identify CC test cases, thereby improving the fault localization accuracy of SBFL. To evaluate CCIML, comparative experiments are conducted on the Defects4J dataset. The results show that CCIML identifies CC test cases with an average recall, precision, and F1 score of 63.89%, 70.16%, and 50.64%, respectively, outperforming the baseline approaches. In addition, after the CC test cases identified by CCIML are handled with the cleaning and relabeling strategies, the resulting fault localization performance is also better than that of the baselines. Under the cleaning and relabeling strategies, the number of faulty statements whose suspiciousness value ranks first is 328 and 312, respectively; compared with the fuzzy weighted K-nearest neighbor (FW-KNN) approach, the number of located faults increases by 124.66% and 235.48%, respectively.
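To make the abstract's terms concrete, the sketch below computes suspiciousness with Ochiai, a widely used SBFL formula chosen here only for illustration (the abstract does not say which formulas CCIML draws its features from), and shows how CC test cases can push a faulty statement down the ranking. The `clean` and `relabel` functions follow the usual meaning of these strategies in the CC literature, assumed here: cleaning discards passing tests flagged as CC, and relabeling treats them as failing. The coverage data and the names `TestRecord`, `ochiai`, `rank_of`, `clean`, `relabel`, and `cc` (standing in for a classifier's output) are hypothetical; this is not CCIML itself.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TestRecord:
    """Hypothetical record of one test execution."""
    name: str
    covered: frozenset  # statements executed by this test
    failed: bool        # True if the test produced a wrong output


def ochiai(tests, stmt):
    """Ochiai suspiciousness: ef / sqrt(total_failed * (ef + ep))."""
    total_failed = sum(t.failed for t in tests)
    ef = sum(1 for t in tests if t.failed and stmt in t.covered)
    ep = sum(1 for t in tests if not t.failed and stmt in t.covered)
    if ef == 0:  # never executed by a failing test (also avoids 0/0)
        return 0.0
    return ef / math.sqrt(total_failed * (ef + ep))


def rank_of(tests, statements, faulty):
    """Worst-case rank: statements scoring at least as high as the faulty one."""
    target = ochiai(tests, faulty)
    return sum(1 for s in statements if ochiai(tests, s) >= target)


def clean(tests, cc_names):
    """Cleaning (assumed): drop passing tests flagged as CC."""
    return [t for t in tests if not (t.name in cc_names and not t.failed)]


def relabel(tests, cc_names):
    """Relabeling (assumed): mark passing tests flagged as CC as failing."""
    return [TestRecord(t.name, t.covered, True)
            if t.name in cc_names and not t.failed else t
            for t in tests]


# Toy spectrum: s3 is the faulty statement; t3, t4, t5 are CC tests.
statements = ["s1", "s2", "s3", "s4"]
tests = [
    TestRecord("t1", frozenset({"s1", "s3"}), True),
    TestRecord("t2", frozenset({"s2", "s3"}), True),
    TestRecord("t3", frozenset({"s3"}), False),
    TestRecord("t4", frozenset({"s3", "s4"}), False),
    TestRecord("t5", frozenset({"s3"}), False),
    TestRecord("t6", frozenset({"s4"}), False),
]
cc = {"t3", "t4", "t5"}  # hypothetical output of a CC identifier

print(rank_of(tests, statements, "s3"))                # 3
print(rank_of(clean(tests, cc), statements, "s3"))     # 1
print(rank_of(relabel(tests, cc), statements, "s3"))   # 1
```

In this toy spectrum the three CC tests inflate the passing count for s3 (Ochiai about 0.632), so s1 and s2 (about 0.707 each) outscore it; either strategy lifts s3 to a score of 1.0 and rank 1. Counting ties against the faulty statement is a conservative choice; other tie-breaking schemes are also common.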
Authors
TIAN Shuaihua
LI Zheng
WU Yonghao
LIU Yong
College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
Source
Computer Science, 2024, No. 6, pp. 68-77 (10 pages)
CSCD
PKU Core Journal
Funding
National Natural Science Foundation of China (61902015, 61872026).
Keywords
Software debugging
Fault localization
Machine learning
Coincidental correct test case
Feature extraction