Identifying Coincidental Correct Test Cases Based on Machine Learning

Abstract: Spectrum-based fault localization (SBFL) techniques have been widely studied; they help developers quickly locate faults in a program and thereby reduce the cost of software testing and debugging. However, test suites can contain a special kind of test case that executes the faulty statement yet still produces the expected output; such a test case is called a coincidental correct (CC) test case. CC test cases degrade the performance of SBFL. To mitigate this negative impact and improve SBFL performance, this paper proposes CC test cases Identification via Machine Learning (CCIML). CCIML combines features derived from suspiciousness formulas with static program features to identify CC test cases, thereby improving the fault localization accuracy of SBFL. To evaluate CCIML, comparative experiments are carried out on the Defects4J dataset. The results show that the average recall, precision, and F1 score of CCIML for identifying CC test cases are 63.89%, 70.16%, and 50.64%, respectively, which is better than the baseline methods. In addition, after the CC test cases identified by CCIML are handled with the cleaning and relabeling strategies, the resulting fault localization performance also exceeds that of the baselines. Under the cleaning and relabeling strategies, the number of faulty statements ranked first by suspiciousness is 328 and 312, respectively; compared with the fuzzy weighted K-nearest neighbor (FW-KNN) approach, the number of faults localized increases by 124.66% and 235.48%, respectively.
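The effect of CC test cases on SBFL, and of the cleaning and relabeling strategies mentioned in the abstract, can be illustrated with a minimal sketch. The abstract does not name a suspiciousness formula, so the Ochiai formula is used here as an assumption; the coverage matrix, the test outcomes, and the set of CC test indices are hypothetical toy data, and the CC tests are taken as already identified rather than predicted by CCIML.

```python
import math

def ochiai(coverage, failed):
    """Ochiai suspiciousness for each statement (formula choice is an assumption).

    coverage[t][s] is 1 if test t executes statement s, else 0.
    failed[t] is True if test t fails.
    """
    total_failed = sum(failed)
    scores = []
    for s in range(len(coverage[0])):
        ef = sum(1 for row, f in zip(coverage, failed) if row[s] and f)
        ep = sum(1 for row, f in zip(coverage, failed) if row[s] and not f)
        denom = math.sqrt(total_failed * (ef + ep))
        scores.append(ef / denom if denom else 0.0)
    return scores

def clean(coverage, failed, cc):
    """Cleaning strategy: drop the tests identified as CC."""
    keep = [i for i in range(len(failed)) if i not in cc]
    return [coverage[i] for i in keep], [failed[i] for i in keep]

def relabel(coverage, failed, cc):
    """Relabeling strategy: treat the tests identified as CC as failing."""
    return coverage, [f or (i in cc) for i, f in enumerate(failed)]

# Hypothetical toy program: statement 0 is faulty, statement 1 is not.
COVERAGE = [
    [1, 1],  # t0: fails
    [1, 0],  # t1: passes although it executes the faulty statement (CC)
    [1, 0],  # t2: passes although it executes the faulty statement (CC)
    [0, 1],  # t3: passes
]
FAILED = [True, False, False, False]
CC_TESTS = {1, 2}  # assumed to have been identified beforehand

if __name__ == "__main__":
    print("original :", ochiai(COVERAGE, FAILED))
    print("cleaned  :", ochiai(*clean(COVERAGE, FAILED, CC_TESTS)))
    print("relabeled:", ochiai(*relabel(COVERAGE, FAILED, CC_TESTS)))
```

In this toy data the faulty statement scores below the non-faulty one while the CC tests are counted as passing, and it moves to the top under either strategy.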

Authors: TIAN Shuaihua; LI Zheng; WU Yonghao; LIU Yong (College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China)

Affiliation: College of Information Science and Technology, Beijing University of Chemical Technology

Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2024, Issue 6, pp. 68-77 (10 pages)

Funding: National Natural Science Foundation of China (61902015, 61872026).

Keywords: Software debugging; Fault localization; Machine learning; Coincidental correct test case; Feature extraction

Classification number: TP311 [Automation and Computer Technology: Computer Software and Theory]
