Abstract
The ROC curve is an important tool for model selection, but the uncertainty of the curve affects the accuracy of model selection. Based on discernible granularity, and with the aim of reflecting the uncertainty of the scores, this paper proposes the concepts of gROC and gAUC and discusses several properties of gROC theoretically. After giving an algorithm for gROC, the paper tests its reasonableness using the binormal model. On this basis, the paper proposes two model selection measures, λAUC and pAUC, and verifies their efficiency on UCI data sets. Experimental results show that gROC can effectively reflect the uncertainty of the ROC curve, and that model selection methods based on λAUC and pAUC outperform those based on AUC or sAUC. In some cases, gROC has a stronger ability to compare classifier performance.
Source
《软件学报》
EI
CSCD
Peking University Core Journals
2013, Issue 1, pp. 109-120 (12 pages)
Journal of Software
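Funding
National Natural Science Foundation of China (60863010, 61163044)
National Basic Research Program of China (973 Program) (2010CB334709)
Science and Technology Development Program of Jilin Province (20090704)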
Keywords
machine learning
model selection
classification
ROC curve
granularity