摘要
在金融行业中,信贷欺诈检测是一项重要的工作,能够为银行和消金机构减少大量的经济损失。然而,信贷数据中存在类别不平衡和正负样本特征重叠的问题,导致少数类识别灵敏度低且不同类别数据区分度低。针对这些问题,提出一种面向信贷欺诈检测的CTGANBoost方法。首先,在AdaBoost(Adaptive Boosting)方法的每一轮Boosting迭代中,引入基于类别标签信息约束的CTGAN(Conditional Tabular Generative Adversarial Network)方法学习特征分布,进行少数类数据增强工作;其次,基于CTGAN合成的增强数据集,设计了权重归一化方法,确保在样本加权过程中保持原始数据集的分布特征和相对权重。在3个开源数据集上的实验结果表明,CTGANBoost方法的表现均优于其他主流的信贷欺诈检测方法,AUC值提升了0.5%~2.0%,F1值提升了0.6%~1.8%,验证了CTGANBoost方法的有效性和泛化能力。
In the financial industry,credit fraud detection is an important task,which can reduce a lot of economic losses for banks and consumer institutions.However,there are problems of class imbalance and overlapping features of positive and negative samples in credit data,which lead to low sensitivity of minority class recognition and low data discrimination.To address these pro-blems,a CTGANBoost method is proposed for credit fraud detection.First,in each Boosting iteration of AdaBoost,the conditional tabular generative adversarial network(CTGAN)method based on class label information constraint is introduced to learn feature distribution for minority class data augmentation.Secondly,based on the enhanced data set synthesized by CTGAN,a weight normalization method is designed to ensure that the distribution characteristics and relative weights of the original data set are maintained during the sample weighting process.Experimental results on three open source datasets show that CTGANBoost outperforms other mainstream credit fraud detection methods,with AUC values increase by 0.5%~2.0%and F1 values increase by 0.6%~1.8%,which verifies the effectiveness and generalization ability of CTGANBoost method.
作者
卓佩妍
张瑶娜
刘炜
刘自金
宋友
ZHUO Peiyan;ZHANG Yaona;LIU Wei;LIU Zijin;SONG You(School of Software,Beihang University,Beijing 100191,China)
出处
《计算机科学》
CSCD
北大核心
2024年第S01期607-613,共7页
Computer Science
基金
河北省重点研发计划(21310101D)。