Abstract
This paper addresses corporate bankruptcy prediction on imbalanced data by combining data preprocessing techniques with ensemble algorithms. First, redundant-information handling and several sampling methods are used to preprocess the imbalanced data. Second, a C5.0 decision tree and a single-hidden-layer feedforward neural network are each used as base classifiers and paired with three kinds of resampling-based preprocessing to select the best sampling method. Third, bootstrap aggregating (bagging) is applied to improve classification performance, and the ensemble models built on the two base classifiers are compared using the area under the receiver operating characteristic curve (AUC) under 10-fold cross-validation. Finally, the approach is validated experimentally on real data from more than 10,000 Polish manufacturing companies, taken from the University of California, Irvine (UCI) database. The results show that ensemble models combining under-sampling or the synthetic minority over-sampling technique (SMOTE) with a neural network achieve the best classification performance, which supports enterprises in implementing bankruptcy prediction.
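The pipeline described in the abstract (resample the training data, bag a base classifier, score by cross-validated AUC) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: a single-feature threshold rule stands in for the C5.0 tree and the neural network, random under-sampling is the only resampling method shown, the data are synthetic rather than the Polish company data, and every function name is hypothetical.

```python
import random

def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def undersample(X, y, rng):
    """Randomly drop rows of the larger class until both classes match in size."""
    minority = [i for i, v in enumerate(y) if v == 1]
    majority = [i for i, v in enumerate(y) if v == 0]
    if len(minority) > len(majority):
        minority, majority = majority, minority
    keep = minority + rng.sample(majority, len(minority))
    return [X[i] for i in keep], [y[i] for i in keep]

def fit_stump(X, y):
    """Stand-in base learner (assumption): best single-feature threshold rule."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for sign in (1, -1):
                pred = [1 if sign * (row[j] - t) >= 0 else 0 for row in X]
                acc = sum(p == v for p, v in zip(pred, y)) / len(y)
                if best is None or acc > best[0]:
                    best = (acc, j, t, sign)
    return best[1:]

def predict_stump(stump, row):
    j, t, sign = stump
    return 1 if sign * (row[j] - t) >= 0 else 0

def fit_bagging(X, y, n_models, rng):
    """Bagging: train each model on a bootstrap sample drawn with replacement."""
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models

def score_bagging(models, row):
    """Fraction of bagged models that vote for the positive class."""
    return sum(predict_stump(m, row) for m in models) / len(models)

def cross_validated_auc(X, y, k=10, n_models=15, seed=0):
    """Mean AUC over k stratified folds; resampling touches training folds only."""
    rng = random.Random(seed)
    pos = [i for i, v in enumerate(y) if v == 1]
    neg = [i for i, v in enumerate(y) if v == 0]
    rng.shuffle(pos)
    rng.shuffle(neg)
    folds = [pos[i::k] + neg[i::k] for i in range(k)]
    aucs = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        Xb, yb = undersample([X[i] for i in train], [y[i] for i in train], rng)
        models = fit_bagging(Xb, yb, n_models, rng)
        scores = [score_bagging(models, X[i]) for i in fold]
        aucs.append(auc(scores, [y[i] for i in fold]))
    return sum(aucs) / k

def make_data(n_neg=360, n_pos=40, seed=1):
    """Synthetic imbalanced data (assumption): the rare class is shifted on feature 0."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n_neg)]
    X += [[rng.gauss(2, 1), rng.gauss(0, 1)] for _ in range(n_pos)]
    return X, [0] * n_neg + [1] * n_pos
```

Calling `cross_validated_auc(*make_data())` returns the mean AUC over the ten folds. Under-sampling is applied only to each training split so that the held-out fold keeps the original class ratio; resampling before splitting would let the evaluation see a distribution that differs from the real one.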
Authors
ZHOU Wenyong (周文泳)
FENG Lixia (冯丽霞)
DUAN Chunyan (段春艳)
Affiliations: School of Economics and Management, Tongji University, Shanghai 200092, China; School of Mechanical Engineering, Tongji University, Shanghai 201804, China
Source
Journal of Tongji University (Natural Science) (《同济大学学报(自然科学版)》)
EI
CAS
CSCD
Peking University Core Journals (北大核心)
2022, No. 2, pp. 283-290 (8 pages)
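Funding
2020 Tongji University "Dual-Leader" Faculty Party Branch Secretary Academic Capability Enhancement Program
Shanghai Pujiang Talent Program (20PJ1413700)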
Keywords
binary classification
imbalanced data
neural network
C5.0 decision tree
ensemble methods