
Semi-supervised ensemble based on metacost model for customer churn prediction
Cited by: 10
Abstract Customer churn prediction is an important part of customer relationship management (CRM). In many real-world churn prediction tasks, the class distribution is highly imbalanced, which degrades classification performance. At the same time, only a small number of samples carry class labels while most are unlabeled, so a large amount of useful information goes unused. To address both problems, this study combines meta cost-sensitive learning, semi-supervised learning, and Bagging ensembles, and proposes a semi-supervised ensemble based on metacost model (SSEM) for customer churn prediction. The model has three stages: 1) the Metacost method modifies the class labels of the initial labeled training set L to obtain a new training set Lm, which is randomly split into a model training set Ltr and a model validation set Va; 2) Va is used to select the three base classifiers with the highest classification accuracy, which cooperate to selectively label samples from the unlabeled set U, and the newly labeled samples are added to Ltr; 3) N base classifiers are trained on the updated Ltr and applied to the test set, and their outputs are combined into the final classification. An empirical analysis on two customer churn prediction datasets shows that SSEM outperforms commonly used supervised ensemble models and semi-supervised ensemble models.
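The three stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`metacost_relabel`, `select_pseudo_labels`, `majority_vote`), the cost matrix, and the toy probabilities are all hypothetical. The abstract does not state how the three classifiers "cooperate" or how results are integrated, so unanimous agreement and majority voting are assumed here. The relabeling step follows the general MetaCost idea of assigning each sample the class with the lowest expected cost.

```python
from collections import Counter


def metacost_relabel(probs, cost):
    """Stage 1 (sketch): relabel each sample with the minimum-expected-cost class.

    probs[i][j] is an estimate of P(class j | sample i), e.g. averaged over
    bagged models. cost[k][j] is the cost of predicting class k when the true
    class is j. Both inputs are assumed to be supplied by the caller.
    """
    labels = []
    for p in probs:
        risks = [sum(cost[k][j] * p[j] for j in range(len(p)))
                 for k in range(len(cost))]
        labels.append(min(range(len(risks)), key=risks.__getitem__))
    return labels


def select_pseudo_labels(votes_per_sample):
    """Stage 2 (assumed rule): keep unlabeled samples where all classifiers agree.

    votes_per_sample[i] holds the labels the selected classifiers gave sample i.
    Returns (sample_index, agreed_label) pairs to add to the training set.
    """
    return [(i, v[0]) for i, v in enumerate(votes_per_sample)
            if len(set(v)) == 1]


def majority_vote(predictions):
    """Stage 3 (assumed rule): combine N models' test predictions by majority.

    predictions[m][i] is the label model m assigned to test sample i.
    """
    return [Counter(col).most_common(1)[0][0] for col in zip(*predictions)]


# Toy data: class 1 = churn; missing a churner is assumed 5x as costly.
cost = [[0, 5],
        [1, 0]]
probs = [[0.9, 0.1], [0.7, 0.3], [0.2, 0.8]]
relabeled = metacost_relabel(probs, cost)

pseudo = select_pseudo_labels([(1, 1, 1), (0, 1, 0), (0, 0, 0)])
final = majority_vote([[0, 1, 1], [0, 1, 0], [1, 1, 0]])
```

In this toy run, the second sample is relabeled as churn even though its estimated churn probability is only 0.3, because the assumed misclassification cost is asymmetric. This is how cost-sensitive relabeling counteracts class imbalance.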
Authors XIAO Jin; LI Sihan; HE Xiaozhou; TENG Geer; JIA Pinrong; XIE Ling (Business School, Sichuan University, Chengdu 610064, China; Management Science and Operations Research Institute, Sichuan University, Chengdu 610064, China; Beijing Research Center Science of Science, Beijing 100089, China; School of Medical Information Engineering, Zunyi Medical University, Zunyi 563006, China)
Source Systems Engineering-Theory & Practice (indexed in EI, CSSCI, CSCD, Peking University Core Journals), 2021, No. 1, pp. 188-199 (12 pages)
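Funding Major Program of the National Social Science Fund of China (18VZL006); Sichuan University Outstanding Young Scholars Fund for Humanities and Social Sciences (sksy1201709); Beijing Academy of Science and Technology "Beike Scholars" Program (PXM2020-178216-000008); Beijing Municipal Finance Project (PXM2020-178216-000001).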
Keywords customer churn prediction; imbalanced class distribution; semi-supervised; co-training; cost-sensitive
  • 相关文献

