
基于信息熵的改进型支持向量机客户流失预测模型应用研究 (Cited by: 5)

An Applied Research on Improved Entropy-based SVM Churn Prediction Model
Abstract: Customer churn data is a typical imbalanced dataset, and the key to classifying it effectively is to raise the recognition rate of the minority class (churned customers). The minority class is a special subset of samples relative to the majority class, and the cost of misclassifying it is very high, so effectively reducing the minority-class misclassification rate is a pressing problem. Building on the support vector machine with different penalty factors proposed by Veropoulos et al., this paper uses the information entropy of the samples themselves to determine the penalty factors, biasing the model toward higher recognition accuracy for the minority class. The method is validated on an imbalanced telecommunications customer-churn dataset; the results show that, compared with other methods, it substantially improves the recognition rate for churned customers (the minority class) and has strong practical value.
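The abstract describes a cost-sensitive SVM in the style of Veropoulos et al., in which the minority (churn) class and the majority class receive different penalty factors, with the factors derived from the samples' information entropy. The sketch below is only an illustration of that idea under stated assumptions: the entropy_based_class_weights helper, the synthetic data, and the use of scikit-learn's SVC class_weight parameter are illustrative stand-ins, not the authors' exact entropy formulation.

# Minimal sketch: class-dependent penalty factors for an SVM on imbalanced churn data.
# The weighting below is an entropy-style stand-in (rarer class -> larger surprisal -> larger penalty),
# NOT the paper's exact entropy-based formula.
import numpy as np
from sklearn.svm import SVC

def entropy_based_class_weights(y):
    """Illustrative weights: rarer classes get proportionally larger penalty factors."""
    classes, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    surprisal = -np.log2(p)                       # rarer class -> larger surprisal
    weights = surprisal / surprisal.sum() * len(classes)
    return {int(c): float(w) for c, w in zip(classes, weights)}

# Synthetic stand-in data: 0 = retained (majority), 1 = churned (minority).
X = np.random.randn(200, 5)
y = np.array([0] * 180 + [1] * 20)

weights = entropy_based_class_weights(y)          # e.g. minority class gets the larger weight
clf = SVC(kernel="rbf", C=1.0, class_weight=weights)   # class_weight scales C per class
clf.fit(X, y)
print(weights, clf.score(X, y))

In scikit-learn, class_weight multiplies the penalty parameter C separately for each class, so a larger weight on the churn class plays the role of the larger minority-class penalty factor C+ in the Veropoulos-style formulation.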
Authors: 方磊, 马溪骏
Source: Journal of the China Society for Scientific and Technical Information (《情报学报》; CSSCI; Peking University Core Journal), 2011, No. 6, pp. 643-648 (6 pages)
Funding: Ph.D. Programs Foundation project "Collaborative Solving and Support Systems for Complex Decision Problems in a Grid Environment" (200803590007); Major Research Program of the National Natural Science Foundation of China (90718037); Key Program of the National Natural Science Foundation of China (70631003)
Keywords: support vector machine (SVM); imbalanced data; information entropy; classification prediction; customer churn
References: 16

Secondary references: 50


Co-citing literature: 2414

Co-cited literature: 68


Citing literature: 5

Secondary citing literature: 23
