Comparative study of re-sampling methods for imbalanced data sets
(Original title: 不平衡数据采样方法的对比学习; times cited: 4)
Abstract: Learning from imbalanced data sets has recently become a major research topic in data mining, and re-sampling is one of the key approaches to it. Many re-sampling methods exist; this paper selects ten of them for a comparative study. The experiments yield two useful conclusions: across different data sets, over-sampling methods perform better than under-sampling methods, and over-sampling alone also outperforms approaches that combine over-sampling with under-sampling.
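The abstract names the families it compares (over-sampling, under-sampling, and their combination) without describing the ten methods. As a rough illustration only, the Python sketch below shows one basic instance of each single family for a binary problem: random over-sampling, random under-sampling, and a simplified SMOTE-style interpolation in the spirit of SMOTE (Chawla et al., 2002). The function names `random_oversample`, `random_undersample` and `smote`, the choice to balance both classes to equal size, and the toy data are assumptions for illustration; this is not the paper's experimental setup.

```python
import math
import random
from collections import Counter


# Hypothetical helpers for illustration: binary labels, one minority class.
# Each returns new lists and leaves the inputs unchanged.

def random_oversample(X, y, minority, seed=0):
    """Duplicate randomly chosen minority samples until both classes have equal size."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    n_majority = len(y) - len(minority_idx)
    extra = [rng.choice(minority_idx) for _ in range(n_majority - len(minority_idx))]
    return X + [X[i] for i in extra], y + [y[i] for i in extra]


def random_undersample(X, y, minority, seed=0):
    """Keep every minority sample plus an equally sized random subset of the majority class."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_idx = [i for i, label in enumerate(y) if label != minority]
    keep = sorted(minority_idx + rng.sample(majority_idx, len(minority_idx)))
    return [X[i] for i in keep], [y[i] for i in keep]


def smote(X, y, minority, k=5, seed=0):
    """Simplified SMOTE-style synthesis: each new point lies on the segment between a
    randomly chosen minority sample and one of its k nearest minority neighbours
    (Euclidean distance). Needs at least two distinct minority samples."""
    rng = random.Random(seed)
    pts = [x for x, label in zip(X, y) if label == minority]
    n_new = max(0, (len(y) - len(pts)) - len(pts))
    synthetic = []
    for _ in range(n_new):
        p = rng.choice(pts)
        others = [q for q in pts if q is not p]
        neighbours = sorted(others, key=lambda q: math.dist(p, q))[:k]
        q = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(p, q)])
    return X + synthetic, y + [minority] * len(synthetic)


if __name__ == "__main__":
    # Toy data: 3 minority (label 1) vs 9 majority (label 0) samples.
    X = [[float(i), float(i % 3)] for i in range(12)]
    y = [1 if i < 3 else 0 for i in range(12)]
    for name, method in [("over", random_oversample),
                         ("under", random_undersample),
                         ("smote", smote)]:
        _, y_new = method(X, y, minority=1)
        counts = Counter(y_new)
        print(name, counts[1], counts[0])  # over 9 9 / under 3 3 / smote 9 9
```

Random over-sampling only copies existing minority points, which can encourage over-fitting; SMOTE was proposed to interpolate new points instead. The original SMOTE generates a fixed number of synthetic samples per minority sample rather than picking base samples at random as this sketch does. A combined strategy, the third option mentioned in the abstract, would chain an over-sampling step with an under-sampling or cleaning step; the abstract does not say which combination the paper used.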
Source: 《微计算机信息》 (Control & Automation), 2011, No. 12, pp. 155-157 (3 pages)
Keywords: imbalanced data sets; re-sampling; over-sampling; under-sampling; comparative study

References (14)

  • 1. Chawla, N. V., Bowyer, K. W., Hall, L. O., and Kegelmeyer, W. P. SMOTE: Synthetic Minority Over-sampling Technique. JAIR 16 (2002), 321-357.
  • 2. Han Hui, Wang Wenyuan, Mao Binghuan. Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning [C] // Proc of International Conference on Intelligent Computing (ICIC'05). Hefei: [s.n.], 2005: 878-887.
  • 3. Hien M. Nguyen, Eric W. Cooper, Katsuari Kamei. Borderline Over-sampling for Imbalanced Data Classification. IEEE SMC Hiroshima Chapter (2009).
  • 4. Tomek, I. Two Modifications of CNN. IEEE Transactions on Systems, Man, and Cybernetics SMC-6 (1976), 769-772.
  • 5. Hart, P. E. The condensed nearest neighbor rule [J]. IEEE Transactions on Information Theory IT-14 (1968): 515-516.
  • 6. Laurikkala, J. Improving identification of difficult small classes by balancing class distribution [C] // Proc of the 8th Conference on AI in Medicine in Europe: Artificial Intelligence in Medicine, 2001: 63-66.
  • 7. Kubat, M., and Matwin, S. Addressing the Curse of Imbalanced Training Sets: One-sided Selection. In ICML (1997), pp. 179-186.
  • 8. Muhammad Atif Tahir, Josef Kittler, Krystian Mikolajczyk, Yan Fei. A multiple expert approach to the class imbalance problem using inverse random under sampling. MCS 2009, LNCS 5519, pp. 82-91, 2009.
  • 9. Batista, G. E. A. P. A., Prati, R. C., et al. A study of the behavior of several methods for balancing machine learning training data [J]. SIGKDD Explorations, 2004, 6(1): 20-29.
  • 10. Chen Si, Guo Gongde, Chen Lifei. A classification method for imbalanced data based on clustering ensemble [J]. Pattern Recognition and Artificial Intelligence (模式识别与人工智能), 2010, 23(6): 772-780.

Citation network: co-citing documents 30; co-cited documents 45; citing documents 4; second-level citing documents 53
