Abstract
Imbalanced data has recently become a major research topic in data mining, and among the many approaches proposed for it, re-sampling is an important research direction. Many re-sampling methods exist; this paper selects ten of them and compares them experimentally. The experiments lead to several useful conclusions: across different data sets, over-sampling methods achieve better results than under-sampling methods, and over-sampling alone also outperforms approaches that combine over-sampling with under-sampling.
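The abstract does not name the ten methods it compares. As an illustration only, the sketch below shows the simplest version of each strategy it contrasts: random over-sampling (duplicating minority-class samples), random under-sampling (discarding majority-class samples), and a combination that resamples every class to the mean class size. The function names and the mean-size target of the combined variant are our own assumptions, not the paper's methods.

```python
import random
from collections import defaultdict


def _group_by_class(X, y):
    """Group samples by their class label, preserving first-seen order."""
    groups = defaultdict(list)
    for xi, yi in zip(X, y):
        groups[yi].append(xi)
    return groups


def _resample_to(groups, target, rng):
    """Bring every class to exactly `target` samples.

    Classes at or above the target are sampled without replacement
    (under-sampling); smaller classes keep all their samples and gain
    randomly chosen duplicates (over-sampling)."""
    X_out, y_out = [], []
    for label, samples in groups.items():
        if len(samples) >= target:
            chosen = rng.sample(samples, target)
        else:
            extra = [rng.choice(samples) for _ in range(target - len(samples))]
            chosen = samples + extra
        X_out.extend(chosen)
        y_out.extend([label] * target)
    return X_out, y_out


def random_oversample(X, y, seed=0):
    """Over-sample every class up to the size of the largest class."""
    groups = _group_by_class(X, y)
    target = max(len(s) for s in groups.values())
    return _resample_to(groups, target, random.Random(seed))


def random_undersample(X, y, seed=0):
    """Under-sample every class down to the size of the smallest class."""
    groups = _group_by_class(X, y)
    target = min(len(s) for s in groups.values())
    return _resample_to(groups, target, random.Random(seed))


def combined_resample(X, y, seed=0):
    """Hypothetical combination: meet in the middle at the mean class size,
    so large classes are under-sampled and small ones over-sampled."""
    groups = _group_by_class(X, y)
    target = int(round(sum(len(s) for s in groups.values()) / len(groups)))
    return _resample_to(groups, target, random.Random(seed))


# Toy imbalanced data set: 10 majority samples (label 0), 2 minority (label 1).
X = list(range(12))
y = [0] * 10 + [1] * 2

Xo, yo = random_oversample(X, y)
Xu, yu = random_undersample(X, y)
Xc, yc = combined_resample(X, y)
print(yo.count(0), yo.count(1))  # 10 10
print(yu.count(0), yu.count(1))  # 2 2
print(yc.count(0), yc.count(1))  # 6 6
```

Random over-sampling keeps all original information but repeats minority samples exactly, while random under-sampling discards majority samples that may carry useful information; this trade-off is one plausible reason why the two families can behave differently in comparisons such as the one summarized above.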
Source
Microcomputer Information (《微计算机信息》)
2011, No. 12, pp. 155-157 (3 pages)
Control & Automation
Keywords
imbalanced data sets
re-sampling
over-sampling
under-sampling
comparative study