Abstract
When the traditional synthetic minority oversampling technique (SMOTE) is applied to imbalanced data sets whose class regions overlap, it may generate many artificial samples that lie closer to the majority class, or that even cross the class boundary, which degrades overall classification performance. To address this, an improved SMOTE method based on the nearest triangular region is proposed: synthetic samples are generated only inside the nearest triangular region of a minority class sample, and any synthetic sample that lies closer to the majority class is deleted. The generated samples therefore stay closer to the minority class and do not cross the original class boundary. Experiments were carried out on artificial data sets and on modified UCI data sets, and the method was compared with the original SMOTE on the G-mean and F-value evaluation metrics. The results confirm that the improved SMOTE method outperforms the original SMOTE on data sets with overlapping class regions.
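The two steps named in the abstract (sampling inside a triangular region, then deleting samples nearer the majority class) can be illustrated with a minimal Python sketch. This is not the authors' implementation, since the abstract gives no algorithmic detail. It assumes that the "nearest triangular region" is the triangle spanned by a minority sample and its two nearest minority neighbors, that points are drawn uniformly inside that triangle, and that "closer to the majority class" means the nearest original sample is a majority one. The function name `triangle_smote` and all of its parameters are hypothetical.

```python
import numpy as np


def triangle_smote(X_min, X_maj, n_new, rng=None, max_tries=10000):
    """Hypothetical sketch of triangle-based oversampling with rejection.

    Assumptions (not taken from the paper): the triangle is formed by a
    minority sample and its two nearest minority neighbors, and a candidate
    is rejected when its nearest original sample belongs to the majority.
    """
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    X_maj = np.asarray(X_maj, dtype=float)
    if len(X_min) < 3:
        raise ValueError("need at least three minority samples")

    # Pairwise distances inside the minority class; ignore self-distance.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    nn2 = np.argsort(d, axis=1)[:, :2]  # two nearest minority neighbors

    accepted = []
    tries = 0
    while len(accepted) < n_new and tries < max_tries:
        tries += 1
        i = rng.integers(len(X_min))
        a, b, c = X_min[i], X_min[nn2[i, 0]], X_min[nn2[i, 1]]

        # Uniform point in triangle (a, b, c) via barycentric weights.
        r1, r2 = rng.random(2)
        s = np.sqrt(r1)
        p = (1 - s) * a + s * (1 - r2) * b + s * r2 * c

        # Keep the candidate only if it is at least as close to the
        # minority class as to the majority class.
        d_min = np.linalg.norm(X_min - p, axis=1).min()
        d_maj = np.linalg.norm(X_maj - p, axis=1).min() if len(X_maj) else np.inf
        if d_min <= d_maj:
            accepted.append(p)

    # May return fewer than n_new rows if max_tries is exhausted.
    return np.array(accepted).reshape(-1, X_min.shape[1])
```

Because every candidate is a convex combination of three minority samples, it cannot fall outside the convex hull of the minority class. The rejection step then removes candidates that land in regions dominated by majority samples. A call such as `triangle_smote(X_min, X_maj, 100, rng=0)` returns up to 100 synthetic rows.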
Authors
刘丹
王晓兰
邢胜
LIU Dan; WANG Xiao-lan; XING Sheng (College of Computer Science and Engineering, Cangzhou Normal University; Department of Information Engineering, Cangzhou Technical College, Cangzhou 061001, China)
Source
《科学技术与工程》
Peking University Core Journal
2018, No. 28, pp. 215-219 (5 pages)
Science Technology and Engineering
Funding
Supported by the National Natural Science Foundation of China (71371063, 61170040, 61672205)
Keywords
imbalanced data
oversampling technique
classification
nearest neighbor rule