
DC-SMOTE oversampling method for imbalanced datasets
Abstract To address the poor performance of classifiers on imbalanced datasets, an oversampling algorithm based on local density and centrality is proposed. For every minority-class sample in the dataset, the local density is computed with a Gaussian kernel function and the centrality is computed from local gravity. A first type of new sample is synthesized specifically in regions of low local density, which addresses within-class imbalance. Differences in centrality are then used to identify the boundary of the minority class, and a second type of new sample is synthesized there to strengthen that boundary. The number of new samples is determined adaptively, which addresses a weakness of most oversampling algorithms: they either do not specify how many samples to generate or blindly aim for equal class sizes. Experiments on 12 public imbalanced datasets show that the algorithm performs well on both low-imbalance and high-imbalance datasets.
Authors JI Changpeng (冀常鹏); SHANG Jiaqi (尚佳奇); DAI Wei (代巍) (School of Electronic and Information Engineering, Liaoning Technical University, Huludao 125105, China; Graduate School, Liaoning Technical University, Huludao 125105, China)
Source CAAI Transactions on Intelligent Systems (《智能系统学报》), CSCD, Peking University Core Journal, 2024, No. 3, pp. 525-533 (9 pages)
Keywords imbalanced dataset; oversampling; Gaussian kernel; local gravity; high-imbalanced data; SMOTE; imbalance ratio; classification
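
The abstract names the ingredients of the method (a Gaussian-kernel local density, a centrality derived from local gravity, and targeted synthesis of new minority samples) but does not give the formulas. The following Python sketch is therefore only an illustration under stated assumptions, not the authors' DC-SMOTE implementation. The function names, the bandwidth `sigma`, the neighbour count `k`, the use of a unit-vector resultant as a stand-in for local gravity, and the SMOTE-style linear interpolation are all assumptions introduced here.

```python
import numpy as np


def _knn_indices(X, k):
    """Indices of the k nearest neighbours of each row of X (self excluded)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a point is never its own neighbour
    return np.argsort(d, axis=1)[:, :k]


def local_density(X_min, sigma=1.0):
    """Assumed form: rho_i = sum over j != i of exp(-d_ij^2 / (2 sigma^2))."""
    diff = X_min[:, None, :] - X_min[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(K, 0.0)  # drop the self-contribution
    return K.sum(axis=1)


def centrality(X_min, k=5):
    """Assumed proxy for local gravity: sum the unit vectors pointing to the
    k nearest minority neighbours. Inside the class they roughly cancel
    (score near 1); on the boundary they point the same way (score near 0)."""
    nn = _knn_indices(X_min, k)
    vec = X_min[nn] - X_min[:, None, :]                # shape (n, k, d)
    norm = np.linalg.norm(vec, axis=-1, keepdims=True)
    unit = vec / np.maximum(norm, 1e-12)               # guard duplicate points
    resultant = np.linalg.norm(unit.sum(axis=1), axis=-1)
    return 1.0 - resultant / k


def synthesize(X_min, seeds, n_new, k=5, rng=None):
    """SMOTE-style interpolation between seed points and random neighbours.
    `seeds` holds the indices of the minority samples chosen for oversampling."""
    rng = np.random.default_rng(rng)
    nn = _knn_indices(X_min, k)
    base = rng.choice(seeds, size=n_new)
    nbr = nn[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[nbr] - X_min[base])


if __name__ == "__main__":
    X = np.random.default_rng(0).normal(size=(30, 2))  # toy minority class
    rho = local_density(X)
    cen = centrality(X, k=5)
    sparse = np.where(rho < np.median(rho))[0]         # low-density seeds
    border = np.where(cen < np.median(cen))[0]         # boundary seeds
    new_points = np.vstack([synthesize(X, sparse, 10), synthesize(X, border, 10)])
    print(new_points.shape)
```

The median thresholds and the fixed count of 10 samples per group in the demo are placeholders; the paper determines the oversampling amount adaptively, and that rule is not stated in the abstract.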
