A Neighborhood-Based Over-Sampling Algorithm for Imbalanced Datasets

Abstract: Oversampling is a common way to address class imbalance in a dataset by synthesizing new minority-class samples. To address imbalanced sample distributions, this paper proposes PSON, an oversampling algorithm based on the concept of neighborhoods. PSON defines an influence value for each minority-class sample and oversamples the minority class according to these influence values to obtain a balanced dataset. The datasets produced by 8 oversampling algorithms on 50 datasets were evaluated with classification tests, and 7 classification performance metrics were compared using the Wilcoxon signed-rank test. The results show that PSON significantly improves classification accuracy.
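The abstract does not say how PSON computes a sample's influence or how it synthesizes new samples. As a rough illustration only, the Python sketch below assumes that a minority sample's influence is one plus its reverse-nearest-neighbor count (how many other minority samples include it among their k nearest minority neighbors, echoing the "reverse neighbors" keyword). It then splits the required number of synthetic samples in proportion to influence and creates each new sample by SMOTE-style linear interpolation toward a random nearest minority neighbor. The function names, the parameter `k`, the +1 smoothing, and the interpolation step are all hypothetical choices, not the authors' method.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def knn_indices(points: Sequence[Point], i: int, k: int) -> List[int]:
    """Indices of the k points nearest to points[i], excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def influence_scores(minority: Sequence[Point], k: int) -> List[int]:
    """HYPOTHETICAL influence (not PSON's definition): 1 + reverse-kNN count,
    i.e. how many minority samples list this one among their k nearest
    minority neighbours."""
    scores = [1] * len(minority)  # +1 so every sample gets a positive weight
    for i in range(len(minority)):
        for j in knn_indices(minority, i, k):
            scores[j] += 1
    return scores


def allocate(scores: Sequence[int], total: int) -> List[int]:
    """Split `total` synthetic samples in proportion to `scores`, using exact
    integer largest-remainder rounding so the counts sum to `total`."""
    s = sum(scores)
    counts = [total * w // s for w in scores]
    remainders = [total * w % s for w in scores]
    leftover = total - sum(counts)
    order = sorted(range(len(scores)), key=lambda i: remainders[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


def oversample(minority: Sequence[Point], n_majority: int,
               k: int = 3, seed: int = 0) -> List[Point]:
    """Return synthetic minority samples that raise the minority class to
    n_majority; SMOTE-style interpolation is an assumed generation step."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    need = n_majority - len(minority)
    if need <= 0:
        return []
    k = min(k, len(minority) - 1)
    rng = random.Random(seed)
    counts = allocate(influence_scores(minority, k), need)
    synthetic: List[Point] = []
    for i, c in enumerate(counts):
        neighbours = knn_indices(minority, i, k)
        for _ in range(c):
            other = minority[rng.choice(neighbours)]
            gap = rng.random()  # position along the segment, in [0, 1)
            synthetic.append(tuple(a + gap * (b - a)
                                   for a, b in zip(minority[i], other)))
    return synthetic
```

For example, `oversample(minority, n_majority=10, k=1)` on four minority points returns six synthetic points, bringing the minority class up to the majority count. Samples with a higher (hypothetical) influence receive more synthetic neighbors; trying a different influence definition only requires changing `influence_scores`.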
Authors: MENG Guoqing; GAO Yuan; MEI Ying; LU Chengbo (School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China; State Grid Lishui Power Supply Company, Lishui 323050, China; School of Mathematics and Computer, Lishui University, Lishui 323000, China; Zhejiang Detu Network Co., Ltd., Lishui 310011, China)
Source: Software Guide (软件导刊), 2024, No. 9, pp. 116-121 (6 pages)
Funding: National Natural Science Foundation of China (12171217); Natural Science Foundation of Zhejiang Province (LY18F030003).
Keywords: imbalanced dataset; over-sampling; classification; reverse neighbors