
一种基于密度的SMOTE方法研究 (Cited by: 9)

Research on the SMOTE method based on density
Abstract: Resampling techniques have been widely applied to imbalanced-class classification problems. Among them, the SMOTE (Synthetic Minority Oversampling Technique) algorithm proposed by Chawla alleviates the degree of data imbalance to a certain extent, but it oversamples the minority class indiscriminately and is therefore prone to overfitting. To address this problem, this paper proposes a new oversampling method, DS-SMOTE. DS-SMOTE identifies sparse samples based on their density and uses them as seed samples in the sampling process; it then follows the idea of the SMOTE algorithm and generates synthetic samples between each seed sample and its k nearest neighbors. Experimental results show that, compared with similar methods, DS-SMOTE achieves considerably higher precision and G-mean, indicating that it has certain advantages in handling imbalanced data classification.
Source: CAAI Transactions on Intelligent Systems (《智能系统学报》), CSCD, Peking University Core Journal, 2017, Issue 6, pp. 865-872 (8 pages).
Funding: National Natural Science Foundation of China (61772323, 61402272); Natural Science Foundation of Shanxi Province (201701D121051).
Keywords: imbalance; classification; sampling; precision; density
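The abstract describes DS-SMOTE only at a high level, so the following Python sketch illustrates one plausible reading of it: density is taken as the inverse of the mean distance to the k nearest minority neighbors, the lowest-density (sparse) samples serve as seeds, and synthetic points are interpolated between each seed and one of its k nearest neighbors, SMOTE-style. The density definition and the parameter names (k, sparse_ratio, n_synthetic) are assumptions for illustration, not taken from the paper.

```python
# Minimal sketch of a density-guided, SMOTE-style oversampler in the spirit of
# DS-SMOTE as summarized in the abstract. Density definition and parameters
# are illustrative assumptions, not the authors' exact implementation.
import numpy as np

def ds_smote(X_min, k=5, sparse_ratio=0.3, n_synthetic=100, rng=None):
    """Generate synthetic minority samples from low-density (sparse) seeds."""
    rng = np.random.default_rng(rng)
    n = X_min.shape[0]
    # Pairwise Euclidean distances among minority samples.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # Indices of the k nearest minority neighbors (excluding the sample itself).
    nn_idx = np.argsort(dist, axis=1)[:, 1:k + 1]
    # Assumed density: inverse of the mean distance to the k nearest neighbors.
    density = 1.0 / (dist[np.arange(n)[:, None], nn_idx].mean(axis=1) + 1e-12)
    # Seeds: the sparsest (lowest-density) fraction of the minority class.
    n_seeds = max(1, int(sparse_ratio * n))
    seeds = np.argsort(density)[:n_seeds]
    # SMOTE-style interpolation between a seed and one of its k nearest neighbors.
    synthetic = []
    for _ in range(n_synthetic):
        s = rng.choice(seeds)
        nb = rng.choice(nn_idx[s])
        gap = rng.random()
        synthetic.append(X_min[s] + gap * (X_min[nb] - X_min[s]))
    return np.vstack(synthetic)

if __name__ == "__main__":
    # Toy minority class: a dense cluster plus a few sparse outlying points.
    rng = np.random.default_rng(0)
    X_min = np.vstack([rng.normal(0.0, 1.0, (20, 2)),
                       rng.normal(5.0, 1.0, (5, 2))])
    X_new = ds_smote(X_min, k=3, sparse_ratio=0.4, n_synthetic=50, rng=42)
    print(X_new.shape)  # (50, 2)
```

Under this reading, oversampling concentrates new points around the sparse minority samples rather than duplicating the dense regions, which is the behavior the abstract credits for reducing overfitting relative to plain SMOTE.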

References (3)

Secondary references (56)

1. Weiss G M. Mining with Rarity: A Unifying Framework[J]. SIGKDD Explorations, 2004, 6(1): 7-19.
2. Weiss G M. Learning with Rare Cases and Small Disjuncts[C]//Proc of the 12th Int'l Conf on Machine Learning, 1995: 558-565.
3. Japkowicz N, Stephen S. The Class Imbalance Problem: A Systematic Study[J]. Intelligent Data Analysis Journal, 2002, 6(5): 429-450.
4. Chawla N V, Bowyer K W, Hall L O, et al. SMOTE: Synthetic Minority Over-Sampling Technique[J]. Journal of Artificial Intelligence Research, 2002, 16: 321-357.
5. Kubat M, Matwin S. Addressing the Curse of Imbalanced Training Sets: One-Sided Selection[C]//Proc of the 14th Int'l Conf on Machine Learning, 1997: 179-186.
6. Chawla N V, Lazarevic A, Hall L O, et al. SMOTEBoost: Improving Prediction of the Minority Class in Boosting[C]//Proc of the 7th European Conf on Principles and Practice of Knowledge Discovery in Databases, 2003: 107-119.
7. Fan W, Stolfo S, Zhang J X. AdaCost: Misclassification Cost-Sensitive Boosting[C]//Proc of the 16th Int'l Conf on Machine Learning, 1999: 97-105.
8. Joshi M V, Agarwal R C, Kumar V. Predicting Rare Classes: Can Boosting Make Any Weak Learner Strong?[C]//Proc of the 8th ACM SIGKDD Int'l Conf on Knowledge Discovery and Data Mining, 2002: 297-306.
9. Zheng Z H, Srihari R. Optimally Combining Positive and Negative Features for Text Categorization[C]//Proc of the Int'l Conf on Machine Learning, 2003: 241-245.
10. Raskutti B, Kowalczyk A. Extreme Rebalancing for SVMs: A Case Study[J]. SIGKDD Explorations, 2004, 6(1): 60-69.
