A Bi-directional Sampling Method Based on Improved K-means for Classifying Imbalanced Data (cited by: 4)
Abstract: Classifiers trained on imbalanced data sets tend to classify the minority class poorly. To address this, the paper proposes KMBS (k-means bi-directional sampling), a bi-directional sampling algorithm built on an improved K-means, and applies ensemble learning to the classification step. First, an improved K-means clustering algorithm partitions the original data set into clusters. Second, within each cluster, an improved SMOTE algorithm oversamples the minority-class samples while the majority-class samples are undersampled, so that the data set becomes balanced. Running the algorithm several times yields multiple data sets that differ substantially from one another, so the classifiers trained on them are also diverse, which improves the effect of ensemble learning. According to the experimental results, compared with several existing algorithms, the method improves overall classification performance and effectively improves classification performance on minority-class samples.
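The pipeline the abstract describes (cluster first, then oversample the minority class and undersample the majority class inside each cluster) can be sketched roughly as below. This is only an illustration under stated assumptions, not the authors' KMBS: the abstract does not say how K-means or SMOTE were improved, so the sketch substitutes plain Lloyd's K-means and a simplified SMOTE-style interpolation between random minority pairs. The function names `kmeans_labels` and `balance_by_cluster`, the binary labels (1 = minority, 0 = majority), and the per-cluster target size are all hypothetical choices made for the example.

```python
import random

def kmeans_labels(points, k, rng, iters=20):
    """Plain Lloyd's k-means (an assumption; not the paper's improved variant)."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def balance_by_cluster(X, y, k=2, seed=0):
    """Resample inside each cluster; labels are assumed binary, 1 = minority."""
    rng = random.Random(seed)
    labels = kmeans_labels(X, k, rng)
    X_out, y_out = [], []
    for c in range(k):
        small = [p for p, lab, cls in zip(X, labels, y) if lab == c and cls == 1]
        large = [p for p, lab, cls in zip(X, labels, y) if lab == c and cls == 0]
        if small and large:  # single-class clusters are left unchanged
            target = (len(small) + len(large)) // 2  # hypothetical target size
            # Oversample: interpolate between random minority pairs in the cluster
            # (real SMOTE interpolates toward one of the k nearest neighbours).
            synthetic = []
            for _ in range(max(0, target - len(small))):
                a, b = rng.choice(small), rng.choice(small)
                t = rng.random()
                synthetic.append(tuple(u + t * (v - u) for u, v in zip(a, b)))
            small = small + synthetic
            # Undersample: keep a random subset of the majority points.
            large = rng.sample(large, min(len(large), target))
        X_out += small + large
        y_out += [1] * len(small) + [0] * len(large)
    return X_out, y_out

# Toy data: 2 minority points and 10 majority points in two dimensions.
X = [(0.0, 0.0), (1.0, 1.0)] + [(float(i), 5.0) for i in range(10)]
y = [1, 1] + [0] * 10

# Different seeds give different resampled sets, one per ensemble member.
datasets = [balance_by_cluster(X, y, k=2, seed=s) for s in range(3)]
```

Each element of `datasets` could then be used to train one base classifier whose predictions are combined by voting. Because both the clustering initialisation and the sampling depend on the seed, repeated runs tend to produce differing training sets, which is the source of diversity the abstract attributes to the method.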
Authors: LIU Yi; ZENG Hao (College of Computer Science, Guangdong University of Technology, Guangzhou 510006, China)
Source: Microelectronics & Computer (《微电子学与计算机》), Peking University Core Journal, 2020, No. 3, pp. 60-65 (6 pages)
Funding: National Natural Science Foundation of China (61572144); Guangzhou Education System Innovative Academic Team (1201610027).
Keywords: imbalanced learning; bi-directional sampling; classification algorithm; ensemble learning