
密度不均衡数据分类算法 (cited by: 8)

A Classification Algorithm for Imbalanced Dataset of Sample Density
Abstract: Under imbalanced data, classifiers tend to over-fit, the separating hyperplane shifts, and the minority class is recognized poorly. To address this, a classification algorithm for imbalanced data based on sample density is proposed. The algorithm first computes the density of each sample and of each class, then determines the number of clusters from the relationship between the class densities. It next clusters the majority-class samples with the K-means algorithm using that number of clusters, and takes the resulting cluster centers as new samples in place of the original majority-class set. Finally, a new training set is built from the cluster centers and the minority-class samples, and training on it yields the final decision function. Experiments on an artificial dataset and six groups of UCI datasets show that the algorithm improves the classification performance of SVM on imbalanced data, especially for the minority class.
Authors: 杜红乐, 张燕
Source: 《西华大学学报(自然科学版)》 (Journal of Xihua University: Natural Science Edition), CAS, 2015, No. 5, pp. 16-23, 74 (9 pages in total)
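Funding: Natural Science Foundation of Shaanxi Province (2014JM2-6122); Science and Technology Program of the Shaanxi Provincial Department of Education (12JK0748); Science and Technology Research Project of Shangluo University (13sky024)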
Keywords: support vector machine; imbalanced dataset; sample density; under-sampling; K-nearest neighbor
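
The under-sampling procedure summarized in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not give the density formulas or the rule that maps class densities to a number of clusters, so `knn_density`, `class_density` and `choose_cluster_count` below are hypothetical stand-ins (density is assumed to be the inverse of the mean distance to the k nearest neighbours, and the cluster count is assumed to be the minority size scaled by the density ratio). The final step, training an SVM on the rebuilt set, is omitted; any SVM implementation could be fitted on the returned `X`, `y`.

```python
import math
import random


def knn_density(point, samples, k):
    """Assumed definition: inverse of the mean distance to the k nearest neighbours."""
    dists = sorted(math.dist(point, q) for q in samples if q is not point)
    nearest = dists[:k]
    return 1.0 / (sum(nearest) / len(nearest) + 1e-12)


def class_density(samples, k):
    """Assumed definition: mean of the per-sample densities within one class."""
    return sum(knn_density(p, samples, k) for p in samples) / len(samples)


def choose_cluster_count(majority, minority, k):
    """Hypothetical rule: minority size scaled by the class-density ratio,
    clamped to the range [1, len(majority)]."""
    ratio = class_density(majority, k) / class_density(minority, k)
    return max(1, min(len(majority), round(len(minority) * ratio)))


def kmeans(points, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd-style K-means on tuples of floats; returns the cluster centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, n_clusters)
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            groups[idx].append(p)
        # Move each center to the mean of its group (keep it if the group is empty).
        new_centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def undersample_majority(majority, minority, k=3, seed=0):
    """Replace the majority class by its K-means centers and rebuild the training set."""
    n_clusters = choose_cluster_count(majority, minority, k)
    centers = kmeans(majority, n_clusters, seed=seed)
    X = centers + minority
    y = [-1] * len(centers) + [1] * len(minority)
    return X, y


# Usage on synthetic 2-D data: 200 majority points versus 20 minority points.
rng = random.Random(1)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
minority = [(rng.gauss(3, 0.5), rng.gauss(3, 0.5)) for _ in range(20)]
X, y = undersample_majority(majority, minority)
print(len(majority), "majority samples reduced to", y.count(-1), "cluster centers")
```

Because the minority samples are kept unchanged and only the majority class is compressed into cluster centers, the rebuilt training set is much closer to balanced than the original one, which is the effect the abstract relies on to improve minority-class performance.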