
Improved Over-sampling Algorithm of Minority Class Sample (cited by: 2)
Abstract: To address classification on skewed datasets, this paper proposes an improved over-sampling algorithm for minority class samples, named B-ISMOTE. The algorithm generates virtual minority class instances by random interpolation within the n-dimensional sphere space formed by a borderline minority class instance and its nearest neighbor, thereby reducing the degree of class imbalance. Experiments on real datasets show that B-ISMOTE achieves better classification performance than the SMOTE and B-SMOTE algorithms.
Source: Computer Engineering (《计算机工程》), indexed in CAS and CSCD, 2012, No. 4, pp. 67-69 (3 pages)
Funding: National Natural Science Foundation of China (60873196)
Keywords: skewed dataset; classification; over-sampling; virtual instance; n-dimensional sphere space
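
The interpolation step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact geometry, so this example assumes the sphere is centered on the borderline instance with radius equal to its Euclidean distance to the chosen neighbor; the function names `sample_in_ball` and `synthesize` are hypothetical, and the selection of borderline instances is omitted. For contrast, standard SMOTE places each synthetic point on the line segment between a minority instance and one of its nearest minority neighbors.

```python
import math
import random


def sample_in_ball(center, radius, rng):
    """Draw a point uniformly at random from an n-dimensional ball."""
    n = len(center)
    # A vector of independent Gaussians has a uniformly distributed direction.
    direction = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0.0 or radius == 0.0:
        return list(center)
    # Scaling by u ** (1 / n) makes the point uniform in volume.
    scale = radius * rng.random() ** (1.0 / n) / norm
    return [c + scale * d for c, d in zip(center, direction)]


def synthesize(borderline, neighbor, rng):
    """Create one virtual minority instance.

    Assumption (not stated in the abstract): the sphere is centered on the
    borderline instance and its radius is the distance to the neighbor.
    """
    radius = math.dist(borderline, neighbor)
    return sample_in_ball(borderline, radius, rng)


# Toy data, invented purely for illustration.
rng = random.Random(0)
borderline = [1.0, 2.0, 3.0]
neighbor = [2.0, 2.0, 3.0]
virtual = [synthesize(borderline, neighbor, rng) for _ in range(5)]
```

Every generated point lies within the assumed sphere, so the virtual instances spread over a volume around the borderline instance rather than along a single line segment.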
