
Study of Over-Sampling Method Based on Feature Selection
Cited by: 1
Abstract: To improve the classification accuracy of the minority class on imbalanced data sets, we propose an over-sampling algorithm based on feature selection. The algorithm takes into account that different feature columns contribute differently to classification performance. It first performs feature selection on the training set to choose a set of key feature columns, and then synthesizes minority-class samples according to the selected columns. Each synthetic minority sample consists of two parts: the features in the selected key columns, and features synthesized according to the SMOTE principle. In an experimental comparison with the SMOTE algorithm, the feature-selection-based over-sampling algorithm outperforms SMOTE: it effectively reduces the imbalance of the data and improves the classification accuracy of the minority class.
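The abstract describes the synthesis step only in outline. The Python sketch below shows one plausible reading, under assumptions the abstract does not confirm: the key columns are taken as already chosen (the hypothetical `key_cols` argument), a synthetic sample copies those columns unchanged from its seed minority sample, and every remaining column is interpolated between the seed and one of its k nearest minority neighbours, as in SMOTE. The names `fs_oversample`, `key_cols`, `n_new`, `k`, and `seed` are illustrative and are not the authors' code.

```python
import random
from typing import List, Sequence, Set


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fs_oversample(minority: List[List[float]], key_cols: Set[int],
                  n_new: int, k: int = 5, seed: int = 0) -> List[List[float]]:
    """Synthesize n_new minority-class samples (hypothetical sketch).

    Assumption, not confirmed by the abstract: columns in key_cols are copied
    from the seed sample, and all other columns are interpolated SMOTE-style
    between the seed and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[List[float]] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Indices of the other minority samples, nearest to x first.
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: _sq_dist(x, minority[j]))
        nb = minority[rng.choice(others[:k])]
        # One factor in [0, 1) places the point on the segment from x to nb.
        gap = rng.random()
        synthetic.append([x[c] if c in key_cols else x[c] + gap * (nb[c] - x[c])
                          for c in range(len(x))])
    return synthetic
```

With an empty `key_cols` the function reduces to SMOTE-style interpolation over every column, which makes it straightforward to compare against the SMOTE baseline the abstract mentions.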
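Source: Telecommunications Science (《电信科学》), 2012, No. 1, pp. 87-91 (5 pages); listed as a Peking University core journal
Funding: National Natural Science Foundation of China (No. 60842009, No. 60905034, No. 60974126); Natural Science Foundation of Zhejiang Province (No. Y1110342)
Keywords: imbalanced data set; feature selection; over-sampling; genetic algorithm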
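The abstract states that feature selection is run on the training set before synthesis, and the keywords list genetic algorithm, but neither specifies the chromosome encoding, the genetic operators, or the fitness function. As an assumption, the Python sketch below encodes a candidate column subset as a bit mask and evolves it with a textbook generational GA (binary tournament selection, one-point crossover, bit-flip mutation, elitism). The function `ga_select`, its defaults, and the `fitness` callable are hypothetical placeholders; a fitness might, for example, be the cross-validated minority-class accuracy of a classifier trained on the masked columns. The indices of the 1-bits in the returned mask would then be the key columns.

```python
import functools
import random
from typing import Callable, List, Tuple

Mask = Tuple[int, ...]  # bit i == 1 means feature column i is selected


def ga_select(n_features: int, fitness: Callable[[Mask], float],
              pop_size: int = 20, generations: int = 30,
              p_cross: float = 0.8, p_mut: float = 0.05,
              seed: int = 0) -> Mask:
    """Return the highest-fitness column mask found by a simple generational GA.

    Hypothetical sketch: the encoding, operators and defaults are assumptions,
    not parameters reported in the paper.
    """
    if pop_size < 2:
        raise ValueError("pop_size must be at least 2")
    rng = random.Random(seed)
    score = functools.lru_cache(maxsize=None)(fitness)  # avoid re-evaluating masks

    def pick(pop: List[Mask]) -> Mask:
        # Binary tournament: the fitter of two random individuals wins.
        a, b = rng.sample(pop, 2)
        return a if score(a) >= score(b) else b

    pop: List[Mask] = [tuple(rng.randint(0, 1) for _ in range(n_features))
                       for _ in range(pop_size)]
    best = max(pop, key=score)
    for _ in range(generations):
        nxt: List[Mask] = [best]  # elitism: carry the best mask forward unchanged
        while len(nxt) < pop_size:
            p1, p2 = list(pick(pop)), list(pick(pop))
            if n_features > 1 and rng.random() < p_cross:
                cut = rng.randrange(1, n_features)  # one-point crossover
                p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for parent in (p1, p2):
                # Bit-flip mutation, applied gene by gene.
                nxt.append(tuple(1 - g if rng.random() < p_mut else g
                                 for g in parent))
        pop = nxt[:pop_size]
        best = max(pop, key=score)
    return best
```

Caching the fitness with `functools.lru_cache` matters when each evaluation trains a classifier; it assumes the fitness is deterministic for a given mask. Because the elite mask is always kept, the best fitness never decreases from one generation to the next.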

References (20 in total; the first 10 are listed)

  • [1] Lu Huijuan, Chen Wutao, Ma Xiaoping, et al. Model-free gene selection using genetic algorithms. International Journal of Digital and Its Applications, 2011, 5(1): 195-203.
  • [2] Holland J H. Adaptation in Natural and Artificial Systems. MIT Press, 1992.
  • [3] Sen P, Getoor L. Cost-sensitive learning with conditional Markov networks. Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA, USA, 2006.
  • [4] Li J, Tian Y, Huang T, et al. Cost-sensitive rank learning from positive and unlabeled data for visual saliency estimation. IEEE Signal Processing Letters, 2010, 17(6): 591-594.
  • [5] Golub T R, Slonim D K, Tamayo P, et al. Class discovery and class prediction by gene expression monitoring. Science, 1999, 286: 531-537.
  • [6] Alon U, Barkai N, Notterman D A, et al. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. PNAS, 1999, 96: 6745-6750.
  • [7] Khan J, Wei J S, Ringnér M, et al. Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nature Medicine, 2001, 7: 673-679.
  • [8] UCI Machine Learning Repository. http://archive.ics.uci.edu/ml/datasets.html
  • [9] Wolberg W H, Mangasarian O L. Multisurface method of pattern separation for medical diagnosis applied to breast cytology. Proceedings of the National Academy of Sciences, 1990, 87: 9193-9196.
  • [10] Hsu C W, Lin C J. A comparison of methods for multi-class support vector machines. IEEE Transactions on Neural Networks, 2002, 13(2): 415-425.

