
Effective and Efficient Feature Selection for Large-scale Data Using Bayes' Theorem (cited by 7)

Abstract: This paper proposes a feature selection method based on Bayes' theorem. The purpose of the proposed method is to reduce the computational complexity and increase the classification accuracy of the selected feature subsets. The dependence between two (binary) attributes is determined from the probabilities of their joint values that contribute to positive and negative classification decisions. If opposing sets of attribute values do not lead to opposing classification decisions (zero probability), the two attributes are considered independent of each other; otherwise they are dependent, and one of them can be removed, reducing the number of attributes. The process is repeated over all combinations of attributes. The paper also evaluates the approach by comparing it with existing feature selection algorithms on 8 datasets from the University of California, Irvine (UCI) machine learning databases. The proposed method outperforms most existing algorithms in terms of the number of selected features, classification accuracy, and running time.
Source: International Journal of Automation and Computing (EI), 2009, Issue 1, pp. 62-71 (10 pages)
Keywords: data mining, classification, feature selection, dimensionality reduction, Bayes' theorem
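
The pairwise dependence test described in the abstract can be sketched in code. The following Python snippet is a minimal illustration under assumptions, not the paper's published algorithm: it estimates the empirical class posterior for each joint value of a pair of binary attributes, treats the pair as dependent when the opposing joint values (0, 1) and (1, 0) lead to opposing classification decisions, and drops one attribute of each dependent pair. The function names (class_posterior, pairwise_dependent, select_features) and the 0.5 decision threshold are illustrative choices.

from itertools import combinations
import numpy as np

def class_posterior(X, y, i, j, vi, vj):
    # Empirical P(y = 1 | X[:, i] = vi, X[:, j] = vj); None if the joint value never occurs.
    mask = (X[:, i] == vi) & (X[:, j] == vj)
    if not mask.any():
        return None
    return y[mask].mean()

def pairwise_dependent(X, y, i, j):
    # Assumed reading of the abstract: attributes i and j are dependent (redundant)
    # when opposing joint values (0, 1) and (1, 0) lead to opposing class decisions.
    p01 = class_posterior(X, y, i, j, 0, 1)
    p10 = class_posterior(X, y, i, j, 1, 0)
    if p01 is None or p10 is None:
        return False            # a joint value was never observed (zero probability)
    return (p01 >= 0.5) != (p10 >= 0.5)

def select_features(X, y):
    # Scan all attribute pairs; whenever a pair is dependent, drop the second attribute.
    kept = set(range(X.shape[1]))
    for i, j in combinations(range(X.shape[1]), 2):
        if i in kept and j in kept and pairwise_dependent(X, y, i, j):
            kept.discard(j)
    return sorted(kept)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.integers(0, 2, size=(200, 6))   # 200 samples, 6 binary attributes
    y = X[:, 0]                             # class label follows attribute 0
    print("selected attribute indices:", select_features(X, y))

On the toy data at the end, the class label copies attribute 0, so every other attribute ends up dependent on it and only attribute 0 is kept.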

