Effective and Efficient Feature Selection for Large-scale Data Using Bayes' Theorem (cited by: 7)


Abstract: This paper proposes a feature selection method based on Bayes' theorem. The method aims to reduce computational complexity and to increase the classification accuracy obtained with the selected feature subsets. The dependence between two binary attributes is determined from the probabilities of their joint values that contribute to positive and negative classification decisions. If opposing sets of attribute values do not lead to opposing classification decisions (zero probability), the two attributes are considered independent of each other; otherwise they are considered dependent, in which case one of them can be removed, reducing the number of attributes. The process must be repeated on all combinations of attributes. The paper also evaluates the approach by comparing it with existing feature selection algorithms on 8 datasets from the University of California, Irvine (UCI) machine learning databases. The proposed method shows better results than most existing algorithms in terms of number of selected features, classification accuracy, and running time.
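The abstract gives only an outline of the dependence test, so the following is a minimal sketch of one possible reading, not the authors' algorithm. It assumes binary (0/1) attributes and labels, estimates P(C=1 | X_i=a, X_j=b) from counts via Bayes' theorem, and treats two attributes as dependent when some pair of opposing joint values, such as (0, 0) versus (1, 1), leads to opposite majority decisions. The function names, the 0.5 decision threshold, and the choice to drop the second attribute of a dependent pair are all illustrative assumptions.

```python
from itertools import combinations


def posterior_positive(rows, labels, i, j, a, b):
    """Estimate P(C=1 | X_i=a, X_j=b) from counts using Bayes' theorem.

    Returns None when the joint value (a, b) never occurs in the data.
    """
    n = len(rows)
    n_pos = sum(labels)
    n_joint = sum(1 for r in rows if r[i] == a and r[j] == b)
    if n_joint == 0:
        return None
    if n_pos == 0:
        return 0.0
    n_joint_pos = sum(
        1 for r, c in zip(rows, labels) if c == 1 and r[i] == a and r[j] == b
    )
    likelihood = n_joint_pos / n_pos  # P(X_i=a, X_j=b | C=1)
    prior = n_pos / n                 # P(C=1)
    evidence = n_joint / n            # P(X_i=a, X_j=b)
    return likelihood * prior / evidence


def is_dependent(rows, labels, i, j):
    """Assumed rule: dependent if opposing joint values give opposing decisions."""
    for a, b in ((0, 0), (0, 1)):
        p = posterior_positive(rows, labels, i, j, a, b)
        q = posterior_positive(rows, labels, i, j, 1 - a, 1 - b)
        if p is None or q is None:
            continue  # one of the two joint values was never observed
        if (p > 0.5) != (q > 0.5):
            return True
    return False


def select_features(rows, labels):
    """Check every attribute pair; drop the second attribute of a dependent pair."""
    kept = list(range(len(rows[0])))
    for i, j in combinations(range(len(rows[0])), 2):
        if i in kept and j in kept and is_dependent(rows, labels, i, j):
            kept.remove(j)
    return kept


if __name__ == "__main__":
    # Toy data (hypothetical): attribute 1 duplicates attribute 0.
    rows = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
    labels = [0, 0, 1, 1]
    print(select_features(rows, labels))
```

Because every pair of attributes is examined, the cost of this sketch grows quadratically with the number of attributes, which matches the abstract's remark that the process is repeated on all combinations.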

Authors: Subramanian Appavu alias Balamurugan, Ramasamy Rajaram

Affiliations: Department of Information Technology; Department of Computer Science and Information Technology

Source: International Journal of Automation and Computing (English edition), EI-indexed, 2009, Issue 1, pp. 62-71 (10 pages)

Keywords: data mining, classification, feature selection, dimensionality reduction, Bayes' theorem

Classification code: TP311.13 [Automation and Computer Technology: Computer Software and Theory]





