
Ensemble feature selection based on local energy (基于局部能量的集成特征选择)
Cited by: 2
Abstract: Feature selection reduces the dimensionality of data and is one of the key problems in machine learning and data mining; the stability of feature selection is also an active research topic. Stability is the insensitivity of a feature selection algorithm's result to variations in the training set. It is particularly critical in applications where feature selection serves as a knowledge discovery tool for identifying characteristic markers that explain the observed phenomena. This paper first introduces the feature selection algorithm Lmba in detail and analyzes its evaluation criterion in terms of the energy-based model; Lmba can be regarded as a feature ranking algorithm based on the local energy of samples. Second, because ensemble learning is effective at improving stability, an ensemble version of local energy-based feature ranking is proposed, in which the individual feature ranking results are aggregated. Experiments on real-world data sets show that the ensemble results are more stable than those of the single ranker.
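Authors: Ji Wei (季薇), Li Yun (李云)
Source: Journal of Nanjing University (Natural Science) (《南京大学学报(自然科学版)》), 2012, No. 4, pp. 499-503 (5 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.
Funding: National Natural Science Foundation of China (61073114); Natural Science Foundation of the Jiangsu Higher Education Institutions (08KJB520008, 09KJB510012); Nanjing University of Posts and Telecommunications Talent Introduction Start-up Fund and Climbing Program (NY209003, NY210010)
Keywords: feature selection, energy-based model, ensemble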
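
The idea summarized in the abstract, aggregating several feature rankings to obtain a more stable one, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the base ranker `relief_weights` is a simple Relief-style margin criterion standing in for Lmba, ensemble members are built by subsampling without replacement, rankings are combined by mean rank, and stability is measured as the mean pairwise Spearman correlation between rankings. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def relief_weights(X, y):
    """Relief-style margin weights (a stand-in base ranker, NOT the paper's Lmba)."""
    n, d = X.shape
    w = np.zeros(d)
    for i in range(n):
        diff = np.abs(X - X[i])            # per-feature distance to sample i
        dist = diff.sum(axis=1)            # L1 distance to every sample
        dist[i] = np.inf                   # never pick the sample itself
        hit_dist = np.where(y == y[i], dist, np.inf)
        miss_dist = np.where(y != y[i], dist, np.inf)
        if not (np.isfinite(hit_dist.min()) and np.isfinite(miss_dist.min())):
            continue                       # no same-class or other-class neighbour
        # Reward features that separate the nearest miss more than the nearest hit.
        w += diff[np.argmin(miss_dist)] - diff[np.argmin(hit_dist)]
    return w / n


def to_ranks(w):
    """Convert weights to ranks; rank 0 is the feature with the largest weight."""
    order = np.argsort(-w)
    ranks = np.empty_like(order)
    ranks[order] = np.arange(len(w))
    return ranks


def ensemble_ranks(X, y, n_members=20, frac=0.8, rng=None):
    """Aggregate rankings from subsampled training sets by mean rank (assumed scheme)."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(y)
    total = np.zeros(X.shape[1])
    for _ in range(n_members):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        total += to_ranks(relief_weights(X[idx], y[idx]))
    return to_ranks(-total)                # smallest summed rank becomes rank 0


def stability(rankings):
    """Mean pairwise Spearman correlation between rank vectors (no ties assumed)."""
    c = np.corrcoef(np.asarray(rankings, dtype=float))
    upper = np.triu_indices_from(c, k=1)
    return c[upper].mean()


def make_data(rng, n_per_class=30, d=20):
    """Synthetic two-class data in which only features 0 and 1 are informative."""
    y = np.repeat([0, 1], n_per_class)
    X = rng.normal(size=(2 * n_per_class, d))
    X[y == 1, :2] += 3.0
    return X, y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X, y = make_data(rng)
    single, ensemble = [], []
    for _ in range(10):                    # perturb the training set ten times
        idx = rng.choice(len(y), size=int(0.9 * len(y)), replace=False)
        single.append(to_ranks(relief_weights(X[idx], y[idx])))
        ensemble.append(ensemble_ranks(X[idx], y[idx], rng=rng))
    print("single ranker stability:  ", stability(single))
    print("ensemble ranker stability:", stability(ensemble))
```

Running the script prints the two stability scores for the synthetic data. Whether the ensemble score is higher depends on the data and on the settings chosen here, so the sketch makes no claim about the paper's reported results.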