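Abstract
Feature selection is one of the key problems in machine learning and data mining, and the stability of feature selection is currently an active research topic. Based on the energy-based learning model, this paper analyzes a feature selection method based on local energy and, following the principle of ensemble feature selection, combines the feature ranking results obtained from local energy to improve the stability of the algorithm. Experimental results on real-world data sets show that ensemble feature selection can effectively improve the algorithm's stability.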
Feature selection, which reduces the dimensionality of data, is one of the key problems in machine learning and data mining, and the stability of feature selection is currently an active research topic. Stability is the insensitivity of a feature selection algorithm's result to variations in the training set. This issue is particularly critical in applications where feature selection serves as a knowledge discovery tool for identifying characteristic markers that explain the observed phenomena. This paper first introduces a feature selection algorithm, Lmba, in detail and analyzes its evaluation criterion in terms of the energy-based model; Lmba can be regarded as a feature ranking algorithm based on the local energy of samples. Second, to improve its stability, an ensemble version of local energy-based feature ranking is proposed, motivated by the observation that ensemble learning is very effective at improving stability. Experiments on real-world data sets show that the ensemble results are more stable than those of the single ranker.
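The idea of aggregating rankings to gain stability can be illustrated with a minimal sketch. It is not the paper's implementation: the Lmba criterion is not specified in this abstract, so `score_features` is a hypothetical stand-in (a standardized difference of class means for binary labels 0/1). The bootstrap aggregation by summed rank, the top-k Jaccard stability measure, the synthetic data, and all function names are likewise assumptions made for illustration.

```python
import numpy as np

def score_features(X, y):
    """Stand-in relevance score (NOT Lmba): |mean difference| / std, binary labels 0/1."""
    spread = X.std(axis=0) + 1e-12  # avoid division by zero
    return np.abs(X[y == 0].mean(axis=0) - X[y == 1].mean(axis=0)) / spread

def rank_features(X, y):
    """Return rank positions per feature; 0 marks the most relevant feature."""
    order = np.argsort(-score_features(X, y))
    ranks = np.empty_like(order)
    ranks[order] = np.arange(len(order))
    return ranks

def stratified_indices(y, rng, frac=1.0, replace=True):
    """Draw row indices class by class so every class stays represented."""
    parts = []
    for c in np.unique(y):
        members = np.flatnonzero(y == c)
        size = max(1, int(round(frac * len(members))))
        parts.append(rng.choice(members, size=size, replace=replace))
    return np.concatenate(parts)

def ensemble_rank(X, y, n_rounds=30, rng=None):
    """Rank features on bootstrap replicates and aggregate by summed rank."""
    rng = np.random.default_rng(rng)
    total = np.zeros(X.shape[1])
    for _ in range(n_rounds):
        idx = stratified_indices(y, rng, frac=1.0, replace=True)
        total += rank_features(X[idx], y[idx])
    order = np.argsort(total, kind="stable")
    ranks = np.empty_like(order)
    ranks[order] = np.arange(len(order))
    return ranks

def stability(rankings, k):
    """Mean pairwise Jaccard similarity of the top-k feature sets."""
    sets = [set(np.flatnonzero(r < k).tolist()) for r in rankings]
    sims = [len(a & b) / len(a | b)
            for i, a in enumerate(sets) for b in sets[i + 1:]]
    return float(np.mean(sims))

# Synthetic data: 50 features, of which the first 5 carry class signal.
rng = np.random.default_rng(0)
n, d = 100, 50
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, d))
X[:, :5] += 0.8 * y[:, None]

# Perturb the training set by subsampling, then rank with both approaches.
single, ensemble = [], []
for _ in range(10):
    idx = stratified_indices(y, rng, frac=0.8, replace=False)
    single.append(rank_features(X[idx], y[idx]))
    ensemble.append(ensemble_rank(X[idx], y[idx], n_rounds=30, rng=rng))

print("single ranker stability:  ", stability(single, k=10))
print("ensemble ranker stability:", stability(ensemble, k=10))
```

Both stability values lie in [0, 1], with 1 meaning identical top-k sets on every perturbed training set. How much the ensemble helps depends on the data and on the base ranker, so the comparison printed here should be read as an illustration only.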
Source
《南京大学学报(自然科学版)》
CAS
CSCD
PKU Core Journals
2012, No. 4, pp. 499-503 (5 pages)
Journal of Nanjing University (Natural Science)
Funding
National Natural Science Foundation of China (61073114)
Natural Science Foundation of the Jiangsu Higher Education Institutions (08KJB520008, 09KJB510012)
Nanjing University of Posts and Telecommunications talent introduction start-up fund and Climbing Program (NY209003, NY210010)
Keywords
feature selection
energy-based learning
ensemble
feature selection, energy-based model, ensemble