Abstract
Aiming at feature selection for datasets with discrete values, this paper proposes an evolutionary feature selection algorithm based on relative classification information entropy. A genetic algorithm searches for the optimal feature subset, and the relative classification information entropy measures the significance of a feature subset. Specifically, the relative classification information entropy serves as the fitness function, candidate solutions are binary-encoded, and the individuals of the next generation are selected by the roulette wheel method. Experimental results show that the proposed algorithm achieves higher test accuracy than the compared methods. Furthermore, the feasibility of the proposed algorithm is proved theoretically.
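The abstract names the GA ingredients but does not define relative classification information entropy. The sketch below shows how the pieces could fit together, under several assumptions.

- **Encoding.** Each chromosome is a bit string with one bit per feature, as the abstract describes.
- **Fitness (stand-in).** The fitness is a hypothetical substitute, not the paper's measure. It is the normalized class-entropy reduction (H(D) − H(D|B)) / H(D), where B is the selected feature subset. A small size penalty `alpha` is subtracted.
- **Selection.** Parents are chosen by roulette wheel selection, as the abstract describes.
- **Other operators.** Single-point crossover, bit-flip mutation, elitism, and all parameter names and values are illustrative choices not taken from the paper.

```python
import math
import random
from collections import Counter, defaultdict
from typing import List, Sequence, Tuple


def entropy(labels: Sequence) -> float:
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def conditional_entropy(rows: Sequence[Sequence], labels: Sequence, subset: List[int]) -> float:
    """H(D | B): class entropy inside each equivalence class induced by the features in B."""
    groups = defaultdict(list)
    for row, y in zip(rows, labels):
        groups[tuple(row[j] for j in subset)].append(y)
    n = len(labels)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def fitness(chrom: List[int], rows, labels, alpha: float = 0.01) -> float:
    """Hypothetical stand-in fitness: normalized entropy reduction minus a size penalty."""
    subset = [j for j, bit in enumerate(chrom) if bit]
    h_d = entropy(labels)
    gain = 1.0 if h_d == 0 else (h_d - conditional_entropy(rows, labels, subset)) / h_d
    # Keep fitness strictly positive so every individual has a slice of the roulette wheel.
    return max(gain - alpha * len(subset) / len(chrom), 1e-6)


def roulette_select(pop: List[List[int]], fits: List[float], rng: random.Random) -> List[int]:
    """Roulette wheel selection: pick an individual with probability proportional to fitness."""
    r = rng.uniform(0, sum(fits))
    acc = 0.0
    for chrom, f in zip(pop, fits):
        acc += f
        if acc >= r:
            return chrom
    return pop[-1]  # guard against floating-point round-off


def ga_feature_selection(rows, labels, pop_size=20, generations=30,
                         p_cross=0.8, p_mut=0.05, seed=0) -> Tuple[List[int], float]:
    rng = random.Random(seed)
    n_feat = len(rows[0])
    # Binary encoding: bit j = 1 means feature j is selected.
    pop = [[rng.randint(0, 1) for _ in range(n_feat)] for _ in range(pop_size)]
    best = max(pop, key=lambda c: fitness(c, rows, labels))[:]
    for _ in range(generations):
        fits = [fitness(c, rows, labels) for c in pop]
        new_pop = [best[:]]  # elitism: carry the best individual forward unchanged
        while len(new_pop) < pop_size:
            a = roulette_select(pop, fits, rng)[:]
            b = roulette_select(pop, fits, rng)[:]
            if n_feat > 1 and rng.random() < p_cross:
                cut = rng.randint(1, n_feat - 1)  # single-point crossover
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            for child in (a, b):
                for j in range(n_feat):
                    if rng.random() < p_mut:
                        child[j] ^= 1  # bit-flip mutation
                new_pop.append(child)
        pop = new_pop[:pop_size]
        cand = max(pop, key=lambda c: fitness(c, rows, labels))
        if fitness(cand, rows, labels) > fitness(best, rows, labels):
            best = cand[:]
    return [j for j, bit in enumerate(best) if bit], fitness(best, rows, labels)


if __name__ == "__main__":
    # Toy discrete dataset: the class equals feature 0; features 1 and 2 are noise.
    rows = [(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1),
            (0, 1, 1), (1, 0, 1), (0, 0, 0), (1, 1, 0)]
    labels = [r[0] for r in rows]
    print(ga_feature_selection(rows, labels))
```

On the toy data, any subset containing feature 0 makes every equivalence class pure, so H(D|B) = 0. Subsets without feature 0 leave H(D|B) = H(D). The size penalty then favors the single-feature subset. The positivity floor in `fitness` matters because roulette wheel selection needs non-negative weights.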
Source
Pattern Recognition and Artificial Intelligence (模式识别与人工智能)
EI
CSCD
Peking University Core Journals
2016, No. 8, pp. 682-690 (9 pages)
Funding
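National Natural Science Foundation of China (No. 71371063)
Natural Science Foundation of Hebei Province (No. F2013201220)
Key Science and Technology Research Project of Higher Education Institutions of Hebei Province (No. ZD20131028)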
Keywords
Feature Selection
Data Preprocessing
Evolutionary Computation
Genetic Algorithm
Information Entropy