Abstract
Feature selection, an important preprocessing step in data mining, is the process of selecting a subset of the original feature set according to a given criterion. Its purpose is to reduce the computational complexity of the learning algorithm and to improve mining performance by removing irrelevant and redundant features. To address feature selection over discrete-valued data, this paper proposes an approach that combines rough-set theory with particle swarm optimization: a particle swarm optimization algorithm searches for the optimal feature subset, using relative classification information entropy as the fitness function to measure the significance of each candidate subset. The proposed approach was compared experimentally with other evolutionary-algorithm-based feature selection methods, and the results show that it outperforms genetic-algorithm-based methods.
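The combination described in the abstract, a binary particle swarm search over feature masks scored by an entropy-based fitness, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: plain conditional entropy of the class given the selected features, plus a subset-size penalty, stands in for the paper's relative classification information entropy, and all parameter values (`alpha`, inertia and acceleration constants, swarm size) are illustrative assumptions.

```python
import math
import random

def conditional_entropy(rows, labels, subset):
    """H(class | selected features): average class uncertainty within each
    equivalence class induced by the selected feature columns."""
    groups = {}
    for row, y in zip(rows, labels):
        key = tuple(row[i] for i in subset)
        groups.setdefault(key, []).append(y)
    n = len(rows)
    h = 0.0
    for ys in groups.values():
        p_block = len(ys) / n
        counts = {}
        for y in ys:
            counts[y] = counts.get(y, 0) + 1
        for c in counts.values():
            p = c / len(ys)
            h -= p_block * p * math.log2(p)
    return h

def fitness(mask, rows, labels, alpha=0.9):
    """Lower is better: trade class uncertainty against subset size.
    The weighting alpha is an assumption, not taken from the paper."""
    subset = [i for i, b in enumerate(mask) if b]
    if not subset:
        return float("inf")  # an empty subset is invalid
    return alpha * conditional_entropy(rows, labels, subset) \
        + (1 - alpha) * len(subset) / len(mask)

def binary_pso(rows, labels, n_particles=20, iters=50, seed=0):
    """Binary PSO over 0/1 feature masks using a sigmoid transfer function."""
    rng = random.Random(seed)
    d = len(rows[0])
    pos = [[rng.randint(0, 1) for _ in range(d)] for _ in range(n_particles)]
    vel = [[0.0] * d for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [fitness(p, rows, labels) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(iters):
        for i in range(n_particles):
            for j in range(d):
                vel[i][j] = (0.7 * vel[i][j]
                             + 1.5 * rng.random() * (pbest[i][j] - pos[i][j])
                             + 1.5 * rng.random() * (gbest[j] - pos[i][j]))
                # sigmoid turns the real-valued velocity into a bit probability
                pos[i][j] = 1 if rng.random() < 1 / (1 + math.exp(-vel[i][j])) else 0
            f = fitness(pos[i], rows, labels)
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return gbest, gbest_f
```

On a toy discrete decision table where one column determines the class, the search converges to a mask keeping only that column, since adding noise columns is penalized by the size term while contributing no entropy reduction.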
Source
CAAI Transactions on Intelligent Systems (《智能系统学报》)
Indexed in CSCD and the Peking University Core Journals list
2017, No. 3, pp. 397-404 (8 pages)
Funding
National Natural Science Foundation of China (71371063)
Natural Science Foundation of Hebei Province (F2017201026)
Zhejiang Provincial Top Key Discipline of Computer Science and Technology (Zhejiang Normal University)
Keywords
data mining
feature selection
data preprocessing
rough set
decision table
particle swarm optimization
information entropy
fitness function