Abstract
In rough set methods, forward heuristic feature selection repeatedly adds the feature with the highest importance until a given constraint is satisfied. However, the feature subset selected by this strategy may over-fit. To address this, a new heuristic algorithm is designed: the importance of each feature is computed by cross validation, and an early-stopping (truncation) mechanism terminates the algorithm once over-fitting appears. Using the neighborhood rough set model, the proposed algorithm is compared with the forward heuristic algorithm on several UCI data sets. The experimental results show that the proposed algorithm effectively reduces the degree of over-fitting, and that the feature subsets it selects can offer better classification performance.
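The general idea of forward selection with cross-validated importance and early stopping can be illustrated with a minimal sketch. This is not the authors' algorithm: the paper evaluates features with a neighborhood rough set measure, whereas the sketch below substitutes plain 1-nearest-neighbour accuracy as an assumed stand-in. The function names (`nn_accuracy`, `cv_score`, `forward_select`), the fold layout, and the toy data are all hypothetical.

```python
import random

def nn_accuracy(train, test, feats):
    """1-nearest-neighbour accuracy using only the feature indices in `feats`.
    Assumed stand-in for the paper's neighborhood rough set measure."""
    correct = 0
    for x, y in test:
        nearest = min(train, key=lambda s: sum((s[0][f] - x[f]) ** 2 for f in feats))
        correct += nearest[1] == y
    return correct / len(test)

def cv_score(data, feats, k=5):
    """Mean validation accuracy over k folds (fold i holds out every k-th sample)."""
    scores = []
    for i in range(k):
        test = data[i::k]
        train = [s for j, s in enumerate(data) if j % k != i]
        scores.append(nn_accuracy(train, test, feats))
    return sum(scores) / k

def forward_select(data, n_features, k=5):
    """Greedy forward selection; stops as soon as no candidate improves the
    cross-validated score (the early-stopping step)."""
    selected, best = [], 0.0
    remaining = set(range(n_features))
    while remaining:
        # Importance of a candidate = validated score after adding it.
        score, feat = max((cv_score(data, selected + [f], k), f) for f in sorted(remaining))
        if score <= best:
            break  # no validated gain: adding more features risks over-fitting
        selected.append(feat)
        remaining.remove(feat)
        best = score
    return selected, best

# Toy data (hypothetical): feature 0 separates the classes, features 1-2 are noise.
rng = random.Random(0)
data = []
for i in range(60):
    y = i % 2
    x = [y + rng.uniform(-0.2, 0.2), rng.uniform(0, 1), rng.uniform(0, 1)]
    data.append((x, y))

selected, best = forward_select(data, n_features=3)
print(selected, best)
```

On this toy data only feature 0 is selected: it already reaches a validated accuracy of 1.0, so no further feature can improve the score and the loop stops.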
Authors
ZHANG Wendong
QI Hui
LIU Keyu
YANG Xibei
School of Computer, Jiangsu University of Science and Technology, Zhenjiang, 212003, China; Computer Science and Technology Department, Taiyuan Normal University, Taiyuan, 030619, China
Source
Journal of Nanjing University of Aeronautics & Astronautics
EI
CAS
CSCD
PKU Core
2019, Issue 5, pp. 687-692 (6 pages)
Funding
Supported by the National Natural Science Foundation of China (61572242, 61502211, 61503160)
Keywords
feature selection
heuristic algorithm
neighborhood rough set
over-fitting