Over-Fitting and Its Countermeasure in Feature Selection Based on Rough Set

Cited by: 3
Abstract: In rough set theory, feature selection with a forward heuristic algorithm is a stepwise process: at each step the feature with the highest importance is added, until a given constraint is satisfied. However, a feature subset selected by this strategy may over-fit. To address this, a new heuristic algorithm is designed. Its main idea is to compute the importance of each feature by cross-validation and, once over-fitting occurs, to terminate the algorithm with an early-stopping (truncation) mechanism. Using the neighborhood rough set model, the proposed algorithm is compared with the conventional heuristic algorithm on several UCI data sets. The experimental results show that the proposed algorithm effectively reduces the degree of over-fitting, and that the feature subsets it obtains yield better classification performance.
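The abstract describes the method only at a high level. What follows is a minimal pure-Python sketch contrasting the two strategies, built on several assumptions that do not come from the paper. The δ-neighborhood uses Euclidean distance over features scaled to [0, 1]. The classical heuristic greedily adds the feature that most increases the neighborhood dependency γ_B(D) = |POS_B(D)| / |U| on all data, stopping once that dependency matches the one obtained with the full feature set. For the cross-validated variant, a feature's importance is taken to be the mean fraction of held-out samples whose neighborhood among the training samples is non-empty and label-consistent, and the search is truncated as soon as the best candidate no longer raises that score. All function names, δ = 0.15, k = 5, the toy data, and this particular held-out score are hypothetical illustrations, not the authors' exact definitions.

```python
import math
import random


def neighbors(X, i, pool, feats, delta):
    """Indices j in `pool` within Euclidean distance `delta` of X[i] over `feats`."""
    out = []
    for j in pool:
        d = math.sqrt(sum((X[i][f] - X[j][f]) ** 2 for f in feats))
        if d <= delta:
            out.append(j)
    return out


def dependency(X, y, feats, delta):
    """Neighborhood dependency gamma_B(D) = |POS_B(D)| / |U| on the full data."""
    U = range(len(X))
    pos = sum(1 for i in U
              if all(y[j] == y[i] for j in neighbors(X, i, U, feats, delta)))
    return pos / len(X)


def forward_select_dependency(X, y, delta=0.15):
    """Classical forward heuristic: greedily maximize gamma on all the data."""
    n_feat = len(X[0])
    selected, best = [], dependency(X, y, [], delta)
    target = dependency(X, y, list(range(n_feat)), delta)
    while best < target and len(selected) < n_feat:
        best, feat = max(((dependency(X, y, selected + [c], delta), c)
                          for c in range(n_feat) if c not in selected),
                         key=lambda t: t[0])
        selected.append(feat)
    return selected, best


def heldout_consistency(X, y, train, test, feats, delta):
    """Fraction of test samples whose training-set neighborhood is non-empty
    and label-consistent (an assumed held-out analogue of the positive region)."""
    hits = 0
    for i in test:
        nb = neighbors(X, i, train, feats, delta)
        if nb and all(y[j] == y[i] for j in nb):
            hits += 1
    return hits / len(test)


def cv_importance(X, y, feats, delta, folds):
    """Mean held-out consistency of the feature subset `feats` over k folds."""
    scores = []
    for f, test in enumerate(folds):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        scores.append(heldout_consistency(X, y, train, test, feats, delta))
    return sum(scores) / len(scores)


def forward_select_cv(X, y, delta=0.15, k=5, seed=0):
    """Forward selection scored by k-fold cross-validation, truncated (early
    stopping) as soon as the best candidate no longer improves the score."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    n_feat = len(X[0])
    selected, best = [], cv_importance(X, y, [], delta, folds)
    while len(selected) < n_feat:
        score, feat = max(((cv_importance(X, y, selected + [c], delta, folds), c)
                           for c in range(n_feat) if c not in selected),
                          key=lambda t: t[0])
        if score <= best:  # treated as the over-fitting signal: stop adding
            break
        selected.append(feat)
        best = score
    return selected, best


def make_toy_data(n=60, seed=1):
    """Hypothetical toy data: feature 0 decides the label, features 1-3 are noise."""
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(4)] for _ in range(n)]
    y = [int(row[0] > 0.5) for row in X]
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print("dependency greedy:", forward_select_dependency(X, y))
    print("CV + early stop:  ", forward_select_cv(X, y))
```

On the full data, adding a feature can only shrink every δ-neighborhood, so γ never decreases and the classical loop tends to keep adding features until neighborhoods contain little more than the sample itself. The held-out score counts an empty neighborhood as a failure, so dimensions that merely fragment the data are penalized, which is the kind of over-fitting signal a truncation rule can react to.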
Authors: ZHANG Wendong, QI Hui, LIU Keyu, YANG Xibei (School of Computer, Jiangsu University of Science and Technology, Zhenjiang 212003, China; Computer Science and Technology Department, Taiyuan Normal University, Taiyuan 030619, China)
Source: Journal of Nanjing University of Aeronautics & Astronautics, 2019, No. 5, pp. 687-692 (6 pages). Indexed in: EI, CAS, CSCD, Peking University Core Journals.
Funding: Supported by the National Natural Science Foundation of China (61572242, 61502211, 61503160).
Keywords: feature selection; heuristic algorithm; neighborhood rough set; over-fitting