Abstract
Redundant and irrelevant features in raw data increase the complexity of the learned model and degrade its performance. To address this, a two-stage feature selection method combining the Filter and Wrapper approaches is proposed. The F-Score statistic of each feature in the raw data is first used as prior knowledge; a sequential forward search then looks for an optimized feature subset, with each candidate feature combination evaluated by the performance of the classification algorithm. The method was tested with ten-fold cross-validation, and SVM, Logistic Regression, and AdaBoost classifiers were used in comparative experiments. The results show that the method effectively reduces the feature dimensionality and further improves classifier performance.
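The two stages described in the abstract can be illustrated with a minimal sketch. The abstract does not give the F-Score formula or the details of the search, so everything below is an assumption: binary labels (0/1), the commonly used F-Score (between-class separation of the means divided by the sum of within-class sample variances), a greedy forward pass that visits features in descending F-Score order and keeps one only if it improves the cross-validated accuracy, and a plain nearest-centroid classifier standing in for the SVM, Logistic Regression, and AdaBoost models used in the paper. The names `f_score`, `cv_accuracy`, and `two_stage_select` are hypothetical.

```python
from statistics import mean, variance


def f_score(X, y):
    """Filter stage: per-feature F-Score for binary labels (assumed definition)."""
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        pos = [v for v, t in zip(col, y) if t == 1]
        neg = [v for v, t in zip(col, y) if t == 0]
        between = (mean(pos) - mean(col)) ** 2 + (mean(neg) - mean(col)) ** 2
        within = variance(pos) + variance(neg)
        scores.append(between / within if within > 0 else 0.0)
    return scores


def cv_accuracy(X, y, features, k=10):
    """k-fold CV accuracy of a nearest-centroid classifier (stand-in model)."""
    n = len(X)
    correct = 0
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        # Class centroids computed on the training folds only.
        centroids = {}
        for label in (0, 1):
            rows = [X[i] for i in train_idx if y[i] == label]
            centroids[label] = [mean(r[f] for r in rows) for f in features]
        for i in test_idx:
            dist = {c: sum((X[i][f] - m) ** 2 for f, m in zip(features, cen))
                    for c, cen in centroids.items()}
            correct += (min(dist, key=dist.get) == y[i])
    return correct / n


def two_stage_select(X, y, evaluate=cv_accuracy):
    """Wrapper stage: forward pass over features ranked by F-Score."""
    scores = f_score(X, y)
    ranked = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    selected, best = [], 0.0
    for j in ranked:
        acc = evaluate(X, y, selected + [j])
        if acc > best:  # keep the feature only if CV accuracy improves
            selected, best = selected + [j], acc
    return selected, best


# Toy data (made up for illustration): feature 0 separates the classes,
# feature 1 is unrelated to the label.
y = [i % 2 for i in range(40)]
X = [[y[i] * 2.0 + (i % 5) * 0.1, (i * 7 % 11) * 0.1] for i in range(40)]
selected, best = two_stage_select(X, y)
```

On this toy data the uninformative second feature is dropped, which is the dimensionality reduction the abstract refers to. Passing a different `evaluate` callable is one way to swap in another classifier without changing the search.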
Authors
QIN Caijie; GUAN Qiang (College of Information Engineering, Sanming University, Sanming, Fujian 365004, China)
Source
Journal of Yibin University (《宜宾学院学报》)
2018, No. 6, pp. 4-8 (5 pages)
Funding
National Natural Science Foundation of China (11401341)
Natural Science Foundation of Fujian Province (2017J01779)