Abstract
The quality of data features directly affects the accuracy of a model. In pattern recognition, feature dimensionality reduction has long attracted researchers' attention. With the arrival of the big-data era, data volumes have grown sharply and data dimensionality keeps rising. On high-dimensional data, traditional data mining methods degrade in performance or even fail. Practice shows that reducing the dimensionality of the features before analysis is an effective way to avoid the "curse of dimensionality", and dimensionality reduction is therefore widely applied across many fields. This paper describes in detail two different kinds of dimensionality reduction methods, feature extraction and feature selection, and compares their characteristics. The most representative feature selection algorithms are summarized and analyzed in terms of their two key steps: the subset search strategy and the evaluation criterion. Finally, from the perspective of practical applications, the paper discusses research directions in feature dimensionality reduction that merit attention.
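A minimal Python sketch can make the split between a search strategy and an evaluation criterion concrete. It illustrates the two steps under stated assumptions and is not an algorithm taken from this paper. The search strategy is sequential forward selection. The evaluation criterion is an mRMR-style filter score, with the absolute Pearson correlation standing in for the mutual information used by the original mRMR. The function names (`pearson`, `mrmr_gain`, `sequential_forward_selection`) and the toy data are hypothetical.

```python
from statistics import mean, pstdev
from typing import Callable, List, Sequence

Column = Sequence[float]


def pearson(a: Column, b: Column) -> float:
    """Pearson correlation of two equal-length columns (0.0 if either is constant)."""
    ma, mb = mean(a), mean(b)
    sa, sb = pstdev(a), pstdev(b)
    if sa == 0 or sb == 0:
        return 0.0
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)
    return cov / (sa * sb)


def mrmr_gain(columns: Sequence[Column], target: Column) -> Callable[[List[int], int], float]:
    """Evaluation criterion (filter type, assumed): relevance of candidate j to the
    target minus its mean redundancy with the already selected features.
    |Pearson correlation| replaces mutual information here for simplicity."""
    def gain(selected: List[int], j: int) -> float:
        relevance = abs(pearson(columns[j], target))
        if not selected:
            return relevance
        redundancy = mean(abs(pearson(columns[j], columns[s])) for s in selected)
        return relevance - redundancy
    return gain


def sequential_forward_selection(n_features: int, k: int,
                                 gain: Callable[[List[int], int], float]) -> List[int]:
    """Search strategy: start from the empty subset and greedily add the feature
    with the highest criterion value until k features are chosen."""
    selected: List[int] = []
    remaining = list(range(n_features))
    while len(selected) < k and remaining:
        best = max(remaining, key=lambda j: gain(selected, j))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: y depends on f0 and f2; f1 is a near-copy of f0; f3 is noise.
f0 = [-1, -1, -1, -1, 1, 1, 1, 1]
f1 = [-0.9, -1.1, -0.9, -1.1, 1.1, 0.9, 1.1, 0.9]
f2 = [1, 1, -1, -1, 1, 1, -1, -1]
f3 = [1, -1, 1, -1, 1, -1, 1, -1]
y = [-0.2, -0.2, -1.8, -1.8, 1.8, 1.8, 0.2, 0.2]
columns = [f0, f1, f2, f3]

chosen = sequential_forward_selection(len(columns), 2, mrmr_gain(columns, y))
by_relevance = sorted(range(len(columns)), key=lambda j: -abs(pearson(columns[j], y)))[:2]
print(chosen)        # [0, 2]
print(by_relevance)  # [0, 1]
```

Because the criterion is passed in as a function, the same greedy search could be paired with a different filter score or with a wrapper criterion such as cross-validated accuracy. On the toy data, ranking by relevance alone keeps the redundant near-copy `f1`, whereas the redundancy penalty selects the complementary feature `f2`.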
Author
HUANG Xuan (黄铉)
School of Information Science and Technology, Southwest Jiaotong University, Chengdu 610031, China
Source
Computer Science (《计算机科学》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2018, Issue B06, pp. 16-21, 53 (7 pages)
Keywords
Dimensionality reduction
Feature selection
Feature extraction
Research progress