
Application of High-Dimensional Feature Selection Methods in Near-Infrared Spectral Classification (cited by 18)

High dimensional feature selection in near infrared spectroscopy classification
Abstract: Near-infrared (NIR) spectra of cigarettes are highly noisy and highly redundant. To address this, a feature selection method based on random forest (RF) and principal component analysis (PCA), termed RF-PCA, is proposed. It is used to build classification models for cigarettes of five different quality grades, and these models are compared with those built using other methods. The method classifies high-dimensional data samples effectively and can be used to assess cigarette quality and authenticity. RF feature selection filters out features that are irrelevant to classification, while PCA removes the adverse influence of redundant features and further reduces the feature dimensionality. Experiments show that RF-PCA effectively removes noisy and redundant features from NIR spectral data and improves classification efficiency and accuracy.
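The two-stage idea (RF removes irrelevant variables, then PCA compresses the redundant ones that remain) can be sketched as below. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn, assumes random-forest Gini importance (`feature_importances_`) as the ranking criterion, and uses hypothetical parameter values (`n_keep=100`, `n_components=10`), a hypothetical downstream classifier (linear discriminant analysis, since the abstract does not name one), and synthetic spectra in place of the paper's data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def rf_pca_fit(X, y, n_keep=100, n_components=10, random_state=0):
    """Fit the two-stage reducer on training data only (hypothetical helper).

    Stage 1 (relevance): rank spectral variables by random-forest Gini
    importance and keep the n_keep highest-ranked ones.
    Stage 2 (redundancy): fit PCA on the kept variables so that correlated,
    near-duplicate bands collapse into a few components.
    """
    rf = RandomForestClassifier(n_estimators=300, random_state=random_state)
    rf.fit(X, y)
    keep = np.argsort(rf.feature_importances_)[::-1][:n_keep]
    pca = PCA(n_components=n_components).fit(X[:, keep])
    return keep, pca


def rf_pca_transform(X, keep, pca):
    """Apply the fitted selection mask, then the fitted PCA projection."""
    return pca.transform(X[:, keep])


# Synthetic stand-in for NIR spectra (NOT the paper's data):
# 5 quality grades x 40 samples, 500 wavelength variables, mostly noise.
rng = np.random.default_rng(0)
n_classes, n_per_class, n_wavelengths = 5, 40, 500
y = np.repeat(np.arange(n_classes), n_per_class)
X = rng.normal(size=(y.size, n_wavelengths))
informative = [50, 120, 260, 400]
for j in informative:
    X[:, j] += 1.5 * y  # grade-dependent shift on a few informative bands
    # The neighbouring band is a near-copy, i.e. a redundant feature.
    X[:, j + 1] = X[:, j] + 0.05 * rng.normal(size=y.size)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
keep, pca = rf_pca_fit(X_train, y_train)
Z_train = rf_pca_transform(X_train, keep, pca)
Z_test = rf_pca_transform(X_test, keep, pca)

# Hypothetical downstream classifier; the abstract does not specify one.
clf = LinearDiscriminantAnalysis().fit(Z_train, y_train)
acc = clf.score(Z_test, y_test)
print(f"kept {keep.size} of {n_wavelengths} variables -> "
      f"{Z_test.shape[1]} PCs, test accuracy {acc:.2f}")
```

Both stages are fitted on the training split only, so no information from the test spectra leaks into the variable ranking or the principal components. In practice `n_keep` and `n_components` would be chosen by cross-validation rather than fixed as here.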
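Source: Infrared and Laser Engineering (indexed in EI, CSCD, and the Peking University core journal list), 2013, No. 5, pp. 1355-1359 (5 pages)
Funding: Innovation Fund of the Ministry of Science and Technology (06C26213710334)
Keywords: NIR spectra; feature selection; random forest (RF); principal component analysis (PCA); cigarettes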