Abstract
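The Gaussian process (GP) algorithm is introduced into chemometrics to explore the complex relationship between near-infrared (NIR) spectra and the composition of the measured samples. To improve model robustness, outlier samples were first removed using Monte Carlo cross-validation (MCCV), and multiplicative scatter correction (MSC), smoothing, and derivatives were then applied as spectral preprocessing. After processing with the uninformative variable elimination (UVE) algorithm, the number of wavelength points was greatly reduced while the useful information was retained, and analytical models built with these characteristic wavelengths as input were more interpretable and more robust. To validate the algorithm, a public dataset was used that contains the NIR spectra of 80 corn samples together with their oil, starch, and protein contents. GP regression was used to analyze the contents of these three components, and the resulting models were evaluated by the root mean square error of calibration (RMSEC), the root mean square error of cross-validation on the calibration set (RMSECV), the root mean square error of prediction (RMSEP), and the corresponding correlation coefficients. The calibration correlation coefficients r exceeded 0.99 and the prediction correlation coefficients r exceeded 0.96, which supports the effectiveness of the algorithm.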
In this paper, the Gaussian process (GP) is applied as a chemometric method to explore the complicated relationship between near-infrared (NIR) spectra and sample ingredients. After outliers were detected by the Monte Carlo cross-validation (MCCV) method and removed from the dataset, different preprocessing methods, such as multiplicative scatter correction (MSC), smoothing, and derivatives, were tried to obtain the best model performance. Furthermore, uninformative variable elimination (UVE) was introduced as a variable selection technique, and the characteristic wavelengths obtained were employed as input for modeling. A public dataset with 80 NIR spectra of corn was used as an example to evaluate the new algorithm. The optimal models for oil, starch, and protein were obtained by the GP regression method. The performance of the final models was evaluated according to the root mean square error of calibration (RMSEC), root mean square error of cross-validation (RMSECV), root mean square error of prediction (RMSEP), and correlation coefficient (r). The models show good calibration ability, with r values above 0.99, and the prediction ability is also satisfactory, with r values higher than 0.96. The overall results demonstrate that the GP algorithm is an effective chemometric method and is promising for NIR analysis.
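The modeling and evaluation steps named in the abstract (GP regression, then RMSEC, RMSEP, and r) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not state the kernel or how hyperparameters were chosen, so the squared-exponential kernel, the fixed `length_scale` and `noise_var`, and the synthetic one-dimensional data (standing in for UVE-selected wavelengths of the corn spectra) are all assumptions made for illustration. Pure standard-library Python is used only to keep the example self-contained.

```python
import math

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential covariance (an assumed kernel choice)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * sq_dist / length_scale ** 2)

def cholesky(A):
    """Lower-triangular L with L L^T = A (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def cho_solve(L, b):
    """Solve (L L^T) x = b by forward, then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_fit(X, y, noise_var=1e-4, length_scale=1.0):
    """Precompute alpha = (K + noise_var * I)^-1 (y - mean(y))."""
    y_mean = sum(y) / len(y)
    K = [[rbf_kernel(a, b, length_scale) + (noise_var if i == j else 0.0)
          for j, b in enumerate(X)] for i, a in enumerate(X)]
    alpha = cho_solve(cholesky(K), [v - y_mean for v in y])
    return {"X": X, "alpha": alpha, "y_mean": y_mean, "ls": length_scale}

def gp_predict(model, X_new):
    """GP posterior mean at each row of X_new."""
    return [model["y_mean"] + sum(rbf_kernel(x, xi, model["ls"]) * a
                                  for xi, a in zip(model["X"], model["alpha"]))
            for x in X_new]

def rmse(y_true, y_pred):
    """Root mean square error; RMSEC or RMSEP depending on the sample set."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def pearson_r(y_true, y_pred):
    """Correlation coefficient r between reference and predicted values."""
    mt = sum(y_true) / len(y_true)
    mp = sum(y_pred) / len(y_pred)
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

# Synthetic stand-in data (hypothetical, not the corn dataset).
X_cal = [[0.25 * i] for i in range(13)]          # calibration set
y_cal = [math.sin(x[0]) for x in X_cal]
X_pred = [[0.125 + 0.5 * i] for i in range(6)]   # independent prediction set
y_pred_ref = [math.sin(x[0]) for x in X_pred]

model = gp_fit(X_cal, y_cal, noise_var=1e-4, length_scale=1.0)
rmsec = rmse(y_cal, gp_predict(model, X_cal))
rmsep = rmse(y_pred_ref, gp_predict(model, X_pred))
r_cal = pearson_r(y_cal, gp_predict(model, X_cal))
r_pred = pearson_r(y_pred_ref, gp_predict(model, X_pred))
print(f"RMSEC={rmsec:.4f} RMSEP={rmsep:.4f} r_cal={r_cal:.4f} r_pred={r_pred:.4f}")
```

RMSECV would be computed the same way, with `rmse` applied to predictions for held-out calibration samples collected across cross-validation folds. In practice a library GP implementation with optimized hyperparameters would normally replace the hand-written linear algebra.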
Source
《光谱学与光谱分析》
SCIE
EI
CAS
CSCD
PKU Core (Peking University core journal list)
2011, Issue 6, pp. 1514-1517 (4 pages)
Spectroscopy and Spectral Analysis
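Funding
Supported by the National Natural Science Foundation of China (10972207)
and the Science and Technology Program of Zhejiang Province (2008C23085)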
Keywords
Gaussian process
Near-infrared spectroscopy
Monte Carlo cross validation
Uninformative variable elimination
Quantitative analysis