Abstract
Soil hyperspectral data are voluminous and highly redundant. To address this, the paper compares several feature variable selection methods for estimating soil organic matter content. Stability competitive adaptive reweighted sampling (sCARS), the successive projections algorithm (SPA), a genetic algorithm (GA), iteratively retaining informative variables (IRIV), and sCARS combined with SPA (sCARS-SPA) are used to select characteristic variables from the full-band spectral data. Partial least squares regression (PLSR), support vector machine (SVM), and random forest (RF) models are then built on both the full bands and the selected characteristic bands to predict soil organic matter content. The results show that, for the PLSR and SVM models, variable selection improves both computational efficiency and prediction ability relative to full-band modeling. For the RF model, building on characteristic variables does not noticeably improve accuracy, but it significantly reduces the number of variables and thus greatly improves modeling efficiency. Overall, the RF model is more accurate than the SVM and PLSR models. The model combining IRIV with RF uses only 63 variables; its coefficients of determination (R²) for the calibration and validation sets are 0.941 and 0.96, respectively, and the relative deviation (RPD) of the validation set is 4.8, indicating very good prediction capacity. Compared with full-band modeling, combining characteristic variable selection with regression methods effectively improves modeling efficiency while maintaining model accuracy.
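The abstract reports R² and RPD without defining them. The following is a minimal sketch of how these two statistics are commonly computed, assuming RPD is the sample standard deviation of the measured values divided by the root-mean-square error of prediction; this record does not confirm which definition the authors used. The function names and the sample values are hypothetical and are not data from the paper.

```python
import math
import statistics


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))


def rpd(y_true, y_pred):
    """Assumed definition: sample SD of measured values divided by RMSE."""
    return statistics.stdev(y_true) / rmse(y_true, y_pred)


# Made-up measured and predicted values, for illustration only.
y_true = [10.0, 20.0, 30.0, 40.0, 50.0]
y_pred = [12.0, 18.0, 31.0, 41.0, 48.0]

print(f"R2  = {r_squared(y_true, y_pred):.3f}")
print(f"RPD = {rpd(y_true, y_pred):.2f}")
```

Under this assumed definition, a larger RPD means the prediction error is small relative to the natural spread of the measured values, which is why the abstract pairs it with R² when judging the validation set.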
Authors
Li Guanwen (李冠稳); Gao Xiaohong (高小红); Xiao Nengwen (肖能文); Xiao Yunfei (肖云飞)
Affiliations: Qinghai Provincial Key Laboratory of Physical Geography and Environmental Process, School of Geography Sciences, Qinghai Normal University, Xining, Qinghai 810008, China; Chinese Research Academy of Environmental Sciences, Beijing 100012, China
Source
Acta Optica Sinica (《光学学报》)
2019, Issue 9, pp. 361-371 (11 pages)
Indexed in: EI, CAS, CSCD, Peking University Core Journals (北大核心)
Funding
National Natural Science Foundation of China (41550003)
Keywords
spectroscopy
soil organic matter content
characteristic variable selection
regression model