Abstract: An important problem with null hypothesis significance testing, as it is normally performed, is that it is uninformative to reject a point null hypothesis [1]. A way around this problem is to use range null hypotheses [2]. But the use of range null hypotheses is also problematic. Aside from the usual issues of whether null hypothesis significance tests can be justified at all, there is an issue that is specific to range null hypotheses: it is not straightforward to calculate the probability of the data given a range null hypothesis. The traditional way is to use the single point in the range that maximizes the obtained p-value. The Bayesian alternative is to propose a prior probability distribution and integrate across it. Frequentists and Bayesians disagree about a variety of issues, especially whether it is permissible to assign probabilities to hypotheses, and what gets lost in the shuffle is that the two camps actually arrive at different answers for the probability of the data given a range null hypothesis. Because the probability of the data given the hypothesis is a precursor, for both camps, to drawing conclusions about hypotheses, the fact that the camps obtain different values for this probability is crucial but seldom acknowledged. The goal of the present article is to bring out the problem in a manner accessible to researchers without strong mathematical or statistical backgrounds.
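The contrast between the two calculations can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the article: a normal model with known standard deviation, a one-sided tail probability standing in for "the probability of the data", a uniform prior over the range, and the hypothetical numbers at the bottom. A full Bayesian analysis would more usually integrate the likelihood itself rather than a tail probability; the tail probability is used for both camps only so that the two numbers are directly comparable.

```python
import math

def upper_tail_p(xbar, mu, sigma, n):
    """One-sided tail probability P(sample mean >= xbar) if the true mean is mu."""
    z = (xbar - mu) / (sigma / math.sqrt(n))
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def max_p_over_range(xbar, lo, hi, sigma, n, steps=1000):
    """Traditional approach: the largest tail probability over points in [lo, hi]."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(upper_tail_p(xbar, mu, sigma, n) for mu in grid)

def prior_averaged_p(xbar, lo, hi, sigma, n, steps=1000):
    """Integration approach: average the tail probability over a uniform prior
    on [lo, hi], using the midpoint rule."""
    width = (hi - lo) / steps
    mids = [lo + width * (i + 0.5) for i in range(steps)]
    return sum(upper_tail_p(xbar, mu, sigma, n) for mu in mids) / steps

# Hypothetical data: sample mean 0.5 from n = 25 observations, sigma = 1,
# tested against the range null hypothesis -0.2 <= mu <= 0.2.
xbar, sigma, n = 0.5, 1.0, 25
lo, hi = -0.2, 0.2
p_max = max_p_over_range(xbar, lo, hi, sigma, n)
p_avg = prior_averaged_p(xbar, lo, hi, sigma, n)
print(f"maximized over the range: {p_max:.4f}")
print(f"averaged over the prior:  {p_avg:.4f}")
```

Under these assumptions the maximized value is attained at the boundary of the range nearest the observed mean, while the averaged value is smaller because most of the range lies further from the data, so the two procedures give different numbers for the same data and the same hypothesis.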
Funding: Supported by the National Key Research and Development Program of China (2016YFD0300602), the China Agricultural Research System (CARS-04-PS19), and the Chengdu Science and Technology Project (2020-YF09-00033-SN).
Abstract: Assessing canopy nitrogen content (CNC) and canopy carbon content (CCC) of maize by hyperspectral remote sensing data permits estimating cropland productivity, protecting farmland ecology, and investigating the nitrogen and carbon cycles in the atmosphere. This study aimed to assess maize CNC and CCC using canopy hyperspectral information and uninformative variable elimination (UVE). Vegetation indices (VIs) and wavelet functions were adopted for estimating CNC and CCC under varying water and nitrogen regimes. Linear, nonlinear, and partial least squares (PLS) regression models were fitted to VIs and wavelet functions to estimate CNC and CCC, and were evaluated for their prediction accuracy. UVE was used to eliminate uninformative variables, improve the prediction accuracy of the models, and simplify the PLS regression models (UVE-PLS). For estimating CNC and CCC, the normalized difference vegetation index (NDVI, based on red edge and NIR wavebands) yielded the highest correlation coefficients (r > 0.88). PLS regression models showed the lowest root mean square error (RMSE) among all models. However, PLS regression models required nine VIs and four wavelet functions, increasing their complexity. UVE was used to retain valid spectral parameters and optimize the PLS regression models. UVE-PLS regression models improved validation accuracy and yielded more accurate CNC and CCC estimates than the PLS regression models. Thus, canopy spectral reflectance integrated with UVE-PLS can accurately reflect maize leaf nitrogen and carbon status.
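As a small illustration of the index mentioned in this abstract, the sketch below computes a normalized difference index from a red-edge and an NIR reflectance and correlates it with a canopy measurement. The abstract does not give the wavebands or any data, so the function names, the reflectance values, and the CNC values are all hypothetical.

```python
import math

def ndvi(nir, red_edge):
    """Normalized difference index: (NIR - red edge) / (NIR + red edge)."""
    total = nir + red_edge
    if total == 0:
        raise ValueError("reflectances sum to zero; index is undefined")
    return (nir - red_edge) / total

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical plot-level reflectances and canopy nitrogen contents.
nir_values = [0.42, 0.48, 0.51, 0.55, 0.60]
red_edge_values = [0.30, 0.29, 0.27, 0.26, 0.24]
cnc_values = [2.1, 2.9, 3.4, 4.0, 4.8]

index_values = [ndvi(n, r) for n, r in zip(nir_values, red_edge_values)]
r = pearson_r(index_values, cnc_values)
print(f"correlation between the index and CNC: {r:.3f}")
```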
Funding: Supported by the National Natural Science Foundation of China (20835002).
Abstract: Consensus methods are promising tools for improving the reliability of quantitative models in near-infrared (NIR) spectroscopic analysis. A strategy for improving the performance of consensus methods in multivariate calibration of NIR spectra is proposed. In this approach, a subset of non-collinear variables is generated using the successive projections algorithm (SPA) for each variable in the spectra reduced by uninformative variables elimination (UVE). Sub-models are then built using the variable subsets and the calibration subsets determined by Monte Carlo (MC) re-sampling, and the sub-model that produces the minimal error in cross validation is selected as a member model. By repeating the MC re-sampling, a series of member models is built, and a consensus model is obtained by averaging all the member models. Since member models are built with the best variable subset and a randomly selected calibration subset, both the quality and the diversity of the member models are ensured for the consensus model. Two NIR spectral datasets of tobacco lamina are used to investigate the proposed method. The superiority of the method in both accuracy and reliability is demonstrated.
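The averaging step of such a consensus model can be sketched as follows. This is a deliberately reduced illustration, not the method of the paper: each member model is a one-variable least-squares line fitted to a Monte Carlo subset of the calibration samples, and the UVE reduction, the SPA variable subsets, and the cross-validation selection of the best sub-model are all omitted. The function names, the subset fraction, and the data are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def build_members(xs, ys, n_members=50, frac=0.7, seed=0):
    """Fit one member model per Monte Carlo subset of the calibration set."""
    rng = random.Random(seed)
    k = max(2, int(frac * len(xs)))
    members = []
    for _ in range(n_members):
        idx = rng.sample(range(len(xs)), k)  # random calibration subset
        members.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return members

def consensus_predict(members, x):
    """Consensus prediction: the average of all member predictions."""
    return sum(slope * x + intercept for slope, intercept in members) / len(members)

# Hypothetical calibration data lying exactly on y = 3x + 1.
xs = [float(i) for i in range(10)]
ys = [3.0 * x + 1.0 for x in xs]
members = build_members(xs, ys)
print(consensus_predict(members, 4.0))
```

With noisy data the member models differ from one another, and the average is what smooths out the dependence on any single calibration subset.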
Funding: Support provided by the National Natural Science Foundation of China (60844007, 61178036, 21265006), the National Science and Technology Support Plan (2008BAD96B04), the Special Science and Technology Support Program for Foreign Science and Technology Cooperation Plan (2009BHB15200), and the Technological Expertise and Academic Leaders Training Plan of Jiangxi Province (2009DD00700).
Abstract: Variable selection is widely applied in visible-near infrared (Vis-NIR) spectroscopic analysis of the internal quality of fruits. Different spectral variable selection methods were compared for online quantitative analysis of soluble solids content (SSC) in navel oranges. Moving window partial least squares (MW-PLS), Monte Carlo uninformative variables elimination (MC-UVE), and wavelet transform (WT) combined with MC-UVE were used to select the spectral variables and develop calibration models for online analysis of SSC in navel oranges. The performances of these methods were compared in modeling the Vis-NIR data sets of navel orange samples. The results show that the WT-MC-UVE method gave the better calibration model, with a higher correlation coefficient (r) of 0.89 and a lower root mean square error of prediction (RMSEP) of 0.54 at 5 fruits per second. It is concluded that Vis-NIR spectroscopy coupled with WT-MC-UVE may be a fast and effective tool for online quantitative analysis of SSC in navel oranges.
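Abstract: [Objective] Screening the characteristic near-infrared spectral bands of whole wheat grain protein and building an optimized model enables rapid, non-destructive determination of whole-grain protein content and provides a basis for designing a portable field instrument for rapid measurement of wheat grain protein content. [Methods] In 2012-2013, eight winter wheat cultivars with clearly different protein contents were grown under six treatments (three nitrogen rates and two irrigation levels) to obtain a diverse sample set, and 176 wheat grain spectra were collected. Reflectance spectra of whole wheat grains, collected with an ASD Field Spec Pro spectrometer over a totally reflecting background surface, were converted to absorbance spectra using the formula A = log(1/R). The absorbance spectra were preprocessed with S-G smoothing, multiplicative scatter correction, and baseline correction to remove background noise, and cross-validated partial least squares regression was then used to compress the characteristic bands. The protein characteristic bands selected by several methods were compared: uninformative variable elimination (UVE) with cross-validated partial least squares regression, the successive projections algorithm (SPA) with cross-validated partial least squares regression, UVE combined with SPA followed by cross-validated partial least squares regression, UVE combined with SPA followed by multiple linear regression (MLR), and UVE combined with SPA followed by stepwise multiple linear regression (SMLR). The selected bands were regressed against wheat grain protein content measured by the Kjeldahl method in order to build and select the best prediction model for wheat grain protein. [Results] UVE removed the variables unrelated to wheat grain protein content, compressing the raw grain spectrum from 1,621 bands to 717 while retaining the protein information, and thus achieved a first-pass selection of characteristic bands. A comparison of the band-selection algorithms SMLR, SPA, SPA + SMLR, and SPA + partial least squares regression (PLS) + cross validation (CV) showed that different methods yield different characteristic bands, and that the resulting models and their accuracies also differ markedly. For the spectral bands screened by UVE, SPA was used to remove the effect of band collinearity in the spectral matrix, and SMLR was then used to select the 15 characteristic…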
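Abstract: This paper discusses the principle of uninformative variables elimination (UVE), applies the algorithm to select wavelength variables from near-infrared spectral data of maize, and then builds a model using partial least squares (PLS). The results show that, compared with the model built on the full-spectrum data, the calibration model built after variable selection is not only simpler but also has better predictive ability.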
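The selection principle behind UVE can be sketched in a few lines. The usual formulation appends artificial noise variables to the spectra, fits PLS models under leave-one-out resampling, and keeps only the real variables whose coefficient stability (mean divided by standard deviation) exceeds that of the noise variables. The sketch below is a simplified stand-in and not the paper's implementation: it replaces the PLS coefficients with per-variable least-squares slopes so that it runs on the standard library alone, and all names and data are hypothetical.

```python
import random
import statistics

def slope(xs, ys):
    """Least-squares slope of ys on a single predictor xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx

def stability(col, y):
    """Mean / standard deviation of the leave-one-out slopes for one variable."""
    coefs = [slope(col[:i] + col[i + 1:], y[:i] + y[i + 1:]) for i in range(len(y))]
    return statistics.fmean(coefs) / statistics.stdev(coefs)

def uve_select(columns, y, n_noise=None, seed=0):
    """Keep variables whose |stability| exceeds the largest |stability|
    observed among artificial random-noise variables."""
    rng = random.Random(seed)
    n_noise = n_noise or len(columns)
    noise = [[rng.gauss(0.0, 1.0) for _ in y] for _ in range(n_noise)]
    cutoff = max(abs(stability(c, y)) for c in noise)
    return [j for j, c in enumerate(columns) if abs(stability(c, y)) > cutoff]

# Hypothetical data: variable 0 drives y, variable 1 is unrelated noise.
rng = random.Random(42)
x0 = [rng.gauss(0.0, 1.0) for _ in range(30)]
x1 = [rng.gauss(0.0, 1.0) for _ in range(30)]
y = [2.0 * a + 0.1 * rng.gauss(0.0, 1.0) for a in x0]
selected = uve_select([x0, x1], y)
print("selected variable indices:", selected)
```

The cutoff taken from the noise variables is what makes the method data-driven: a real variable is retained only if its coefficient is more stable than what pure noise achieves on the same samples.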
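Abstract: This study optimizes a partial least squares (PLS) regression model for the alcohol content of wine. Spectral data of wine samples were collected with a near-infrared spectrometer and used to build a quantitative alcohol-content model for rapid online detection. Variables were selected by Monte Carlo uninformative variable elimination (MC-UVE) and a genetic algorithm (GA), and regression models were built on the selected variables using PLS and factor analysis (FA), respectively. The results show that the MC-UVE-GA-FAR model achieved a prediction-set correlation coefficient (R2) of 0.946 and a root mean square error of prediction (RMSEP) of 0.215, outperforming the MC-UVE-GA-PLS model. Compared with the PLS regression model built on the full spectral range, model performance improved, and the model uses only 6 selected variables, which greatly simplifies it. Combining the MC-UVE and GA algorithms with FA can thus optimize the model.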