Adaptive fractional polynomial modeling of general correlated outcomes is formulated to address nonlinearity in means, variances/dispersions, and correlations. Means and variances/dispersions are modeled using generalized linear models in fixed effects/coefficients. Correlations are modeled using random effects/coefficients. Nonlinearity is addressed using power transforms of primary (untransformed) predictors. Parameter estimation is based on extended linear mixed modeling, which generalizes both generalized estimating equations and linear mixed modeling. Models are evaluated using likelihood cross-validation (LCV) scores and are generated adaptively using a heuristic search controlled by those scores. Cases covered include linear, Poisson, logistic, exponential, and discrete regression of correlated continuous, count/rate, dichotomous, positive continuous, and discrete numeric outcomes treated as normally, Poisson, Bernoulli, exponentially, and discrete numerically distributed, respectively. Example analyses are also generated for these five cases to compare adaptive random effects/coefficients modeling of correlated outcomes with previously developed adaptive modeling based on directly specified covariance structures. Adaptive random effects/coefficients modeling substantially outperforms direct covariance modeling in the linear, exponential, and discrete regression example analyses. It generates equivalent results in the logistic regression example analyses, and it is substantially outperformed in the Poisson regression case. Random effects/coefficients modeling of correlated outcomes can provide substantial improvements in model selection compared to directly specified covariance modeling. However, directly specified covariance modeling can generate competitive or substantially better results in some cases while usually requiring less computation time.
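LCV-controlled selection of a power transform can be illustrated with a minimal sketch. It assumes independent normal outcomes and a single predictor, so there are no random effects and no extended linear mixed modeling. It defines the LCV score as the geometric mean of held-out likelihoods over k folds. It also replaces the authors' heuristic search with a plain grid over candidate powers, where power 0 means the natural log as in conventional fractional polynomials. All function names are hypothetical.

```python
import math
import random

def power_transform(x, p):
    """Fractional-polynomial style transform: log(x) for p == 0, else x**p (x > 0 assumed)."""
    return math.log(x) if p == 0 else x ** p

def fit_line(xs, ys):
    """Least-squares fit y = a + b*x; returns (a, b, sigma2) with the ML variance estimate."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    sigma2 = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys)) / n
    return a, b, sigma2

def normal_loglik(y, mu, sigma2):
    return -0.5 * (math.log(2 * math.pi * sigma2) + (y - mu) ** 2 / sigma2)

def lcv_score(xs, ys, k=5, seed=0):
    """k-fold LCV: exp(sum of held-out log-likelihoods / n), i.e. a geometric mean likelihood."""
    n = len(xs)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[h::k] for h in range(k)]
    total = 0.0
    for fold in folds:
        held = set(fold)
        train = [i for i in idx if i not in held]
        # Parameters are estimated without the fold, then scored on the fold.
        a, b, s2 = fit_line([xs[i] for i in train], [ys[i] for i in train])
        total += sum(normal_loglik(ys[i], a + b * xs[i], s2) for i in fold)
    return math.exp(total / n)

def best_power(x, y, powers=(-2, -1, -0.5, 0, 0.5, 1, 2, 3)):
    """Grid search (not the authors' heuristic): keep the power with the largest LCV score."""
    scores = {p: lcv_score([power_transform(v, p) for v in x], y) for p in powers}
    return max(scores, key=scores.get), scores
```

Because larger LCV scores mean better held-out prediction, the search simply keeps the maximizing power; on data generated from a square-root relationship, the power 0.5 should score highest.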
In robust regression, we often have to decide how many observations are unusual and should be removed from the sample to obtain a better fit for the rest of the observations. Generally, we use the basic principle of least trimmed squares (LTS), which is to fit the majority of the data and identify as outliers those points that cause the greatest damage to the robust fit. However, in LTS regression, the choice of default values for a high breakdown point seriously affects the efficiency of the estimator. In the proposed approach, we introduce a penalty cost for discarding an outlier; consequently, the best fit for the majority of the data is obtained by discarding only catastrophic observations. This penalty cost is based on robust design weights and a high-breakdown-point residual scale taken from the LTS estimator. The robust estimate is obtained by solving a convex quadratic mixed-integer programming problem whose objective function minimizes the sum of the squared residuals plus the penalties for discarding observations. The proposed mathematical programming formulation is suitable for small-sample data. Moreover, we conduct a simulation study to compare our approach with other robust estimators in terms of efficiency and robustness.
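The penalized objective can be made concrete for a straight-line model. The sketch below minimizes the residual sum of squares of the retained points plus a penalty for each discarded point. Instead of a mixed-integer programming solver, it enumerates every set of at most `max_out` discarded points. This reaches the same minimum of the objective but is feasible only for very small samples. The per-observation penalties are passed in directly as an assumed input; the approach described above derives them from robust design weights and the LTS residual scale, which this sketch does not reproduce. Function names are hypothetical.

```python
import itertools

def ols_rss(xs, ys):
    """Simple linear regression; returns (intercept, slope, residual sum of squares)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    return a, b, rss

def penalized_trim(xs, ys, penalty, max_out):
    """Minimize RSS(kept) + sum(penalty[i] for discarded i) by exhaustive enumeration.

    Assumes at least two distinct x values remain after discarding max_out points.
    Returns (objective, intercept, slope, set_of_discarded_indices).
    """
    n = len(xs)
    best = None
    for m in range(max_out + 1):
        for out in itertools.combinations(range(n), m):
            keep = [i for i in range(n) if i not in out]
            a, b, rss = ols_rss([xs[i] for i in keep], [ys[i] for i in keep])
            obj = rss + sum(penalty[i] for i in out)
            if best is None or obj < best[0]:
                best = (obj, a, b, set(out))
    return best
```

A point is discarded only when removing it lowers the residual sum of squares by more than its penalty, so large penalties keep moderately unusual points in the fit and only catastrophic ones are dropped.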
We present a methodology for constructing a short-term event risk score for heart failure patients from an ensemble predictor. The ensemble uses bootstrap samples, two classification rules (logistic regression and linear discriminant analysis for mixed continuous and categorical data), and random selection of explanatory variables to build the individual predictors. We define a measure of the importance of each variable in the score and an event risk measure based on an odds ratio. Moreover, we establish a property of linear discriminant analysis for mixed data. The methodology is applied to patients in the EPHESUS trial, for whom biological, clinical, and medical history variables were measured.
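The ensemble construction can be sketched in simplified form. Each member is a logistic regression, fitted here by plain gradient ascent, on a bootstrap sample using a randomly chosen subset of explanatory variables. The risk score is assumed to be the average predicted event probability across members. The linear discriminant analysis rule for mixed data, the variable-importance measure, and the odds-ratio risk measure are not reproduced; the function names, the averaging rule, and the tuning values are illustrative assumptions.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, iters=300, lr=0.5):
    """Unregularized logistic regression by batch gradient ascent; w[0] is the intercept."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(iters):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            r = yi - sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            grad[0] += r
            for j in range(p):
                grad[j + 1] += r * xi[j]
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w

def build_risk_score(X, y, n_models=25, n_vars=2, seed=0):
    """Return a function mapping a covariate vector to the ensemble-average event probability."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    members = []
    for _ in range(n_models):
        rows = [rng.randrange(n) for _ in range(n)]  # bootstrap sample (with replacement)
        cols = rng.sample(range(p), n_vars)          # random subset of explanatory variables
        w = fit_logistic([[X[i][j] for j in cols] for i in rows], [y[i] for i in rows])
        members.append((cols, w))

    def score(x):
        probs = [sigmoid(w[0] + sum(wj * x[j] for wj, j in zip(w[1:], cols)))
                 for cols, w in members]
        return sum(probs) / len(probs)

    return score
```

Random variable subsets make the members differ from one another, which is what averaging over the ensemble relies on; a member that never sees an informative variable contributes roughly the base event rate.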
Purpose: To formulate and demonstrate methods for regression modeling of probabilities and dispersions for individual-patient longitudinal outcomes taking on discrete numeric values. Methods: Three alternatives for modeling outcome probabilities are considered. Multinomial probabilities are based on different intercepts and slopes for the probabilities of different outcome values. Ordinal probabilities are based on different intercepts and the same slope for the probabilities of different outcome values. Censored Poisson probabilities are based on the same intercept and slope for the probabilities of different outcome values. Parameters are estimated with extended linear mixed modeling by maximizing a likelihood-like function based on the multivariate normal density that accounts for within-patient correlation. Formulas are provided for the gradient vectors and Hessian matrices used to estimate model parameters. The likelihood-like function is also used to compute cross-validation scores for alternative models and to control an adaptive modeling process for identifying possibly nonlinear functional relationships in the predictors of probabilities and dispersions. Example analyses are provided of daily pain ratings for a cancer patient over a period of 97 days. Results: The censored Poisson approach is preferable for modeling these data, and presumably other data sets of this kind, because it generates a competitive model with fewer parameters in less time than the other two approaches. The generated probabilities for this model are distinctly nonlinear in time, while the dispersions are distinctly nonconstant over time, demonstrating the need for adaptive modeling of such data. The analyses also address the dependence of these daily pain ratings on time and on the daily numbers of pain flares. Probabilities and dispersions change differently over time for different numbers of pain flares.
Conclusions: Adaptive modeling of daily pain ratings for individual cancer patients is an effective way to identify nonlinear relationships in time as
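The censored Poisson alternative can be written down directly. This is a minimal sketch, assuming ratings take the values 0 to K (K = 10 here, a hypothetical maximum rating) and that "censored Poisson" means a Poisson count right-censored at K, so the top category absorbs the upper tail. A single intercept and slope determine the log mean, so all K + 1 probabilities share the same two parameters; the ordinal and multinomial alternatives instead use separate intercepts for different outcome values. A linear term in time is used only for simplicity, whereas the fitted probabilities reported above were distinctly nonlinear in time. Dispersion modeling and the likelihood-like estimation function are not shown, and the names are illustrative.

```python
import math

def censored_poisson_probs(mu, k_max):
    """P(Y = k) for k = 0..k_max when Y is Poisson(mu) right-censored at k_max."""
    probs = []
    pk = math.exp(-mu)  # Poisson P(Y = 0)
    for k in range(k_max):
        probs.append(pk)
        pk *= mu / (k + 1)  # recurrence P(k + 1) = P(k) * mu / (k + 1)
    # The top category collects the whole upper tail P(Y >= k_max).
    probs.append(max(0.0, 1.0 - sum(probs)))
    return probs

def rating_probs(t, intercept, slope, k_max=10):
    """Probabilities of ratings 0..k_max at time t under a log-linear mean in t."""
    mu = math.exp(intercept + slope * t)
    return censored_poisson_probs(mu, k_max)

def mean_rating(probs):
    """Expected rating implied by a probability vector over 0..k_max."""
    return sum(k * pk for k, pk in enumerate(probs))
```

Censoring pulls the expected rating below the Poisson mean, because values above K are recorded as K.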
Spatial models are effective for obtaining local details of grassland biomass, and their accuracy has important practical significance for the stable management of grassland and livestock. To this end, the present study used measured quadrat data on grass yield across different regions of the temperate grasslands of Ningxia, China, during the main growing season (August 2020), combined with hydrometeorological, elevation, net primary productivity (NPP), and other auxiliary data for the same period. The non-stationary characteristics of the spatial scale and the effects of the influencing factors on grass yield were then analyzed using a mixed geographically weighted regression (MGWR) model. The results showed that the model was suitable for correlation analysis. The spatial scale of the ratio resident-area index (PRI) was the largest, followed by the digital elevation model, NPP, distance from gully, distance from river, average July rainfall, and daily temperature range, whereas the spatial scales of night light, distance from roads, and relative humidity (RH) were the most limited. All influencing factors had both positive and negative effects on grass yield, except RH, whose effect was strictly negative. The regression results revealed a multiscale, differentiated spatial response of grass yield to the different influencing factors. The ordinary least squares (OLS) model (adjusted R² = 0.642) and the geographically weighted regression (GWR) model (adjusted R² = 0.797) performed worse than the MGWR model (adjusted R² = 0.889). The RMSE and radius index likewise ranked the models MGWR > GWR > OLS. Ultimately, the MGWR model had the strongest predictive performance (R² = 0.8306). Spatially, grass yield was high in the south and west and low in the north and east of the study area. The results of this study provide new technical support for rapid and accurate estimation of grassland yield, enabling dynamic adjustment of grazing decisions in the semi-arid loess hilly region.
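The core of geographically weighted regression can be sketched as a locally weighted least-squares fit. The sketch assumes one predictor, planar coordinates, and a Gaussian distance kernel with a fixed bandwidth, and it returns the local intercept and slope at a target location. According to the abstract, the MGWR model goes further by letting the different influencing factors act at different spatial scales; that extension, bandwidth selection, and the fit diagnostics reported above are not reproduced. The function name and kernel choice are assumptions.

```python
import math

def gwr_at(u, v, coords, xs, ys, bandwidth):
    """Local weighted least-squares fit y = a + b*x at location (u, v).

    Each observation is weighted by a Gaussian kernel of its distance to (u, v),
    so nearby observations dominate the local estimates.
    """
    w = [math.exp(-0.5 * ((cu - u) ** 2 + (cv - v) ** 2) / bandwidth ** 2)
         for cu, cv in coords]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    b = sxy / sxx
    return my - b * mx, b  # (local intercept, local slope)
```

A small bandwidth lets the slope track local variation; as the bandwidth grows, every weight approaches 1 and the local fit approaches the global OLS fit.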