Abstract: Variable selection is a key step in statistical modeling: choosing the right variables yields a robust model that is structurally simple and predicts accurately. This paper proposes a new bi-level variable selection penalty for logistic regression, the adaptive Sparse Group Lasso (adSGL), whose distinctive feature is that it screens variables according to their group structure, achieving selection both within and between groups. The method's advantage is that it penalizes individual coefficients and group coefficients to different degrees, avoiding over-penalization of large coefficients and thereby improving estimation and prediction accuracy. The main computational difficulty is that the penalized likelihood is not strictly convex, so the model is solved by group coordinate descent, and a criterion for choosing the tuning parameters is established. Simulation studies show that, compared with the representative existing methods Sparse Group Lasso, Group Lasso and Lasso, adSGL not only improves the accuracy of bi-level selection but also reduces model error. Finally, adSGL is applied to credit-card credit scoring, where it achieves higher classification accuracy and robustness than plain logistic regression.
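To make the bi-level penalty concrete, here is a minimal sketch of an adaptive sparse group lasso penalty value, assuming the standard form from the sparse group lasso literature: a weighted L1 term on individual coefficients plus a weighted group-wise L2 term. The function name `adsgl_penalty` and the specific weighting scheme are illustrative, not taken from the paper.

```python
import numpy as np

def adsgl_penalty(beta, groups, w, v, lam1, lam2):
    """Adaptive sparse group lasso penalty (illustrative form):
    lam1 * sum_j w_j |beta_j|  +  lam2 * sum_g v_g * sqrt(p_g) * ||beta_g||_2,
    where `groups` maps each coefficient index to a group label and the
    adaptive weights w (coefficient-wise) and v (group-wise) let large
    coefficients/groups be penalized less."""
    beta = np.asarray(beta, dtype=float)
    l1_part = lam1 * np.sum(w * np.abs(beta))
    l2_part = 0.0
    for g, vg in v.items():
        idx = [j for j, gj in enumerate(groups) if gj == g]
        l2_part += vg * np.sqrt(len(idx)) * np.linalg.norm(beta[idx])
    return l1_part + lam2 * l2_part

beta = np.array([1.0, -2.0, 0.0, 0.5])
groups = [0, 0, 1, 1]
w = np.ones(4)          # e.g. 1 / |initial estimate| in an adaptive scheme
v = {0: 1.0, 1: 1.0}    # adaptive group weights
print(adsgl_penalty(beta, groups, w, v, lam1=0.1, lam2=0.1))
```

With unit weights this reduces to the ordinary sparse group lasso penalty; the adaptive weights are what allow unequal penalization of large and small coefficients.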
Funding: Supported by the National Natural Science Foundation of China (11501579) and the Fundamental Research Funds for the Central Universities, China University of Geosciences (Wuhan) (CUGW150809).
Funding: Supported by the National Natural Science Foundation of China (Grant No. 11671398), the State Key Lab of Coal Resources and Safe Mining (China University of Mining and Technology) (Grant No. SKLCRSM16KFB03), and the Fundamental Research Funds for the Central Universities in China (Grant No. 2009QS02).
Abstract: The selection of fixed effects is studied in high-dimensional generalized linear mixed models (HDGLMMs) without parametric distributional assumptions beyond some moment conditions. The iterative-proxy-based penalized quasi-likelihood (IPPQL) method is proposed to select the important fixed effects: an iterative proxy of the covariance matrix of the random effects is constructed, and the penalized quasi-likelihood is adapted accordingly. We establish model selection consistency with oracle properties even when the dimensionality is of non-polynomial (NP) order in the sample size. Simulation studies show that the proposed procedure works well, and a real data set is also analyzed.
Abstract: This paper discusses penalized-spline iterative estimation for generalized partially linear single-index models (GPLSIM) in the exponential family, and proposes an iterative estimation algorithm based on the penalized likelihood and an initial estimate of the single-index parameter vector α chosen in advance. The proposed algorithm is also validated through the analysis of a set of simulated data.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 11801531, 11501578, 11501579, 11701571, 11871474 and 41572315, and the Fundamental Research Funds for the Central Universities under Grant No. CUGW150809.
Abstract: The seamless-L_0 (SELO) penalty is a smooth function that closely resembles the L_0 penalty and has been demonstrated, both theoretically and practically, to be effective for nonconvex penalized variable selection. In this paper, the authors first generalize the SELO penalty to a class of penalties retaining its good features, and then develop variable selection and parameter estimation in Cox models using the proposed generalized SELO (GSELO) penalized log partial likelihood (PPL) approach. The authors show that the GSELO-PPL procedure possesses the oracle property with a diverging number of predictors under certain mild, interpretable regularity conditions. The entire path of GSELO-PPL estimates can be computed efficiently through a smoothing quasi-Newton (SQN) algorithm with continuation. The authors propose a consistent modified BIC (MBIC) tuning parameter selector for GSELO-PPL and show that, under some regularity conditions, the GSELO-PPL-MBIC procedure consistently identifies the true model. Simulation studies and real data analysis are conducted to evaluate the finite sample performance of the proposed method.
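For intuition on how SELO approximates L_0, here is a sketch of the penalty in its commonly cited form, p(β) = (λ/log 2) · log(|β|/(|β| + τ) + 1); as τ → 0+ this converges to λ·1{β ≠ 0}. The code is illustrative only; the paper above works with a generalized class of such penalties, not this single function.

```python
import numpy as np

def selo(beta, lam, tau):
    """SELO penalty in its commonly cited form:
    p(beta) = (lam / log 2) * log(|beta| / (|beta| + tau) + 1).
    For small tau, any nonzero coefficient is penalized by almost exactly
    lam, mimicking the L0 penalty, while zero coefficients cost nothing."""
    b = np.abs(beta)
    return lam / np.log(2.0) * np.log(b / (b + tau) + 1.0)

print(selo(0.0, lam=1.0, tau=0.01))   # exactly 0: no penalty at zero
print(selo(2.0, lam=1.0, tau=0.01))   # close to lam = 1, like L0
```

Unlike L_0 itself, this function is smooth away from a small neighborhood of zero, which is what makes path-following and quasi-Newton schemes such as the SQN-with-continuation algorithm applicable.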
Funding: Supported by the Shandong Provincial Natural Science Foundation of China (Grant No. ZR2014AM019), the National Natural Science Foundation of China (Grant Nos. 11171188 and 11529101), the Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry of China, and the National Science Foundation of the USA (Grant Nos. DMS-1418042 and DMS-1620898).
Abstract: Linear mixed-effects models are widely used in the analysis of longitudinal data. However, testing for zero variance components of random effects has not been well resolved in the statistical literature, although some likelihood-based procedures have been proposed and studied. In this article, we propose a generalized p-value based method, coupled with fiducial inference, to tackle this problem. The proposed method is also applied to test linearity of the nonparametric functions in additive models. We provide theoretical justifications and develop an implementation algorithm for the proposed method. We evaluate its finite-sample performance and compare it with that of the restricted likelihood ratio test via simulation experiments. We illustrate the proposed approach with an application from a nutritional study.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 11261025 and 11561075, the Natural Science Foundation of Yunnan Province under Grant No. 2016FB005, and the Program for Middle-aged Backbone Teachers, Yunnan University.
Abstract: Variable selection is an important research topic in modern statistics. Traditional variable selection methods can only select the mean model and/or the variance model, and cannot be used to select joint mean, variance and skewness models. In this paper, the authors propose joint location, scale and skewness models for data sets involving asymmetric outcomes, and consider the problem of variable selection for the proposed models. Based on an efficient unified penalized likelihood method, the consistency and the oracle property of the penalized estimators are established. The authors develop a variable selection procedure for the proposed joint models that can simultaneously estimate and select the important variables in the location, scale and skewness models. Simulation studies and a body mass index data analysis are presented to illustrate the proposed methods.
Abstract: With the rapid development of information technology, contemporary data sets from a variety of fields have become extremely large. In many data sets the number of features is well above the sample size; such data are called high dimensional. In statistics, variable selection approaches are required to extract the useful information from high dimensional data. The most popular approach is to add a penalty function, coupled with a tuning parameter, to the log likelihood function; this is called the penalized likelihood method. However, almost all penalized likelihood approaches consider only noise accumulation and spurious correlation, while ignoring endogeneity, which also appears frequently in high dimensional settings. In this paper, we explore the causes of endogeneity and its influence on penalized likelihood approaches. Simulations based on five classical penalized approaches are provided to demonstrate their inconsistency under endogeneity. The results show that the positive selection rate of all five approaches increases gradually but the false selection rate does not consistently decrease when endogenous variables exist; that is, they do not satisfy selection consistency.
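The mechanism behind this inconsistency can be seen in a minimal simulation: when a regressor is correlated with the error term (endogeneity), its least-squares coefficient is biased away from zero, so likelihood-based selectors tend to pick it up as a false positive. This sketch is a toy illustration of that bias, not a reproduction of the paper's five-method simulation study; the 0.8 correlation level and sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

# Exogenous regressor x1; endogenous regressor x2 is built to be
# correlated with the error term e (Cov(x2, e) = 0.8 != 0).
e = rng.normal(size=n)
x1 = rng.normal(size=n)
x2 = 0.8 * e + rng.normal(size=n)
y = 1.0 * x1 + 0.0 * x2 + e   # the true coefficient of x2 is zero

X = np.column_stack([x1, x2])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
# beta_hat[1] converges to Cov(x2, e) / Var(x2) = 0.8 / 1.64 ~= 0.49,
# not to the true value 0 -- the endogenous variable looks "important".
print(beta_hat)
```

Because the spurious coefficient does not shrink to zero as n grows, no choice of tuning parameter can make a penalized selector both keep x1 and drop x2 consistently, which is the inconsistency the abstract describes.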
Funding: Supported by the Singapore Ministry of Education AcRF Tier 1 grant (Grant No. R-155-000-065-112), and by the Natural Sciences and Engineering Research Council of Canada and MITACS, Canada.
Abstract: Feature selection with a relatively small sample size and an extremely high-dimensional feature space is common in many areas of contemporary statistics. The high dimensionality of the feature space causes serious difficulties: (i) sample correlations between features become high even if the features are stochastically independent; (ii) the computation becomes intractable. These difficulties make conventional approaches either inapplicable or inefficient. Reducing the dimensionality of the feature space and then applying low dimensional approaches appears to be the only feasible way to tackle the problem. Along this line, we develop in this article a tournament screening cum EBIC approach for feature selection in high dimensional feature spaces. The tournament screening procedure mimics that of a tournament. It is shown theoretically that tournament screening has the sure screening property, a necessary property that any valid screening procedure should satisfy. Numerical studies demonstrate that the tournament screening cum EBIC approach enjoys desirable properties, such as a higher positive selection rate and a lower false discovery rate than other approaches.
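The tournament idea can be sketched as rounds of within-group elimination: partition the surviving features into groups of manageable size, keep the within-group "winners" (here scored by absolute marginal correlation with the response), and repeat until few enough features remain. This is a simplified, illustrative reading of the procedure; the grouping scheme, scoring rule and stopping rule below are assumptions, and the EBIC step that follows screening is omitted.

```python
import numpy as np

def tournament_screen(X, y, group_size=50, keep=10, target=100, rng=None):
    """Sketch of tournament screening: repeatedly partition the surviving
    features into random groups of `group_size`, keep the `keep` features
    with the largest absolute marginal correlation with y in each group,
    and stop once at most `target` features remain."""
    rng = rng if rng is not None else np.random.default_rng(0)
    survivors = np.arange(X.shape[1])
    while survivors.size > target:
        rng.shuffle(survivors)
        winners = []
        for start in range(0, survivors.size, group_size):
            g = survivors[start:start + group_size]
            scores = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in g])
            winners.extend(g[np.argsort(scores)[-keep:]])
        new = np.array(winners)
        if new.size >= survivors.size:   # no further reduction possible
            break
        survivors = new
    return np.sort(survivors)

# Toy example: 500 features, only features 0-4 drive y.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 500))
y = X[:, :5] @ np.ones(5) + 0.1 * rng.normal(size=200)
kept = tournament_screen(X, y, group_size=50, keep=5, target=50, rng=rng)
print(set(range(5)) <= set(int(j) for j in kept))
```

Because each round only ever scores small groups, the per-round cost stays low even when the initial feature count is enormous; in practice EBIC would then be applied to the surviving set.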
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 11571337 and 71631006) and the Fundamental Research Funds for the Central Universities (Grant No. WK2040160028).
Abstract: Various forms of penalized estimators with good statistical and computational properties have been proposed for variable selection that respects the grouping structure in the variables. The attractive properties of these shrinkage and selection estimators, however, depend critically on the choice of the tuning parameter. One method for choosing the tuning parameter is via information criteria, such as the Bayesian information criterion (BIC). In this paper, we consider the problem of consistent tuning parameter selection in high dimensional generalized linear regression with grouping structures. We extend the results of the extended regularized information criterion (ERIC) to group selection methods involving concave penalties and then investigate selection consistency with a diverging number of variables in each group. Moreover, we show that the ERIC-type selector enables consistent identification of the true model and that the resulting estimator possesses the oracle property even when the number of groups is much larger than the sample size. Simulations show that the ERIC-type selector can significantly outperform the BIC and cross-validation selectors when choosing the true grouped variables, and an empirical example is given to illustrate its use.
Abstract: In a survival analysis context, we suggest a new method to estimate the piecewise constant hazard rate model. The method provides an automatic procedure to find the number and location of the cut points and to estimate the hazard on each cut interval. Estimation is performed through a penalized likelihood using an adaptive ridge procedure. A bootstrap procedure is proposed in order to derive valid statistical inference, taking into account both the variability of the estimate and the variability in the choice of the cut points. The new method is applied both to simulated data and to the Mayo Clinic trial on primary biliary cirrhosis. The algorithm implementation is seen to work well and to be of practical relevance.
Funding: Supported by the National Natural Science Foundation of China (72071187, 11671374, 71731010, 71921001) and the Fundamental Research Funds for the Central Universities (WK3470000017, WK2040000027).