Funding: Supported by the National Science Foundation of USA (Grant Nos. DMS1206464 and DMS1613338) and the National Institutes of Health of USA (Grant Nos. R01GM072611, R01GM100474 and R01GM120507).
Abstract: In the statistics and machine learning communities, the last fifteen years have witnessed a surge of high-dimensional models backed by penalized methods and other state-of-the-art variable selection techniques. The high-dimensional models we refer to differ from conventional models in that the total number of parameters p and the number of significant parameters s are both allowed to grow with the sample size T. When field-specific knowledge is preliminary, and in view of the recent and potential abundance of data from genetics, finance, online social networks and other fields, such (s, T, p)-triply diverging models offer great flexibility in modeling and can serve as a data-guided first step of investigation. However, model selection consistency and other theoretical properties have been addressed only for independent data, leaving time series largely uncovered. On a simple linear regression model endowed with a weakly dependent sequence, this paper applies a penalized least squares (PLS) approach. Under regularity conditions, we show sign consistency, derive a finite-sample bound on the estimation error that holds with high probability, and prove that the PLS estimate is consistent in the L_2 norm with rate (s log s/T)^{1/2}.
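As a rough illustration of the kind of estimator the abstract above refers to, the following is a minimal sketch of penalized least squares with an L1 (LASSO) penalty, fitted by cyclic coordinate descent on a linear regression with AR(1) errors. The abstract does not say which penalty or algorithm the paper uses, so the L1 penalty, the AR(1) error process, and the names `soft_threshold`, `lasso_coordinate_descent` and `simulate_ar1_regression` are all assumptions made here for illustration.

```python
import random


def soft_threshold(z, gamma):
    """Soft-thresholding operator: sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/(2T)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is a list of T rows (each a list of p floats); y is a list of T floats.
    The L1 penalty is an assumed choice, not taken from the abstract.
    """
    T, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X beta, since beta starts at zero
    col_sq = [sum(X[t][j] ** 2 for t in range(T)) / T for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Average product of column j with the residual that excludes coordinate j
            rho = sum(X[t][j] * (residual[t] + X[t][j] * beta[j]) for t in range(T)) / T
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for t in range(T):
                    residual[t] -= X[t][j] * delta
                beta[j] = new_bj
    return beta


def simulate_ar1_regression(T, beta_true, phi=0.5, seed=0):
    """Simulate y_t = x_t' beta + e_t with AR(1) errors e_t = phi * e_{t-1} + noise (illustrative)."""
    rng = random.Random(seed)
    p = len(beta_true)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(T)]
    y, e = [], 0.0
    for t in range(T):
        e = phi * e + rng.gauss(0.0, 1.0)
        y.append(sum(X[t][j] * beta_true[j] for j in range(p)) + e)
    return X, y


if __name__ == "__main__":
    X, y = simulate_ar1_regression(200, [2.0, -1.5, 0.0, 0.0, 0.0, 0.0])
    print(lasso_coordinate_descent(X, y, lam=0.2))
```

Sign consistency in the abstract's sense would mean that, with high probability, the signs of the fitted coefficients match those of the true coefficients, with zeros estimated as exactly zero. The soft-thresholding step is what allows exact zeros here.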
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 11501354, 11201499, 11301309 and 71473280, and the 2015 Shanghai Young Faculty Training Program under Grant No. A1A-6119-15-003.
Abstract: The Dantzig selector and the LASSO, two widely used variable selection methods, have been proved asymptotically equivalent in some scenarios. However, this is not the case in general for linear models, as shown by Gai, Zhu and Lin (2013). This paper further shows that the asymptotic equivalence also fails in general for a single-index model with a random design of predictors. To this end, the authors systematically investigate necessary and sufficient conditions for consistent model selection by the Dantzig selector. An adaptive Dantzig selector is also recommended for cases where those conditions are not satisfied. Unlike existing methods for linear models, no distributional assumption on the error term is needed, at the cost of a more stringent condition on the predictor vector. A small-scale simulation is conducted to examine the performance of the Dantzig selector and the adaptive Dantzig selector.
Funding: This research is partially supported by the National Natural Science Foundation of China (10171094), the Ph.D. Program Foundation of the Ministry of Education of China, and the Special Foundations of the Chinese Academy of Sciences and USTC.
Abstract: In this paper, we propose a model selection procedure based on an information-theoretic criterion for log-linear models of contingency tables under multinomial sampling, and establish the strong consistency of the method under some mild conditions. An exponential bound on the miss-detection probability is also obtained. The selection procedure is modified so that it can be used in practice. Simulation shows that the modified method is valid. To avoid selecting the penalty coefficient in the information criterion, an alternative selection procedure is given.
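To make the idea of criterion-based selection among log-linear models concrete, here is a minimal sketch that compares only two candidates for a two-way table under multinomial sampling: the independence model and the saturated model. The abstract does not give the criterion's exact form, so the score (minus twice the log-likelihood plus penalty times the number of free parameters), the default BIC-type penalty log n, and the function names are assumptions, not the authors' procedure.

```python
import math


def multinomial_loglik(counts, fitted):
    """Multinomial log-likelihood kernel sum n_ij * log(m_ij / n); the constant term is dropped."""
    n = sum(sum(row) for row in counts)
    ll = 0.0
    for row_c, row_f in zip(counts, fitted):
        for c, m in zip(row_c, row_f):
            if c > 0:  # cells with zero count contribute nothing
                ll += c * math.log(m / n)
    return ll


def fit_independence(counts):
    """Fitted cell means under independence: (row total * column total) / n."""
    n = sum(sum(row) for row in counts)
    row_tot = [sum(row) for row in counts]
    col_tot = [sum(col) for col in zip(*counts)]
    return [[r * c / n for c in col_tot] for r in row_tot]


def select_loglinear_model(counts, penalty=None):
    """Pick the candidate with the smallest score -2 * loglik + penalty * (free parameters).

    penalty defaults to log(n), a BIC-type choice assumed here for illustration.
    """
    n = sum(sum(row) for row in counts)
    n_rows, n_cols = len(counts), len(counts[0])
    if penalty is None:
        penalty = math.log(n)
    candidates = {
        # name: (fitted cell means, number of free parameters under multinomial sampling)
        "independence": (fit_independence(counts), (n_rows - 1) + (n_cols - 1)),
        "saturated": ([[float(c) for c in row] for row in counts], n_rows * n_cols - 1),
    }
    scores = {
        name: -2.0 * multinomial_loglik(counts, fit) + penalty * k
        for name, (fit, k) in candidates.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    print(select_loglinear_model([[90, 10], [10, 90]]))
    print(select_loglinear_model([[50, 50], [50, 50]]))
```

The `penalty` argument plays the role of the penalty coefficient mentioned in the abstract. Its value has to be chosen by the user in this sketch, which is the practical difficulty the alternative procedure in the paper is said to avoid.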