In this article, we study variable selection for the partially linear single-index model (PLSIM). Building on minimum average variance estimation, variable selection for the PLSIM is carried out by minimizing the average variance subject to an adaptive l1 (aLASSO) penalty. An implementation algorithm is given. Under some regularity conditions, we establish the oracle properties of the aLASSO procedure for the PLSIM. Simulations are used to assess the effectiveness of the proposed method for variable selection in the PLSIM.
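As a rough illustration of what "minimizing the average variance with an adaptive l1 penalty" amounts to, a penalized MAVE-type criterion for the PLSIM y = g(β'x) + θ'z + ε is sketched below. The local-linear form, the kernel weights w_ij, and the adaptive weights built from an initial consistent estimate β̃ are standard choices stated here as assumptions, not the paper's exact objective.

```latex
% Sketch of a penalized MAVE-type criterion for the PLSIM
%   y = g(\beta^{\top}x) + \theta^{\top}z + \varepsilon
% (assumed form, not the paper's exact objective):
\[
  \min_{\beta,\theta,\{a_j\},\{b_j\}}\;
  \frac{1}{n^{2}} \sum_{j=1}^{n}\sum_{i=1}^{n}
  \Bigl[\, y_i - \theta^{\top} z_i - a_j
        - b_j\,\beta^{\top}(x_i - x_j) \Bigr]^{2} w_{ij}
  \;+\; \lambda \sum_{k=1}^{d} \hat{w}_k \lvert \beta_k \rvert ,
  \qquad \hat{w}_k = 1 / \lvert \tilde{\beta}_k \rvert ,
\]
% where w_{ij} are kernel weights centred at \beta^{\top}x_j and
% \tilde{\beta} is an initial consistent estimate.
```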
Nowadays, a common problem when processing data sets with a large number of covariates relative to a small sample size ("fat" data sets) is estimating the parameters associated with each covariate. When the number of covariates far exceeds the number of samples, parameter estimation becomes very difficult. Researchers in many fields, such as text categorization, face the burden of finding and estimating important covariates without overfitting the model. In this study, we developed a Sparse Probit Bayesian Model (SPBM) based on Gibbs sampling, which uses a double-exponential prior to induce shrinkage and reduce the number of covariates in the model. The method was evaluated on ten domains, such as mathematics, whose corpora were downloaded from Wikipedia. From the downloaded corpora, we created the TF-IDF matrix for all domains and randomly divided the whole data set into training and test groups of size 300. To make the model more robust, we performed 50 re-samplings of the training and test groups. The model was implemented in R, and the Gibbs sampler was run for 60,000 iterations, with the first 20,000 discarded as burn-in. We performed classification on the training and test groups by computing P(yi = 1) and, following [1] [2], used a threshold of 0.5 as the decision rule. The model's performance was compared to that of Support Vector Machines (SVM) using average sensitivity and specificity across the 50 runs. The SPBM achieved high classification accuracy and outperformed SVM in almost all domains analyzed.
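The authors' implementation was in R; below is a minimal Python sketch of the general technique the abstract describes: probit regression fitted by Gibbs sampling with a double-exponential (Laplace) shrinkage prior, expressed through its scale-mixture-of-normals representation (Albert-Chib data augmentation plus Bayesian-LASSO-style variance updates). The function names, the hyperparameter lam, and the default iteration counts are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of a sparse Bayesian probit Gibbs sampler with a
# double-exponential (Laplace) prior on the coefficients.  Assumed
# hyperparameters and shortened chain lengths; the abstract's run used
# 60,000 iterations with a 20,000-iteration burn-in.
import numpy as np
from scipy.stats import norm, truncnorm

def sparse_probit_gibbs(X, y, n_iter=6000, burn_in=2000, lam=1.0, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = np.zeros(p)
    tau2 = np.ones(p)            # local prior variances in the Laplace mixture
    draws = []
    for it in range(n_iter):
        # 1) Latent utilities z_i ~ N(x_i' beta, 1), truncated by the label.
        mu = X @ beta
        lower = np.where(y == 1, -mu, -np.inf)   # standardized bounds
        upper = np.where(y == 1, np.inf, -mu)
        z = mu + truncnorm.rvs(lower, upper, size=n, random_state=rng)
        # 2) beta | z, tau2 ~ N(A^{-1} X'z, A^{-1}),  A = X'X + diag(1/tau2).
        A = X.T @ X + np.diag(1.0 / tau2)
        L = np.linalg.cholesky(A)
        mean = np.linalg.solve(A, X.T @ z)
        beta = mean + np.linalg.solve(L.T, rng.standard_normal(p))
        # 3) 1/tau_j^2 | beta_j ~ Inverse-Gaussian(lam / |beta_j|, lam^2).
        inv_tau2 = rng.wald(lam / np.maximum(np.abs(beta), 1e-10), lam ** 2)
        tau2 = 1.0 / inv_tau2
        if it >= burn_in:
            draws.append(beta.copy())
    return np.mean(draws, axis=0)

def predict(X, beta_hat):
    # Classify with the 0.5 threshold on P(y_i = 1), as in the abstract.
    return (norm.cdf(X @ beta_hat) >= 0.5).astype(int)
```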
Inspired by the SICA (smooth integration of counting and absolute deviation) method, this paper proposes a family of nonconvex penalty functions based on the arctangent function, called Arctan LASSO (arctangent least absolute shrinkage and selection operator). The penalty performs parameter estimation and variable selection simultaneously and provides an effective smooth transition from the L_0 to the L_1 penalty. Asymptotic results show that the Arctan LASSO estimator possesses n^(1/2)-consistency and the oracle property. Combining LLA (local linear approximation) with coordinate descent, we give an efficient iterative algorithm, and the regularization parameter is selected by the BIC (Bayesian information criterion). Simulation studies show that Arctan LASSO performs well in terms of estimation accuracy and variable selection, with estimation performance similar to SICA and usually better than LASSO, SCAD (smoothly clipped absolute deviation), MCP (minimax concave penalty), and the adaptive LASSO. The method can be used for variable selection in real data and is of significant practical value.
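The abstract does not state the exact form of the Arctan LASSO penalty, so the sketch below uses an assumed arctangent-type penalty, p(t; lam, a) = lam * (2/pi) * arctan(t/a) for t >= 0, which behaves like an L_0-type penalty as a -> 0 and like a (rescaled) L_1 penalty for large a. The code illustrates the LLA-plus-coordinate-descent idea mentioned in the abstract, not the authors' algorithm; all names and defaults are illustrative assumptions.

```python
# Illustrative LLA + coordinate-descent solver for a nonconvex,
# arctangent-type penalty in a linear model (assumed penalty form).
import numpy as np

def arctan_penalty_deriv(t, lam, a):
    """Derivative of the assumed penalty lam*(2/pi)*arctan(t/a) at |beta_j| = t."""
    return lam * 2.0 * a / (np.pi * (a ** 2 + t ** 2))

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def weighted_lasso_cd(X, y, w, beta, n_sweeps=200):
    """Coordinate descent for (1/2n)||y - X beta||^2 + sum_j w_j |beta_j|."""
    n, p = X.shape
    col_ss = (X ** 2).sum(axis=0) / n
    r = y - X @ beta                      # full residual
    for _ in range(n_sweeps):
        for j in range(p):
            r += X[:, j] * beta[j]        # partial residual without feature j
            rho = X[:, j] @ r / n
            beta[j] = soft_threshold(rho, w[j]) / col_ss[j]
            r -= X[:, j] * beta[j]
    return beta

def arctan_lasso_lla(X, y, lam=0.1, a=0.5, n_lla=5):
    """Outer LLA loop: each step solves a weighted (adaptive) LASSO problem."""
    n, p = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0] if n > p else np.zeros(p)
    for _ in range(n_lla):
        w = arctan_penalty_deriv(np.abs(beta), lam, a)
        beta = weighted_lasso_cd(X, y, w, beta.copy())
    return beta
```

In practice the regularization parameter lam (and possibly a) would be chosen over a grid by minimizing the BIC, as the abstract describes.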
This paper proposes a composite minimizing average check loss estimation (CMACLE) method to implement composite quantile regression (CQR) for partially linear single-index models (PLSIM). First, consistent estimators of the parametric part in the CQR sense are constructed using a high-dimensional kernel function. Based on these consistent estimators, estimators of the parameters and the nonparametric function attaining the optimal convergence rate are then obtained by using a single-index kernel function, the asymptotic normality of the resulting estimators is established, and the relative asymptotic efficiency of the CQR estimator of the PLSIM is compared with that of the minimum average variance estimator (MAVE). Furthermore, a variable selection method for the PLSIM under the CQR framework is proposed, and its oracle property is proved. Monte Carlo simulations and a real-data analysis verify the finite-sample performance of the proposed methods and confirm their good behaviour.
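For orientation, the check loss and a composite (average) check-loss criterion of the CQR type are written out below for the PLSIM y = g(β'x) + θ'z + ε. This is a sketch of the general construction (a Zou-Yuan-style composite quantile loss combined with a local-linear approximation of g and kernel weights K_h), stated as an assumption rather than the paper's exact CMACLE criterion.

```latex
% Check loss at level tau and a composite check-loss criterion for the PLSIM
% (assumed form, not the paper's exact CMACLE objective):
\[
  \rho_{\tau}(u) = u\bigl(\tau - \mathbf{1}\{u < 0\}\bigr),
  \qquad \tau_k = \tfrac{k}{K+1},\quad k = 1,\dots,K,
\]
\[
  \min_{\beta,\theta,\{a_{jk}\},\{b_j\}}\;
  \frac{1}{K}\sum_{k=1}^{K} \frac{1}{n^{2}} \sum_{j=1}^{n}\sum_{i=1}^{n}
  \rho_{\tau_k}\!\Bigl( y_i - \theta^{\top} z_i - a_{jk}
    - b_j\,\beta^{\top}(x_i - x_j) \Bigr)\,
  K_h\!\bigl(\beta^{\top}(x_i - x_j)\bigr).
\]
```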