Abstract
Semi-continuous data occur frequently in surveys in economics and the social sciences. In analyzing such data, the classic two-part regression model is widely used to assess the effects of observed covariates on the variability of the responses. However, including covariates does not fully explain this variability, and neglecting unobserved heterogeneity in the data leads to drastic volatility in the variances. In this paper, we extend the two-part regression model to a two-part factor analysis model. The unobserved heterogeneity of multivariate semi-continuous data is explained by latent factors. Moreover, the dependence among the multiple items is characterized by common factors that the items share through linear combinations. Within the Bayesian paradigm, Markov chain Monte Carlo (MCMC) sampling is used to conduct the posterior analysis, with a Gibbs sampler drawing observations from the posterior distribution. Statistical inference, including estimation of unknown parameters and model assessment, is carried out based on these simulated observations. Empirical results from a simulation study and an analysis of cocaine use data demonstrate the effectiveness and practical merits of the proposed methodology.
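To make the idea of semi-continuous data with a shared latent factor concrete, the following is a minimal simulation sketch. It is not the specification used in the paper, which the abstract does not give. The logistic link for the zero/non-zero part, the log-normal positive part, the single factor `w`, and all parameter values are assumptions chosen only for illustration.

```python
import numpy as np


def simulate_two_part_factor(n=500, p=4, seed=0):
    """Simulate semi-continuous items driven by one shared latent factor.

    Assumed (hypothetical) specification, not taken from the paper:
      part 1: P(y_ij > 0) = logistic(alpha_j + lam_bin_j * w_i)
      part 2: log(y_ij) | y_ij > 0 ~ Normal(mu_j + lam_pos_j * w_i, sigma^2)
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(n)          # latent factor, one value per subject

    alpha = np.full(p, 0.3)             # intercepts of the binary part
    lam_bin = np.full(p, 0.8)           # factor loadings of the binary part
    mu = np.full(p, 1.0)                # intercepts of the positive part
    lam_pos = np.full(p, 0.6)           # factor loadings of the positive part
    sigma = 0.5                         # residual s.d. on the log scale

    # Part 1: decide which responses are non-zero.
    eta = alpha + np.outer(w, lam_bin)              # shape (n, p)
    prob_positive = 1.0 / (1.0 + np.exp(-eta))
    is_positive = rng.random((n, p)) < prob_positive

    # Part 2: draw the positive magnitudes on the log scale.
    log_y = mu + np.outer(w, lam_pos) + sigma * rng.standard_normal((n, p))

    # Combine: exact zeros where part 1 says zero, positive values otherwise.
    y = np.where(is_positive, np.exp(log_y), 0.0)
    return y, w


y, w = simulate_two_part_factor()
zero_share = (y == 0).mean(axis=0)       # proportion of exact zeros per item
item_corr = np.corrcoef(y, rowvar=False) # dependence induced by the shared factor
```

Because every item loads on the same `w`, the columns of `y` are dependent even though no covariate links them, which is the kind of unobserved heterogeneity the abstract refers to. A Gibbs sampler for such a model would alternate between drawing the latent factors and the parameters; that step is not sketched here.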
Authors
XIA Yemao (夏业茂)
LING Yaobin (凌耀斌)
XIONG Shuangcan (熊双粲)
XIA Yemao; LING Yaobin; XIONG Shuangcan (Department of Applied Mathematics, Nanjing Forestry University, Nanjing 210037, China; School of Economics and Management, Nanjing Forestry University, Nanjing 210037, China)
Source
Mathematica Applicata (《应用数学》)
CSCD
PKU Core Journal (北大核心)
2018, No. 4, pp. 761-778 (18 pages)
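Funding
National Natural Science Foundation of China (11471161)
Nanjing Forestry University High-Degree Talent Program (163101004)
Jiangsu Provincial Universities Fund (15KJB110010)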