Funding: This research was supported by the Natural Science Foundation of China (Nos. 11271136, 81530086, 11671303, 11201345), the 111 Project of China (Ministry of Education of the People’s Republic of China; No. B14019), the Natural Science Foundation of Zhejiang Province (No. LY15G010006), and the China Postdoctoral Science Foundation (No. 2015M572598).
Abstract: In this paper, a zero-and-one-inflated Poisson (ZOIP) model is studied. The maximum likelihood estimates and the Bayesian estimates of the model parameters are obtained using a data augmentation method. A simulation study based on the proposed sampling algorithm is conducted to assess the performance of the proposed estimators for various sample sizes. Finally, two real data sets are analysed to illustrate the practicability of the proposed method.
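To make the model concrete, here is a minimal Python sketch of a ZOIP distribution. The abstract does not state the parameterization, so the mixture form used here (extra mass `p0` at zero, extra mass `p1` at one, and weight `1 - p0 - p1` on a Poisson component with mean `lam`) is an assumption, as are all function and parameter names. The sampler first draws the latent component label. Data augmentation schemes for mixtures usually introduce such a label, but whether the paper does exactly this is not confirmed by the abstract.

```python
import math
import random


def zoip_pmf(k, p0, p1, lam):
    """P(Y = k) under the assumed ZOIP mixture parameterization."""
    p2 = 1.0 - p0 - p1  # weight of the ordinary Poisson component
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    extra = p0 if k == 0 else p1 if k == 1 else 0.0
    return extra + p2 * poisson


def zoip_sample(p0, p1, lam, rng):
    """Draw one ZOIP variate by first drawing the latent component label."""
    u = rng.random()
    if u < p0:
        return 0  # inflated zero
    if u < p0 + p1:
        return 1  # inflated one
    # Otherwise draw from Poisson(lam) by inverting its CDF.
    v = rng.random()
    k = 0
    prob = math.exp(-lam)
    cum = prob
    while v > cum and prob > 0.0:
        k += 1
        prob *= lam / k
        cum += prob
    return k


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [zoip_sample(0.2, 0.1, 2.0, rng) for _ in range(10000)]
    print("share of zeros:", draws.count(0) / len(draws))
    print("model P(Y = 0):", zoip_pmf(0, 0.2, 0.1, 2.0))
```

Under this assumed form the mean is `p1 + (1 - p0 - p1) * lam`, which gives a quick sanity check on simulated data.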
Funding: Funded by the Asia–Pacific Forests Net (APFNET/2010/FPF/001) and the National Natural Science Foundation of China (Grant No. 31400552).
Abstract: The occurrence of lightning-induced forest fires during a time period is count data featuring over-dispersion (i.e., the variance is larger than the mean) and a high frequency of zero counts. In this study, we used six generalized linear models to examine the relationship between the occurrence of lightning-induced forest fires and meteorological factors in the Northern Daxing'an Mountains of China. The six models were the Poisson, negative binomial (NB), zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB), Poisson hurdle (PH), and negative binomial hurdle (NBH) models. Goodness of fit was compared and tested among the six models using the Akaike information criterion (AIC), sum of squared errors, likelihood ratio test, and Vuong test. The predictive performance of the models was assessed and compared on independent validation data obtained by data splitting. Based on AIC, the ZINB model fitted the fire occurrence data best, followed (in order of increasing AIC) by the NBH, ZIP, NB, PH, and Poisson models. The ZINB model was also best for predicting both zero counts and positive counts (>1). The two hurdle models (PH and NBH) were better than the ZIP, Poisson, and NB models for predicting positive counts, but worse than these three models for predicting zero counts. Thus, the ZINB model was the first choice for modeling the occurrence of lightning-induced forest fires in this study, which implies that the excess zero counts of lightning-induced fires came from both structural and sampling zeros.
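The distinction between structural and sampling zeros can be illustrated with a short Python sketch. It uses the textbook forms of the zero-inflated Poisson and Poisson hurdle distributions without covariates. The function names and the parameters `pi`, `pi0`, and `lam` are illustrative assumptions, not the study's fitted regression models.

```python
import math


def poisson_pmf(k, lam):
    """Ordinary Poisson probability mass function."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zip_pmf(k, pi, lam):
    """Zero-inflated Poisson: structural zero with prob. pi, else Poisson(lam)."""
    return (pi if k == 0 else 0.0) + (1.0 - pi) * poisson_pmf(k, lam)


def hurdle_pmf(k, pi0, lam):
    """Poisson hurdle: zero with prob. pi0, else a zero-truncated Poisson(lam)."""
    if k == 0:
        return pi0
    return (1.0 - pi0) * poisson_pmf(k, lam) / (1.0 - math.exp(-lam))


def zip_zero_split(pi, lam):
    """Return the (structural, sampling) parts of P(Y = 0) under the ZIP model."""
    return pi, (1.0 - pi) * math.exp(-lam)


if __name__ == "__main__":
    structural, sampling = zip_zero_split(0.3, 1.5)
    print("structural zeros:", structural)
    print("sampling zeros:  ", sampling)
    print("hurdle P(Y = 0): ", hurdle_pmf(0, 0.3, 1.5))
```

In a zero-inflated model a zero can come from either component, whereas a hurdle model attributes every zero to a single process. This is the usual reasoning behind reading a preference for ZINB as evidence of both kinds of zeros.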
Funding: Supported by the "948" Project of the State Forestry Administration of China (No. 2013-4-66).
Abstract: A model of tree mortality across diameter classes is a useful tool for predicting changes in stand structure. Mortality data commonly contain a large fraction of zeros, and general discrete models therefore fit them with larger errors. Based on the traditional Poisson model and the negative binomial model, different forms of zero-inflated and hurdle models were applied to data from spruce-fir mixed forests to simulate the number of dead trees. Judged by the residuals and Vuong test statistics, the zero-inflated negative binomial model performed best. A random effect was added to improve the model accuracy; however, the mixed-effects zero-inflated model did not show increased advantages. According to the model principle, the zero-inflated negative binomial model was the most suitable, indicating that the zero events in this study are mainly sampling zeros, i.e., the zero-mortality observations are largely due to the limitations of the experimental design and sample selection. These results also show that the number of dead trees in a diameter class is positively correlated with the number of trees in that class and the mean stand diameter, and inversely related to class size and to the slope and aspect of the site.
Abstract: Many researchers have discussed zero-inflated univariate distributions. These univariate models are not suitable for modeling events that involve different types of counts or defects. To model several types of defects, the multivariate Poisson model is one appropriate choice. It can be further modified to incorporate inflation at zero, giving a multivariate zero-inflated Poisson distribution. In the present article, we introduce a new Bivariate Zero Inflated Power Series Distribution and discuss inference for the parameters of the model. We also discuss inference for the Bivariate Zero Inflated Poisson Distribution. The model is applied to a real-life data set. An extension to the k-variate zero-inflated power series distribution is also discussed.
Funding: Support from the National Science, Research and Innovation Fund (NSRF) and King Mongkut’s University of Technology North Bangkok (Grant No. KMUTNB-FF-65-22).
Abstract: A new three-parameter discrete distribution called the zero-inflated cosine geometric (ZICG) distribution is proposed for the first time herein. It can be used to analyze over-dispersed count data with excess zeros. The basic statistical properties of the new distribution, such as the moment generating function, mean, and variance, are presented. Furthermore, interval estimates for the parameters of the ZICG distribution are constructed using the Wald, Bayesian, and highest posterior density (HPD) methods. Their efficacies were investigated using both simulation and real-world data comprising the number of daily COVID-19 positive cases at the Olympic Games in Tokyo 2020. The results show that the HPD interval performed better than the other methods in terms of coverage probability and average length in most cases studied.
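As an illustration of the HPD idea, the following Python sketch computes an empirical HPD interval from posterior draws by taking the shortest window that covers the requested share of the sorted sample. This is a common generic approach for unimodal posteriors. It is not taken from the paper, and the function name `hpd_interval` and its arguments are assumptions.

```python
import math


def hpd_interval(samples, cred=0.95):
    """Shortest interval containing at least `cred` of the sorted draws."""
    xs = sorted(samples)
    n = len(xs)
    m = max(1, math.ceil(cred * n))  # number of draws the interval must cover
    # Slide a window of m consecutive draws and keep the narrowest one.
    best = min(range(n - m + 1), key=lambda i: xs[i + m - 1] - xs[i])
    return xs[best], xs[best + m - 1]


if __name__ == "__main__":
    # A right-skewed toy "posterior": draws are denser near zero.
    draws = [(i / 1000) ** 2 for i in range(1001)]
    print(hpd_interval(draws, 0.95))
```

For a skewed posterior the HPD interval is shorter than the equal-tailed interval with the same credibility, which is consistent with the abstract's comparison by average length.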
Abstract: In this paper, we discuss some important aspects of the bivariate alternative zero-inflated logarithmic series distribution (BAZILSD), of which the marginals are the alternative zero-inflated logarithmic series distributions of Kumar and Riyaz (2015. An alternative version of zero-inflated logarithmic series distribution and some of its applications. Journal of Statistical Computation and Simulation, 85(6), 1117-1127). We study some important properties of the distribution by deriving expressions for its probability mass function, factorial moments, and conditional probability generating functions, and recursion formulae for its probabilities, raw moments, and factorial moments. The parameters of the BAZILSD are estimated by the method of maximum likelihood, and certain test procedures are also considered. Further, certain real-life data applications are cited to illustrate the usefulness of the model. A simulation study is conducted to assess the performance of the maximum likelihood estimators of the parameters of the BAZILSD.
Abstract: Background: The recently emerged technology of methylated RNA immunoprecipitation sequencing (MeRIP-seq) sheds light on the study of RNA epigenetics. This new bioinformatics question calls for effective and robust peak-calling algorithms to detect mRNA methylation sites from MeRIP-seq data. Methods: We propose a Bayesian hierarchical model to detect methylation sites from MeRIP-seq data. Our modeling approach includes several important characteristics. First, it models the zero-inflated and over-dispersed counts by deploying a zero-inflated negative binomial model. Second, it incorporates a hidden Markov model (HMM) to account for the spatial dependency of neighboring read enrichment. Third, our Bayesian inference allows the proposed model to borrow strength in parameter estimation, which greatly improves the model stability when dealing with MeRIP-seq data with a small number of replicates. We use Markov chain Monte Carlo (MCMC) algorithms to simultaneously infer the model parameters in a de novo fashion. The R Shiny demo is available at https://qiwei.shinyapps.io/BaySeqPeak and the R/C++ code is available at https://github.com/liqiwei2000/BaySeqPeak. Results: In simulation studies, the proposed method outperformed the competing methods exomePeak and MeTPeak, especially when an excess of zeros was present in the data. In real MeRIP-seq data analysis, the proposed method identified methylation sites that were more consistent with biological knowledge and had better spatial resolution than the other methods. Conclusions: In this study, we develop a Bayesian hierarchical model to identify methylation peaks in MeRIP-seq data. The proposed method has a competitive edge over existing methods in terms of accuracy, robustness, and spatial resolution.
Funding and acknowledgments: The COM-negative binomial distribution proposed in this work was conceptualized as early as December 2014, when the authors saw the online version of [15]. The authors thank Prof. R. KShler for mailing the valuable encyclopedia of discrete univariate distributions [39] to them. This work was partly supported by the National Natural Science Foundation of China (Grant No. 11201165).
Abstract: We focus on the COM-type negative binomial distribution with three parameters, which belongs to the COM-type (a, b, 0) class of distributions and to the family of equilibrium distributions of an arbitrary birth-death process. We show a range of distributional properties, such as overdispersion and underdispersion, log-concavity, log-convexity (infinite divisibility), the pseudo compound Poisson property, stochastic ordering, and asymptotic approximation. Some characterizations are given as theoretical properties, including the sum of equicorrelated geometrically distributed random variables, conditional distribution, the limit distribution of the COM-negative hypergeometric distribution, and Stein's identity. The COM-negative binomial distribution is applied to overdispersed and ultrahigh zero-inflated data sets. With the aid of ratio regression, we employ the maximum likelihood method to estimate the parameters, and the goodness of fit is evaluated by the discrete Kolmogorov-Smirnov test.
Funding: Supported by the Basic Performance Key Project, Ministry of Science and Technology of the People’s Republic of China (No. 2006FY110300).
Abstract: Objective: Sub-health status has progressively gained more attention from both medical professionals and the public. Treating the number of sub-health symptoms as count data rather than dichotomous data helps to completely and accurately analyze findings in the sub-healthy population. This study aims to compare the goodness of fit of count outcome models to identify the optimum model for sub-health studies. Methods: The study sample was derived from a large-scale population survey on physiological and psychological constants conducted from 2007 to 2011 in 4 provinces and 2 autonomous regions in China. We constructed four count outcome models using SAS: the Poisson model, the negative binomial (NB) model, the zero-inflated Poisson (ZIP) model, and the zero-inflated negative binomial (ZINB) model. The number of sub-health symptoms was used as the main outcome measure. The alpha dispersion parameter and the O test were used to identify over-dispersed data, and the Vuong test was used to evaluate the excess of zero counts. The goodness of fit of the regression models was determined by predictive probability curves and likelihood ratio test statistics. Results: Of all 78 307 respondents, 38.53% reported no sub-health symptoms. The mean number of sub-health symptoms was 2.98, and the standard deviation was 3.72. The statistic O in the over-dispersion test was 720.995 (P<0.001); the estimated alpha was 0.618 (95% CI: 0.600-0.636) when comparing the ZINB model and the ZIP model; the Vuong test statistic Z was 45.487. These results indicated over-dispersion of the data and excessive zero counts in this sub-health study. The ZINB model had the largest log likelihood (-167 519), the smallest Akaike information criterion (335 112), and the smallest Bayesian information criterion (335 455), indicating the best goodness of fit. The predictive probabilities for most counts in the ZINB model fitted the observed counts best. The logit section of the ZINB model analysis showed that age, sex, occupation, smoking, alcohol drinking, ethnicity and obesity were determinants for presen
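The reported fit statistics can be cross-checked with a few lines of Python using the standard definitions AIC = 2k - 2 log L and BIC = k ln(n) - 2 log L. The number of estimated parameters k is not reported in the abstract. The value used below is only what the reported AIC and log likelihood imply, and the function names are illustrative.

```python
import math


def aic(loglik, k):
    """Akaike information criterion for k estimated parameters."""
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * loglik


# Values reported in the abstract for the ZINB model.
loglik = -167519
n = 78307
reported_aic = 335112

# Parameter count implied by the reported AIC (an inference, not a reported value).
k_implied = (reported_aic + 2 * loglik) // 2

# Over-dispersion: the variance (SD squared) is far above the mean.
mean, sd = 2.98, 3.72
dispersion_ratio = sd ** 2 / mean

if __name__ == "__main__":
    print("implied k:", k_implied)                      # 37
    print("BIC:", round(bic(loglik, k_implied, n)))     # 335455
    print("variance / mean:", round(dispersion_ratio, 2))
```

With the implied k = 37, the computed BIC rounds to the reported 335 455, so the three reported statistics are mutually consistent. The variance-to-mean ratio of roughly 4.6 agrees with the abstract's conclusion of over-dispersion.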
Abstract: Objectives: This study empirically assesses the impact of changes in women’s characteristics, empowerment, and the availability and quality of health services on women’s decisions to use antenatal care (ANC) and on the frequency of that use during the period 2000-2008. Study Design: The study is a cross-sectional analytical study using the 2000 and 2008 Egypt Demographic and Health Surveys. Methods: The impact is assessed using zero-inflated negative binomial regression. In addition, factor analysis is used to construct some of the explanatory variables, such as the indicators of women’s empowerment and of the availability and quality of health services. Results: Utilization of antenatal health care services improved greatly from 2000 to 2008. Availability of health services was one of the main determinants of the number of antenatal care visits in 2008. The wealth index and the quality of health services played an important role in raising the level of antenatal care utilization in 2000 and 2008. However, the impact of terminated pregnancy on receiving ANC increased over time. Conclusions: Further research on the determinants of antenatal health care utilization is needed, using more up-to-date measures of women’s empowerment and of the availability and quality of health services. To improve the provision of antenatal health care services, it is important to understand the barriers to their utilization. Therefore, it is advisable to collect information from women about their reasons for not receiving antenatal care.