TEXT SIZE

CrossRef (0)
A Bayesian joint model for continuous and zero-inflated count data in developmental toxicity studies

Beom Seuk Hwang1,a

aDepartment of Applied Statistics, Chung-Ang University, Korea
Correspondence to: 1 Department of Applied Statistics, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul 06974, Korea. E-mail: bshwang@cau.ac.kr
Received September 17, 2021; Revised October 28, 2021; Accepted November 7, 2021.
Abstract
In many applications, we frequently encounter correlated multiple outcomes measured on the same subject. Joint modeling of such multiple outcomes can improve efficiency of inference compared to independent modeling. For instance, in developmental toxicity studies, fetal weight and number of malformed pups are measured on the pregnant dams exposed to different levels of a toxic substance, in which the association between such outcomes should be taken into account in the model. The number of malformations may possibly have many zeros, which should be analyzed via zero-inflated count models. Motivated by applications in developmental toxicity studies, we propose a Bayesian joint modeling framework for continuous and count outcomes with excess zeros. In our model, zero-inflated Poisson (ZIP) regression model would be used to describe count data, and a subjectspecific random effects would account for the correlation across the two outcomes. We implement a Bayesian approach using MCMC procedure with data augmentation method and adaptive rejection sampling. We apply our proposed model to dose-response analysis in a developmental toxicity study to estimate the benchmark dose in a risk assessment.
Keywords : Bayesian inference, benchmark dose, data augmentation method, Markov chain Monte Carlo (MCMC), zero-inflated Poisson regression model
1. Introduction

In many bioscience studies, there is interest in simultaneously investigating two or more measurements within a single subject. In particular, it is often considered that two or more associated outcomes are affected at the same time depending on the level of certain treatments. For instance, developmental toxicity studies examine how a variety of variables are affected by pregnant mice exposed to several different levels of substances, such as rat weight, liver weight, gravid uterine weight, number of dead fetuses, presence of malformation, and fetal weight. Such variables should be jointly analyzed to take into account association between them because separate modeling can yield biased inferences (Catalano and Ryan, 1992; Regan and Catalano, 1999; Dunson, 2000; McCulloch, 2008; Hwang and Pennell, 2014; Hwang and Pennell, 2018). In the literature, several models have used latent variables to jointly analyze different types of outcomes such as continuous, binary, ordinal, unordered categorical or count variables. Catalano and Ryan (1992) proposed a joint model for binary and continuous outcomes assuming that a latent continuous variable underlying the binary outcome and continuous outcome have a bivariate normal random effects. Dunson (2000) proposed a general framework for modeling clustered mixed outcomes using a mixture of generalized linear models to describe the joint distribution of a set of underlying variables, and Dunson and Herring (2005) provided a general underlying Poisson variable model for mixed discrete outcomes. They followed a Bayesian approach to inference of the parameters and latent variables via a Markov chain Monte Carlo (MCMC) sampling algorithm. Nonparametric Bayesian methods also have been proposed for joint modeling of dicrete and continuous outcomes to describe irregular changes in distributions of the response variables (Hwang and Pennell, 2014; Fronczyk and Kottas, 2017; Hwang and Pennell, 2018).

In modeling count data in any research fields, it is common to encounter a large number of zero values. For example, developmental toxicology studies may be interested in how many birth defects the pregnant dams exposed to a toxic substance have. Then, many dams may not have any defects if the toxic level is not that high. Commonly used regression models for count data using Poisson or negative binomial distributions may not fit these count outcomes well, resulting in underestimating the zero-count probability. Such count data with excess zeros should be analyzed using discrete mixture models such as zero-inflated Poisson (ZIP) model and zero-inflated negative binomial (ZINB) model. Lambert (1992) proposed the zero-inflated Poisson (ZIP) regression model to describe discrete data with many zeros. The ZIP model then has a Poisson-Bernoulli mixture structure, which assumes two different sources for zero values: structural zeros by those subjects whose count response will always be zero, and sampling zeros that would occur by chance generated by a Poisson distribution. In addition to ZIP models, excess zeros in count data can also be accommodated by zero-inflated negative binomial models (Hall, 2000) or the hurdle models (Mullahy, 1986). Many researchers have utilized a variety of models to analyze zero-inflated count data. Hall (2000), Yau and Lee (2001), and Min and Agresti (2005) have proposed various zero-inflated models with random effects. Liu and Powers (2007) applied a growth curve model to longitudinal count data with excess zeros, and Lee et al. (2011) proposed a marginalized model approach to analyze clustered zero-inflated count data. Further, Kassahun et al. (2015) presented a joint modeling framework for hierarchical continuous and zero-inflated overdispersed count data.

There have recently been many literatures on Bayesian alternatives to fitting zero-inflated models. Rodrigues (2003) presented a Bayesian ZIP model based on the augmented data, and Ghosh et al. (2006) have proposed a Bayesian approach of zero-inflated power series (ZIPS) regression models including ZIP model for cross-sectional data using MCMC with data augmentation method. A hierarchical Bayesian model of correlated zero-inflated count data was presented by Dagne (2004), and several Bayesian models for longitudinal zero-inflated count data were also proposed by Neelon et al. (2010). Neelon et al. (2010) described Bayesian approaches to fitting repeated measures zero-inflated count data using three models: Poisson hurdle model, ZIP model, and zero-altered Poisson (ZAP) model. More recently, Neelon (2019) proposed a Bayesian ZINB regression model based on Pólya-Gamma mixtures to apply the approach to a spatiotemporal data analysis.

Motivated by developmental toxicity studies, we present a Bayesian joint modeling framework for continuous and zero-inflated count data. We utilize the zero-inflated Poisson (ZIP) regression model proposed by Lambert (1992) for count data with excess zeros. To account for dependency structure between continuous and count outcomes, the bivariate normal distribution will be assigned to subject-specific random effect shared across outcomes, which is expected to improve efficiency. We use the MCMC procedure with data augmentation and adaptive rejection sampling methods proposed by Ghosh et al. (2006) for parameter estimations. The proposed joint model will be applied to developmental toxicity data to estimate a benchmark dose (BMD) in a risk assessment.

The rest of the paper is organized as follows. In Section 2, we present a Bayesian joint model for continuous and zero-inflated count data, and describe the Bayesian inference procedures as well as model comparison criterion using deviance information criterion (DIC). In Section 3, we apply the proposed model to data from a developmental toxicity study of diethylhexyl phthalate (DEHP) in timed-pregnant CD-1 mice conducted by the National Toxicology Program. We conclude with a discussion of our results and future work in Section 4.

2. Methods

### 2.1. Zero-inflated Poisson regression model

Let Y = (Y1, …, Yn) be a vector of responses which is count data with an excess number of zeros. Lambert (1992) proposed the zero-inflated Poisson (ZIP) regression model, in which the count data was treated as a mixture of inflated zeros and Poisson counts. Let Z be an unobserved latent variable underlying the zero-inflated indicator outcome: for i = 1, 2, …, n,

$Zi={1,inflated zeros with probability pi,0,not inflated values with probability 1-pi.$

Thus, the mixture of inflated zeros and Poisson counts can be written as

$Yi∣Zi=1≡0,Yi∣Zi=0~Poisson(λi).$

We can typically incorporate covariates into the model using the log link function for the expected mean count λi and the logit link function for pi, which provide

$log(λi)=x1iTβ1,logit(pi)=x2iTβ2,$

where x1i denotes a vector of explanatory variables and β1 the vector of regression coefficients for the Poisson count stage, and x2i denotes a vector of explanatory variables and β2 the vector of regression coefficients for the Bernoulli zero-inflation stage. It follows that

$λi=ex1iTβ1, and pi=ex2iTβ21+ex2iTβ2.$

The resulting distribution gives

$P(Yi=yi∣λi,pi)={pi+(1-pi) e-λi,if yi=0,(1-pi)e-λiλiyiyi!if yi>0.$

We note that zero counts arise from both the Poisson count stage with parameter λ and the zero-inflation stage with parameter p. It is easy to show that

$E(Yi) =(1-pi) λi,Var(Yi) =λi(1-pi) (1+piλi),$

which implies that the ZIP regression model accommodates for overdispersion.

### 2.2. Joint model for continuous and zero-inflated count data

Let xi be a predictor variable and Y = (Yi1, Yi2)T be a 2 × 1 response vector measured on subject i (i = 1, …, n), where Yi1 denotes the continuous response and Yi2 denotes the count data with an excess number of zeros. For instance, in a developmental toxicity study, xi may be the dose level of a toxic substance assigned to ith litter, Yi1 may be the average fetal body weight of ith litter, and Yi2 may be the number of malformed pups that ith litter has. We know that the body weight and the number of malformations of the litters may be correlated and thus joint modeling can help to achieve better analysis results.

We jointly model the continuous response and the count response as follows, for i = 1, 2, …, n,

$Yi1 =β01+β11xi+Ui1+ɛi,P(Yi2=yi2∣λi,pi) =[pi+(1-pi)e-λi ]zi [(1-pi)e-λiλiyi2yi2!]1-zi,$

where

$λi=exp (β02+β12xi+Ui2),pi=exp (β03+β13xi)1+exp (β03+β13xi).$

Note that λi and pi are the parameters for Yi2, and β0 j and β1 j are the fixed intercept and slope of xi for j = 1, 2, 3. We accounted for excess zeros in Yi2 using the logit model with the zero-inflation probability pi. Further, we assume that $ɛi~N(0,σɛ2)$, and Ui1 and Ui2 represent a subject-specific random effect assuming bivariate normal distribution with mean and variance-covariance matrix, independent of εi, Ui = (Ui1, Ui2)T ~ N(0, ∑), where $Σ=[σ12ρσ1σ2ρσ1σ2σ22]$. The subject-specific random effect (Ui1, Ui2)T models the correlation across the two outcomes, resulting in more flexible modeling and more efficient estimates (McCulloch, 2008). The complete data likelihood function of this model can be written as

$L (β,σɛ2,Σ∣Y,X,Z,U) =∏i=1n12πσɛexp [-12σɛ2{yi1-(β01+β11xi+ui1)}2] × [pi+(1-pi)e-λi ]zi[(1-pi)e-λiλiyi2yi2!]1-zi × 12π∣Σ∣exp [-12uiTΣ-1ui]$

where λi and pi were defined in (2.1).

### 2.3. Bayesian inference for joint model

2.3.1. Prior specification

We specify the conjugate priors for the regression coefficients β’s and the variances, $σɛ2$ and ∑ as follows,

$β1=(β01,β11)T ~N(β1m,τ12I2),β2=(β02,β12)T ~N(β1m,τ12I2),β3=(β03,β13)T ~N(β3m,τ32I2),σɛ2 ~IG(a,b),Σ ~Inverse-Wishart(Ψ,ν),$

where $N(βjm,τj2I2)$ is the 2-dimensional multivariate normal distribution with mean vector $βjm$ and variance matrix $τj2I2$, for j = 1, 2, 3, IG(a, b) is the inverse gamma distribution with shape parameter a and scale parameter b, and Inverse-Wishart(Ψ, ν) is the inverse-Wishart distribution with scale matrix Ψ and degrees of freedom ν.

2.3.2. Posterior computation

We use the Markov chain Monte Carlo (MCMC) procedure with data augmentation method (Ghosh et al., 2006) to generate samples from the posterior distribution of parameters of interest. We incorporate the latent random variables V and B, with the representation Y = V(1 − B), where B is a Bernoulli(p) random variable and V independently to B is a Poisson(λ) random variable. To overcome the overdispersion issue in dataset and utilize more efficient sampling method, Gibbs sampler, we obtain samples from the posterior of (λ, p, V, B) given the data Y. The data augmentation step contains the conditional distribution of (V, B) given (Y, λ, p) as follows,

$P(V=v,B=0∣Y=y)={(1-p)P(V=0)p+(1-p)P(V=0)if v=y=0;1,if v=y>0;0,otherwise,P(V=v,B=1∣Y=y)={pP(V=v)p+(1-p)P(V=0),if y=0;1,otherwise.$

We obtain the full posterior distribution of (β, $σɛ2$, ∑) by multiplying the complete data likelihood function in (2.3) and the priors in (2.4) based on data augmentation technique as follows,

$p (β,σɛ2,Σ∣Y,X,U,V,B) ∝∏i=1n{12πσɛexp [-12σɛ2{yi1-(β01+β11xi+ui1)}2] × [exp (β03+β13xi)1+exp (β03+β13xi)]Bi[11+exp (β03+β13xi)]1-Bi × exp (-exp(β02+β12xi+ui2)) (exp(β02+β12xi+ui2)Vi × 12π∣Σ∣exp [-12uiTΣ-1ui]} × N (β1;β1m,τ12I2)N (β2;β2m,τ22I2)N (β3;β3m,τ32I2) × IG (σɛ2;a,b)IW (Σ;Ψ,ν).$

The full conditional posterior densities of β, $σɛ2$, ∑ can be computed from (2.6) as follows,

$p(β∣σɛ2,Σ,Y,X,U,V,B) ∝∏i=1n{12πσɛexp [-12σɛ2{yi1-(β01+β11xi+ui1)}2] × [exp (β03+β13xi)1+exp (β03+β13xi)]Bi[11+exp (β03+β13xi)]1-Bi × exp (-exp(β02+β12xi+ui2)) (exp(β02+β12xi+ui2)Vi × N (β1;β1m,τ12I2)N (β2;β2m,τ22I2)N (β3;β3m,τ32I2),$$p(σɛ2∣β,Σ,Y,X,U,V,B)~IG (a+n2,b+12∑i=1n{yi1-(β01+β11xi+ui1)}2),$$p(Σ∣β,σɛ2,Y,X,U,V,B)~IW (n+ν,Ψ+∑i=1nuiuiT).$

We note that the full conditional posterior densities for β are not standard distributions, so the adaptive rejection sampling (ARS) can be used to sample β from their full conditional distributions, which are algebraically very complicated yet log-concave (Gilks and Wild, 1992). The following MCMC algorithm can be implemented to generate samples from the conditional distributions.

• Step 1: Set the dispersed initial values of β, $σɛ2$ and ∑.

• Step 2: Data augmentation step: Sample (Vi, Bi) based on the sampled value of (λi, pi) and the data Yi2 for i = 1, 2, …, n.

• Step 3: Sample β at time t + 1, given $(Vi(t),Bi(t)),σɛ2(t)$ and ∑(t) at time t from (2.7) using ARS.

• Step 4: Sample $σɛ2$ at time t + 1, given ($Vi(t),Bi(t)$), ∑(t) and β(t+1) at time t from (2.8) using Gibbs sampler.

• Step 5: Sample ∑ at time t + 1, given ($Vi(t),Bi(t)$), β(t+1) and $σɛ2(t+1)$ at time t from (2.9) using Gibbs sampler.

• Step 6: Return to the Step 2 and repeat until convergence.

We implement MCMC algorithm using several dispersed initial values and then use the Bayesian diagnostics such as trace plots and Gelman and Rubin’s potential scale reduction factor (Gelman et al., 2014) to determine whether the algorithm was convergent or not.

### 2.4. Bench mark dose lower limit (BMD/BMDL) estimation

We apply the proposed model into dose-response analysis in toxicology studies to estimate the benchmark dose (BMD), defined as the dose associated with a predefined small change in the response, called the benchmark response (BMR), from the background response level (Crump, 1995). Different definitions of the BMD exist depending on different types of response data, binary, continuous and count response data, and different types of changes in the response (Jensen et al., 2019; Jensen et al., 2020). Both the US Environmental Protection Agency (EPA) and the European Food Safety Authority (EFSA) recommend the definition of the BMD using relative response for count or continuous response data as follows,

$BMR=f(BMD,β)-f(0,β)f(0,β),$

where f (.) denotes the dose response model, and β is the vector of parameters in the model (US Environmental Protection Agency, 2012; Hardy et al., 2017). The US EPA and the EFSA also recommend choosing the BMR value of 5%. In practice, the BMD lower limit (BMDL), defined by the lower limit of a one-sided confidence interval of the BMD estimate, is used as reference values or limit values in regulatory applications.

### 2.5. Model comparison

We also fit two other models to compare the performance of our proposed model. An independent ZIP model accounts for continuous outcome and zero-inflated count outcome independently assuming ρ = 0 in the joint model in 2.1. A joint Poisson model jointly analyzes two outcomes, but does not account for excess zeros assuming β03 = β13 = 0 in the joint model in 2.1. For model comparison, the deviance information criterion (DIC) would be used (Spiegelhalter et al., 2002). The DIC is defined by $DIC=D(θ)¯+pD$, where D(θ) = −2 log f (y|θ) is the deviance, $D(θ)¯$ is the posterior mean of deviance, and $pD=D(θ)¯-D(θ^)$ is the effective number of parameters in a model. Like the Akaike information criterion (AIC), the DIC accounts for both a measure of fit (deviance) and a measure of complexity (pD) from a Bayesian perspective, and a model with a smaller DIC fits the data better. Celeux et al. (2006) proposed a modified DIC for models with mixtures of distributions and random effects as follows,

$DIC4=-4Eθ,Z[log f(y,Z∣θ)∣y]+2EZ[log f(y,Z∣Eθ[θ∣y,Z])∣y],$

where y is the observed data, Z are the random effects and latent variables, and f (y, Z|θ) is the complete data likelihood function. We use the modified DIC for model comparison in our application.

3. Developmental toxicity data example

### 3.1. Data description

We describe the proposed approach using data from a developmental toxicity study of diethylhexyl phthalate (DEHP) in timed-pregnant CD-1 mice conducted by the National Toxicology Program (NTP study: TER84064). DEHP is a substance mainly used as an additive in plastics to make them more flexible, and its widespread use in everyday and medical products has raised some concerns about its safety. DEHP breaks down slowly in water, but it can remain for a long time in soil and sediment, so it accumulates in organisms living in soil and sediment. In this study, pregnant CD-1 mice were exposed to DEHP in their feed at five different dose levels(0%, 0.025%, 0.05%, 0.10%, and 0.15% in feed). All live fetuses for each litter were examined for body weights and the occurrence of any types of malformations. Table 1 summarizes the average fetal body weight and the number of litters with no malformed pups per dose group. In the table, the average fetal body weight shows a trend of decreasing, and the number of litters with malformation increases (the count of litters with no malformed fetuses decreases) as the dose level increases. It is obvious that two litter-level outcomes are related each other so that they should be jointly modeled to consider the correlation between two outcomes within each litter. Moreover, the malformed pups shows a large incidence of zero counts, which requires zero-inflated count model, zero-inflated Poisson model.

### 3.2. Bayesian joint model

We use the proposed joint model to analyze the average fetal weight and the number of malformation within a litter. Let xi be the assigned dose level of ith litter, which has five different dose values(0%, 0.025%, 0.05%, 0.10%, and 0.15% in feed), Yi1 be the average fetal weight (g) of ith litter, and Yi2 be the number of malformed fetuses within ith litter. To implement Bayesian approach, we set the noninformative priors for β, $σɛ2$ and ∑: βj ~ N(0, I2), j = 1, 2, 3, $σɛ2~IG(1,1)$, and ∑ ~ Inverse-Wishart(Ψ = I2, ν = 4). Results were obtained by Markov chain Monte Carlo algorithm with data augmentation (Ghosh et al., 2006) described in Section 2.3.2. A total of 10,000 MCMC samples were generated from the conditional posterior distributions of parameters, and the posterior summary was based on 5,000 samples after a burn-in of 5,000 iterations. Convergence of MCMC algorithm was confirmed via traceplots and Gelman and Rubin’s potential scale reduction factor (Gelman et al., 2014). The MCMC algorithm was implemented using R 3.6.3 (R Core Team, 2020), and particularly the adaptive rejection sampling (ARS) method for sampling of β’s was conducted via arms function of armspp package in R.

### 3.3. Results

Posterior summaries for parameters in our proposed model (joint ZIP model) as well as the independent ZIP model and the joint Poisson model are provided in Table 2. According to the model comparison criteria DICs, we confirmed that the joint ZIP model had the smallest DIC (−1621.4) and thus fit the DEHP data comparatively better than the other two models. Table 2 also provides parameter estimates of the three models (posterior means and 95% credible intervals). The joint ZIP model shows significant effects of dose level on fetal weight and the number of malformations, respectively: β̂11 = −0.941 indicates that fetal weight decreases as the DEHP level increases, and β̂12 = 4.872 indicates that the number of malformed pups increases as the DEHP level increases.

We can examine zero-inflation from the Bernoulli model component in the joint model in 2.1. The significant positive coefficient of the intercept (β̂03 = 1.415) indicated strong evidence of zero-inflation although the coefficient of the slope was not significant (β̂13 = −0.244, CI=(−2.203, 1.718)). Considering zero-inflation aspect in the model greatly reduced DIC value from the joint Poisson model, meaning that the model fit was greatly improved. Parameter estimates for the random effects can explain the performance of jointly modeling the data. Obviously, fetal weight and number of malformed pups showed a negative association as evidenced by significant negative correlation of the random effects ( ρ̂ = −0.127). The joint ZIP model had better model fit (smaller DIC value) than the independent ZIP model separately modeling two outcomes, although they have very similar parameter estimates as a whole.

Figure 1 represents the estimated relative response curves along with 95% credible bands for fetal weight and number of malformed pups, respectlvely for the joint ZIP model as defined in Section 2.4. As dose level increases from 0% to 0.15%, the relative response curves linearly decreases for fetal weight and increases for number of malformations in a quadratic way. When we used the BMR value of 5% according to the recommendations of the US EPA and the EFSA, we obtained the estimated BMD and BMDL, 0.051% and 0.033%, respectively for fetal weight, and 0.011% and 0.008%, respectively for the number of malformations in the joint ZIP model. Table 3 gives the estimated BMDs and BMDLs for each model. The estimated BMD and BMDL of the joint ZIP model were comparable to or slightly smaller than the other two models. We know that the joint ZIP model can be used to determine the BMD for regulatory purposes, and suggests regulatory limits which are more protective (smaller BMDL) in a risk assessment.

4. Concluding remarks

We have proposed a Bayesian joint zero-inflated Poisson model for continuous and count data with excess zeros, in which the association between the two outcomes was taken into account via the subject-specific random effect. When we applied the proposed model to DEHP dataset from developmental toxicity study, the model showed better model fit than independent zero-inflated Poisson model and joint Poisson model based on DIC measures. We were able to confirm the negative effect of DEHP level on fetal weight and the positive effect of DEHP level on the number of malformations, respectively. It also turned out that the DEHP data should be analyzed using models dealing with excess zeros and jointly considering two outcomes. Further, our proposed model can be applied to a developmental toxicity study to estimate BMD and BMDL in dose-response analysis. We have confirmed the joint ZIP model suggests more protective regulatory limits in a risk assessment.

In terms of estimation, we presented a Bayesian approach to fitting the joint zero-inflated Poisson model based on the MCMC procedure with data augmentation method. The Bayesian approach has several advantages including use of prior information, easy interpretation, accounting for uncertainty in the parameter estimates and suitability for small sample data compared to maximum likelihood procedures. The data augmentation method used for count data with excess zeros greatly helped to improve efficiency of estimation compared to commonly used Metropolis-Hastings algorithm (Chip and Greenberg, 1995).

In the proposed model, ZIP model was used to deal with excess zeros in count outcome. However, the ZIP model can often be unstable if there is zero-deflated counts at a particular level of a categorical predictor, even if there are many zeros overall (Min and Agresti, 2005). The zero-inflated negative binomial (ZINB) model or the hurdle model may be considered to be used to describe zero-inflated count data instead of ZIP model to address unstableness of the model. In the proposed model (equation (2.1)), it was assumed that the fixed effects had a linear trend with dose level. The fixed effects could be generalized using a nonparametric method, e.g., regression splines. The random effects could also be extended in several ways. Nonparametric Bayesian methods such as Dirichlet process priors can be used for the random effect distribution instead of bivariate normal distribution, which will result in more flexible modeling (Hwang and Pennell, 2014). Another extension may include allowing random slopes in addition to random intercepts. As seen in Table 1, litter size is noticeably small at a high level of DEHP(0.1% and 0.15%) in Section 3. To see how litter size affects modeling results, we can take into account joint modeling framework of informative litter size and outcomes (Dunson et al., 2003; Hwang and Pennell, 2018).

Figures
Fig. 1. Estimated relative response curves with 95% credible bands for fetal weight (left) and number of malformations (right) in the DEHP dataset. For the BMR value of 5%, BMD and BMDL were displayed for fetal weight (and not for number of malformations).
TABLES

### Table 1

Summary of the DEHP study data

Dose (%)LittersPupsLitters with zero malformation(%)Fetal weight(g) Mean(SD)
02830122(78%)0.94(0.11)
0.0252629121(81%)0.97(0.11)
0.0502628110(38%)0.92(0.12)
0.100181473(17%)0.90(0.15)
0.15010531(10%)0.79(0.15)

### Table 2

Posterior means and 95% credible intervals for parameters in the DEHP data analysis

Independent ZIP ModelJoint Poisson ModelJoint ZIP Model

EffectParameterEstimate95% C.I.Estimate95% C.I.Estimate95% C.I.
Continuous outcomeβ010.962(0.919, 1.008)0.961(0.918, 1.004)0.960(0.916, 1.005)
β11−0.947(−1.592, −0.295)−0.951(−1.587, −0.315)−0.941(−1.582, −0.300)
$σɛ2$0.031(0.024, 0.040)0.030(0.024, 0.038)0.027(0.021, 0.036)

Count outcomeβ020.081(−0.122, 0.269)0.373(0.162, 0.574)0.080(−0.117, 0.271)
β124.873(3.190, 6.562)4.015(2.324, 5.765)4.872(3.211, 6.573)
β030.897(−0.246, 1.990)1.415(0.437, 2.512)
β13−0.211(−2.106, 1.739)−0.244(−2.203, 1.718)

Random effect$σ12$0.103(0.079, 0.125)0.092(0.070, 0.115)0.090(0.071, 0.113)
$σ22$0.094(0.072, 0.118)0.090(0.071, 0.112)0.091(0.073, 0.114)
ρ−0.051(−0.192, 0.089)−0.127(−0.184, −0.072)

DIC−1608.9−1526.2−1621.4

### Table 3

Estimated BMD and BMDL in the DEHP data analysis

Fetal weightNumber of malformations

BMD (%)BMDL (%)BMD (%)BMDL (%)
Independent ZIP Model0.0510.0340.0120.009
Joint Poisson Model0.0520.0340.0140.011
Joint ZIP Model0.0510.0330.0110.008

References
1. Catalano PJ and Ryan LM (1992). Bivariate latent variable models for clustered discrete and continuous outcomes. Journal of the American Statistical Association, 87, 651-658.
2. Celeux G, Forbes F, Robert CP, and Titterington DM (2006). Deviance information criterion for missing data models. Bayesian Analysis, 1, 651-674.
3. Chip S and Greenberg E (1995). Understanding the Metropolis-Hastings algorithm. The American Statistician, 49, 327-335.
4. Crump KS (1995). Calculation of benchmark doses from continuous data. Risk Analysis, 15, 79-89.
5. Dagne GA (2004). Hierarchical Bayesian analysis of correlated zero-inflated count data. Biometrical Journal, 46, 653-663.
6. Dunson DB (2000). Bayesian latent variable models for clustered mixed outcomes. Journal of the Royal Statistical Society: Series B, 62, 355-366.
7. Dunson DB, Chen Z, and Harry JA (2003). A Bayesian approach for joint modeling of cluster size and subunit-specific outcomes. Biometrics, 59, 521-530.
8. Dunson DB and Herring AH (2005). Bayesian latent variable models for mixed discrete outcomes. Biostatistics, 6, 11-25.
9. Fronczyk K and Kottas A (2017). Risk assessment for toxicity experiments with discrete and continuous outcomes: A Bayesian nonparametric approach. Journal of Agricultural, Biological, and Environmental Statistics, 22, 585-601.
10. Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, and Rubin DB (2014). Bayesian Data Analysis, New York, CRC Press.
11. Ghosh SK, Mukhopadhyay P, and Lu J-C (2006). Bayesian analysis of zero-inflated regression models. Journal of Statistical Planning and Inference, 136, 1360-1375.
12. Gilks WR and Wild P (1992). Adaptive rejection sampling for Gibbs sampling. Journal of the Royal Statistical Society. Series C (Applied Statistics), 41, 337-348.
13. Hall DB (2000). Zero-inflated Poisson and binomial regression with random effects: A case study. Biometrics, 56, 1030-1039.
14. Hardy A, Benford D, and Halldorsson T, et al. (2017). Update: use of the benchmark dose approach in risk assessment. European Food Safety Authority Journal, 15, 1-41.
15. Hwang BS and Pennell ML (2014). Semiparametric Bayesian joint modeling of a binary and continuous outcome with applications in toxicological risk assessment. Statistics in Medicine, 33, 1162-1175.
16. Hwang BS and Pennell ML (2018). Semiparametric Bayesian joint modeling of clustered binary and continuous outcomes with informative cluster size in developmental toxicity assessment. Environmetrics, 29, 1-15.
17. Jensen SM, Kluxen FM, and Ritz C (2019). A review of recent advances in benchmark dose methodology. Risk Analysis, 39, 2295-2315.
18. Jensen SM, Kluxen FM, Streibig JC, Cedergreen N, and Ritz C (2020). bmd: an R package for benchmark dose estimation. Peer J, 8, 1-25.
19. Kassahun W, Neyens T, Molenberghs G, Faes C, and Verbeke G (2015). A joint model for hierarchical continuous and zero-inflated overdispersed count data. Journal of Statistical Computation and Simulation, 85, 552-571.
20. Lambert D (1992). Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics, 34, 1-14.
21. Lee K, Joo Y, Song JJ, and Harper DW (2011). Analysis of zero-inflated clustered count data: A marginalized model approach. Computational Statistics and Data Analysis, 55, 824-837.
22. Liu H and Powers DA (2007). Growth curve models for zero-inflated count data: an application to smoking behavior. Structural Equation Modeling, 14, 247-279.
23. McCulloch C (2008). Joint modelling of mixed outcome types using latent variables. Statistical Methods in Medical Research, 17, 53-73.
24. Min Y and Agresti A (2005). Random effect models for repeated measures of zero-inflated count data. Statistical Modeling, 5, 1-19.
25. Mullahy J (1986). Specification and testing of some modified count data models. Journal of Econometrics, 33, 341-365.
26. Neelon BH (2019). Bayesian zero-inflated negative binomial regression based on P?lya-Gamma mixtures. Bayesian Analysis, 14, 829-855.
27. Neelon BH, O’Malley AJ, and Normand SLT (2010). A Bayesian model for repeated measures zero-inflated count data with application to outpatient psychiatric service use. Statistical Modelling, 10, 421-439.
28. R Core Team (2020) R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing . https://www.R-project.org
29. Regan MM and Catalano PJ (1999). Likelihood models for clustered binary and continuous outcomes: application to developmental toxicology. Biometrics, 55, 760-768.
30. Rodrigues J (2003). Bayesian analysis of zero-inflated distributions. Communications in Statistics-Theory and Methods, 32, 281-289.
31. Spiegelhalter DJ, Best NG, Carlin BP, and Van Der Linde A (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B, 64, 583-639.
32. US Environmental Protection Agency (2012). Benchmark Dose Technical Guidance, Washington, D.C, OECD.
33. Yau KK and Lee AH (2001). Zero-inflated Poisson regression with random effects to evaluate an occupational injury prevention programme. Statistics in Medicine, 20, 2907-2920.