
In many bioscience studies, there is interest in simultaneously investigating two or more measurements within a single subject. In particular, it is often considered that two or more associated outcomes are affected at the same time depending on the level of certain treatments. For instance, developmental toxicity studies examine how a variety of variables are affected by pregnant mice exposed to several different levels of substances, such as rat weight, liver weight, gravid uterine weight, number of dead fetuses, presence of malformation, and fetal weight. Such variables should be jointly analyzed to take into account association between them because separate modeling can yield biased inferences (Catalano and Ryan, 1992; Regan and Catalano, 1999; Dunson, 2000; McCulloch, 2008; Hwang and Pennell, 2014; Hwang and Pennell, 2018). In the literature, several models have used latent variables to jointly analyze different types of outcomes such as continuous, binary, ordinal, unordered categorical or count variables. Catalano and Ryan (1992) proposed a joint model for binary and continuous outcomes assuming that a latent continuous variable underlying the binary outcome and continuous outcome have a bivariate normal random effects. Dunson (2000) proposed a general framework for modeling clustered mixed outcomes using a mixture of generalized linear models to describe the joint distribution of a set of underlying variables, and Dunson and Herring (2005) provided a general underlying Poisson variable model for mixed discrete outcomes. They followed a Bayesian approach to inference of the parameters and latent variables via a Markov chain Monte Carlo (MCMC) sampling algorithm. Nonparametric Bayesian methods also have been proposed for joint modeling of dicrete and continuous outcomes to describe irregular changes in distributions of the response variables (Hwang and Pennell, 2014; Fronczyk and Kottas, 2017; Hwang and Pennell, 2018).
In modeling count data in any research fields, it is common to encounter a large number of zero values. For example, developmental toxicology studies may be interested in how many birth defects the pregnant dams exposed to a toxic substance have. Then, many dams may not have any defects if the toxic level is not that high. Commonly used regression models for count data using Poisson or negative binomial distributions may not fit these count outcomes well, resulting in underestimating the zero-count probability. Such count data with excess zeros should be analyzed using discrete mixture models such as zero-inflated Poisson (ZIP) model and zero-inflated negative binomial (ZINB) model. Lambert (1992) proposed the zero-inflated Poisson (ZIP) regression model to describe discrete data with many zeros. The ZIP model then has a Poisson-Bernoulli mixture structure, which assumes two different sources for zero values: structural zeros by those subjects whose count response will always be zero, and sampling zeros that would occur by chance generated by a Poisson distribution. In addition to ZIP models, excess zeros in count data can also be accommodated by zero-inflated negative binomial models (Hall, 2000) or the hurdle models (Mullahy, 1986). Many researchers have utilized a variety of models to analyze zero-inflated count data. Hall (2000), Yau and Lee (2001), and Min and Agresti (2005) have proposed various zero-inflated models with random effects. Liu and Powers (2007) applied a growth curve model to longitudinal count data with excess zeros, and Lee
There have recently been many literatures on Bayesian alternatives to fitting zero-inflated models. Rodrigues (2003) presented a Bayesian ZIP model based on the augmented data, and Ghosh
Motivated by developmental toxicity studies, we present a Bayesian joint modeling framework for continuous and zero-inflated count data. We utilize the zero-inflated Poisson (ZIP) regression model proposed by Lambert (1992) for count data with excess zeros. To account for dependency structure between continuous and count outcomes, the bivariate normal distribution will be assigned to subject-specific random effect shared across outcomes, which is expected to improve efficiency. We use the MCMC procedure with data augmentation and adaptive rejection sampling methods proposed by Ghosh
The rest of the paper is organized as follows. In Section 2, we present a Bayesian joint model for continuous and zero-inflated count data, and describe the Bayesian inference procedures as well as model comparison criterion using deviance information criterion (DIC). In Section 3, we apply the proposed model to data from a developmental toxicity study of diethylhexyl phthalate (DEHP) in timed-pregnant CD-1 mice conducted by the National Toxicology Program. We conclude with a discussion of our results and future work in Section 4.
Let
Thus, the mixture of inflated zeros and Poisson counts can be written as
We can typically incorporate covariates into the model using the log link function for the expected mean count
where
The resulting distribution gives
We note that zero counts arise from both the Poisson count stage with parameter
which implies that the ZIP regression model accommodates for overdispersion.
Let
We jointly model the continuous response and the count response as follows, for
where
Note that
where
We specify the conjugate priors for the regression coefficients
where
We use the Markov chain Monte Carlo (MCMC) procedure with data augmentation method (Ghosh
We obtain the full posterior distribution of (
The full conditional posterior densities of
We note that the full conditional posterior densities for
Step 1: Set the dispersed initial values of
Step 2: Data augmentation step: Sample (
Step 3: Sample
Step 4: Sample
Step 5: Sample ∑ at time
Step 6: Return to the Step 2 and repeat until convergence.
We implement MCMC algorithm using several dispersed initial values and then use the Bayesian diagnostics such as trace plots and Gelman and Rubin’s potential scale reduction factor (Gelman
We apply the proposed model into dose-response analysis in toxicology studies to estimate the benchmark dose (BMD), defined as the dose associated with a predefined small change in the response, called the benchmark response (BMR), from the background response level (Crump, 1995). Different definitions of the BMD exist depending on different types of response data, binary, continuous and count response data, and different types of changes in the response (Jensen
where
We also fit two other models to compare the performance of our proposed model. An independent ZIP model accounts for continuous outcome and zero-inflated count outcome independently assuming
where
We describe the proposed approach using data from a developmental toxicity study of diethylhexyl phthalate (DEHP) in timed-pregnant CD-1 mice conducted by the National Toxicology Program (NTP study: TER84064). DEHP is a substance mainly used as an additive in plastics to make them more flexible, and its widespread use in everyday and medical products has raised some concerns about its safety. DEHP breaks down slowly in water, but it can remain for a long time in soil and sediment, so it accumulates in organisms living in soil and sediment. In this study, pregnant CD-1 mice were exposed to DEHP in their feed at five different dose levels(0%, 0.025%, 0.05%, 0.10%, and 0.15% in feed). All live fetuses for each litter were examined for body weights and the occurrence of any types of malformations. Table 1 summarizes the average fetal body weight and the number of litters with no malformed pups per dose group. In the table, the average fetal body weight shows a trend of decreasing, and the number of litters with malformation increases (the count of litters with no malformed fetuses decreases) as the dose level increases. It is obvious that two litter-level outcomes are related each other so that they should be jointly modeled to consider the correlation between two outcomes within each litter. Moreover, the malformed pups shows a large incidence of zero counts, which requires zero-inflated count model, zero-inflated Poisson model.
We use the proposed joint model to analyze the average fetal weight and the number of malformation within a litter. Let
Posterior summaries for parameters in our proposed model (joint ZIP model) as well as the independent ZIP model and the joint Poisson model are provided in Table 2. According to the model comparison criteria DICs, we confirmed that the joint ZIP model had the smallest DIC (−1621.4) and thus fit the DEHP data comparatively better than the other two models. Table 2 also provides parameter estimates of the three models (posterior means and 95% credible intervals). The joint ZIP model shows significant effects of dose level on fetal weight and the number of malformations, respectively:
We can examine zero-inflation from the Bernoulli model component in the joint model in 2.1. The significant positive coefficient of the intercept (
Figure 1 represents the estimated relative response curves along with 95% credible bands for fetal weight and number of malformed pups, respectlvely for the joint ZIP model as defined in Section 2.4. As dose level increases from 0% to 0.15%, the relative response curves linearly decreases for fetal weight and increases for number of malformations in a quadratic way. When we used the BMR value of 5% according to the recommendations of the US EPA and the EFSA, we obtained the estimated BMD and BMDL, 0.051% and 0.033%, respectively for fetal weight, and 0.011% and 0.008%, respectively for the number of malformations in the joint ZIP model. Table 3 gives the estimated BMDs and BMDLs for each model. The estimated BMD and BMDL of the joint ZIP model were comparable to or slightly smaller than the other two models. We know that the joint ZIP model can be used to determine the BMD for regulatory purposes, and suggests regulatory limits which are more protective (smaller BMDL) in a risk assessment.
We have proposed a Bayesian joint zero-inflated Poisson model for continuous and count data with excess zeros, in which the association between the two outcomes was taken into account via the subject-specific random effect. When we applied the proposed model to DEHP dataset from developmental toxicity study, the model showed better model fit than independent zero-inflated Poisson model and joint Poisson model based on DIC measures. We were able to confirm the negative effect of DEHP level on fetal weight and the positive effect of DEHP level on the number of malformations, respectively. It also turned out that the DEHP data should be analyzed using models dealing with excess zeros and jointly considering two outcomes. Further, our proposed model can be applied to a developmental toxicity study to estimate BMD and BMDL in dose-response analysis. We have confirmed the joint ZIP model suggests more protective regulatory limits in a risk assessment.
In terms of estimation, we presented a Bayesian approach to fitting the joint zero-inflated Poisson model based on the MCMC procedure with data augmentation method. The Bayesian approach has several advantages including use of prior information, easy interpretation, accounting for uncertainty in the parameter estimates and suitability for small sample data compared to maximum likelihood procedures. The data augmentation method used for count data with excess zeros greatly helped to improve efficiency of estimation compared to commonly used Metropolis-Hastings algorithm (Chip and Greenberg, 1995).
In the proposed model, ZIP model was used to deal with excess zeros in count outcome. However, the ZIP model can often be unstable if there is zero-deflated counts at a particular level of a categorical predictor, even if there are many zeros overall (Min and Agresti, 2005). The zero-inflated negative binomial (ZINB) model or the hurdle model may be considered to be used to describe zero-inflated count data instead of ZIP model to address unstableness of the model. In the proposed model (
Summary of the DEHP study data
Dose (%) | Litters | Pups | Litters with zero malformation(%) | Fetal weight(g) Mean(SD) |
---|---|---|---|---|
0 | 28 | 301 | 22(78%) | 0.94(0.11) |
0.025 | 26 | 291 | 21(81%) | 0.97(0.11) |
0.050 | 26 | 281 | 10(38%) | 0.92(0.12) |
0.100 | 18 | 147 | 3(17%) | 0.90(0.15) |
0.150 | 10 | 53 | 1(10%) | 0.79(0.15) |
Posterior means and 95% credible intervals for parameters in the DEHP data analysis
Independent ZIP Model | Joint Poisson Model | Joint ZIP Model | |||||
---|---|---|---|---|---|---|---|
Effect | Parameter | Estimate | 95% C.I. | Estimate | 95% C.I. | Estimate | 95% C.I. |
Continuous outcome | 0.962 | (0.919, 1.008) | 0.961 | (0.918, 1.004) | 0.960 | (0.916, 1.005) | |
−0.947 | (−1.592, −0.295) | −0.951 | (−1.587, −0.315) | −0.941 | (−1.582, −0.300) | ||
0.031 | (0.024, 0.040) | 0.030 | (0.024, 0.038) | 0.027 | (0.021, 0.036) | ||
Count outcome | 0.081 | (−0.122, 0.269) | 0.373 | (0.162, 0.574) | 0.080 | (−0.117, 0.271) | |
4.873 | (3.190, 6.562) | 4.015 | (2.324, 5.765) | 4.872 | (3.211, 6.573) | ||
0.897 | (−0.246, 1.990) | – | – | 1.415 | (0.437, 2.512) | ||
−0.211 | (−2.106, 1.739) | – | – | −0.244 | (−2.203, 1.718) | ||
Random effect | 0.103 | (0.079, 0.125) | 0.092 | (0.070, 0.115) | 0.090 | (0.071, 0.113) | |
0.094 | (0.072, 0.118) | 0.090 | (0.071, 0.112) | 0.091 | (0.073, 0.114) | ||
– | – | −0.051 | (−0.192, 0.089) | −0.127 | (−0.184, −0.072) | ||
DIC | −1608.9 | −1526.2 | −1621.4 |
Estimated BMD and BMDL in the DEHP data analysis
Fetal weight | Number of malformations | |||
---|---|---|---|---|
BMD (%) | BMDL (%) | BMD (%) | BMDL (%) | |
Independent ZIP Model | 0.051 | 0.034 | 0.012 | 0.009 |
Joint Poisson Model | 0.052 | 0.034 | 0.014 | 0.011 |
Joint ZIP Model | 0.051 | 0.033 | 0.011 | 0.008 |