In this paper, we propose extending proportional hazards frailty models to allow a discrete distribution for the frailty variable. Having zero frailty can be interpreted as being immune or cured. Thus, we develop a new survival model induced by discrete frailty with zero-inflated power series distribution, which can account for overdispersion. This proposal also allows for a realistic description of non-risk individuals, since individuals cured due to intrinsic factors (immunes) are modeled by a deterministic fraction of zero-risk while those cured due to an intervention are modeled by a random fraction. We put the proposed model in a Bayesian framework and use a Markov chain Monte Carlo algorithm for the computation of posterior distribution. A simulation study is conducted to assess the proposed model and the computation algorithm. We also discuss model selection based on pseudo-Bayes factors as well as developing case influence diagnostics for the joint posterior distribution through
A heterogeneity between individuals should be considered when survival data comes from different groups or if individuals have repeated measurements. If heterogeneity is omitted several problems may occur such as: overestimation of relative hazard rate, biased estimates of regression coefficients, and making the regression parameters estimate tend to zero (Ata and Özel, 2013).
Frailty models are the most common models to account for unobserved heterogeneity in survival analysis (Vaupel
Some authors have recently developed discrete frailty models. Among these authors, Wienke (2010) considered a discrete compound Poisson process (DCPP) for frailty models, Caroni
Long-term survival models (cure rate models) play an important role in survival analysis where sampling units are insusceptible to the occurrence of the event of interest. The proportion of such units is usually called a cured fraction. These models were first addressed by Boag (1949) and Berkson and Gage (1952), which were known in the literature as the standard mixture model. Later, an alternative model called a promotion time cure model was proposed and investigated by Yakovlev and Tsodikov (1996) and Chen
Zero-inflated models have become quite popular in the literature to deal with situations where there are excess zeros, leading to overdispersion in data (Consul and Jain, 1973; Van den Broek, 1995; del Castillo and Pérez-Casany, 2005; Yang
In this paper, we propose a new survival model induced by discrete frailty with ZIPS distribution, where zero frailty corresponds to a model containing a proportion of individuals no longer susceptible to the event of interest (long-term survivors). This model is more flexible than traditional cure rate models in regards to dispersion. Our proposal also allows a more realistic description of the nonrisk individuals. Individuals cured due to intrinsic factors are modeled by a deterministic fraction; however, those cured due to an intervention are modeled by a random fraction.
We develop a Bayesian analysis for proposed model based on Markov chain Monte Carlo (MCMC) methods. To test the adequacy of the proposed model we use a Bayesian methodology based on the pseudo-Bayes factor (PSBF) (Geisser and Eddy, 1979). Another important issue is to verify the assumptions of the fitted model and perform studies to detect possible influential or extreme observations, which may cause distortions in the analysis results. These studies are called case influence diagnostics (Cook and Weisberg, 1982) and are used to help researchers interpret a model by showing the effect of data points on parameter estimates. Thus, we believe that the development of case influence diagnostic Bayesian tools for the model proposed in this paper are a significant contribution. Our goal is to also develop diagnostic measures from a Bayesian point of view based on the
Our model is illustrated with a dataset on skin cancer. The data are part of a study on cutaneous melanoma for the evaluation of post-operative treatment performance as a method to prevent recurrence. The data suggest the existence of a fraction of individuals with zero-risk (no longer susceptible or cured). It is therefore of interest to estimate this zero-risk fraction; in addition, it would also be interesting to obtain the proportion of individuals who are cured due to genetic factors or other inherent characteristics.
The paper is organized as follows. In Section 2 we give the survival functions for the frailty model with discrete distributions. The proposed model formulation is presented in Section 3. In Section 4 we describe Bayesian computational strategies using a MCMC algorithm and discuss some measures of model selection as well as Bayesian case diagnostics based on
Frailty models provide a suitable method to consider the existence of possible associations and unobserved heterogeneity in survival models. A standard frailty model is often built on a proportional hazards framework conditional on a non-negative random variable (frailty)
where
where
In
where
Discrete distributions such as Poisson, binomial, geometric or negative binomial have recently been considered for the frailty models (Caroni
where
In this section, we derive the survival function of our frailty model with a cure fraction using a ZIPS distribution as given in
which converges when |
Since lim
We refer to the model in
Using
It can be observed that, since
The proportion of zero-risk given in
The corresponding probability density function (pdf) of the model (
and the hazard function is
where
Figure 1 illustrates some possible shapes of the survival and hazard functions of the ZIPS-CF model for some selected values of
The survival function of the individuals susceptible to the event of interest (or at-risk individuals), denoted by
It can be noted that
There is a mathematical relationship between the ZIPS-CF model and the standard mixture model (Boag, 1949; Berkson and Gage, 1952), which can be written as
where
The ZIPS class includes several important and known zero-inflated distributions, such as zero-inflated binomial (ZIB), ZIP, ZIG, ZIL, and ZINB. In this section we only present three special cases of the ZIPS-CF model, particularly when the frailty variable
Let us consider
From
where
Inferential methods are investigated from a Bayesian point of view. In this context, a prior distribution for common parameters are required; in addition, we also adopt proper prior distributions for all model parameters to ensure proper posterior distributions. As is typically carried out for regression parameters, normal priors are specified for
where the parameters
Combining the likelihood function in
The joint posterior density in
Note that the distributions given in (
A comparison of models for a given data set and selecting the model that best fits the data is an important issue in data analysis. Our interest is to test the adequacy of the ZIPS-CF model through the zero-risk parameter for data modeled by the fraction of deterministic risk. In particular, we are interested in testing the hypotheses
where denotes data with the
Then, PSBF_{01} < 1 provide evidence against of
where
A common diagnostic tool to assess the influence of an observation on the fit of a model is the case influence (or deletion) diagnostic (Cook and Weisberg, 1982). Case influence diagnostic is useful for case deletion, outlier detection, or model modification. In addition, it helps researchers interpret a model by showing the effect of data points on parameter estimates. Let
There is a mathematical relationship between the CPO defined in
where denotes the expected value with respect to joint posterior distribution and is the likelihood function of the
An estimate of (
According to Peng and Dey (1995) and Weiss (1996), the
In this section, we conduct a simulation study to evaluate the frequentist properties of Bayesian estimates of ZIPS-CF model parameters as well as verify the performance of PSBF to discriminate between two models. For simplicity, the study was based on the generation of a data set from the ZIP-CF model with aWeibull baseline distribution, for which the hazard function is given by
To express noninformative prior distributions in (
We have also conducted a simulation study based on the PSBF to test hypotheses
In this section we demonstrate an application of the proposed model described in Section 3 and the estimation procedure by using a real data set collected in a study on cutaneous melanoma in order to evaluate the treatment performance with a high dose of a certain drug (interferon alfa-2b) as a method to prevent recurrence. The sample included 417 patients without missing values who entered in the study from 1991 to 1995, and were followed up until 1998. This data set was analyzed and published by Kirkwood
Figure 2 presents the Kaplan-Meier estimate of the survival function and the total time on test plot (TTT-plot) (Sun and Kececloglu, 1999) for the melanoma data. The behavior of the Kaplan-Meier curve in Figure 2(a) clearly suggests the existence of a fraction of individuals with zero-risk, thus a model that ignores the possibility of a cure fraction is unsuitable to analyze these data. Moreover, the TTT-plot in Figure 2(b) indicates an increasing hazard function, so that the Weibull model can be an adequate alternative for baseline distribution.
Following Barral (2001), cutaneous melanoma is a common cancer of the skin whose incidence is dramatically increasing in persons with light-colored skin in all parts of the world. This disease is resistant to traditional chemotherapy and radiotherapy; therefore, malignant melanoma is a favorite target for alternative therapies that often involve immunological mechanisms. Taking into account this fact and the conclusions from the analysis of Figure 2, it is reasonable to assume that a proportion of the patients are cured (no longer susceptible) and it is also of interest to estimate the fraction of individuals cured due to treatment or therapy. Thus we applied the proposed ZIPS-CF model with a Weibull baseline distribution to cutaneous melanoma data. For our purposes, we link the parameter
with
Similar to the simulation study, the priors are specified as follows. Normal priors were specified for regression parameters:
Model comparison can be performed considering the PSBF. The Monte Carlo estimate of the PSBF of ZIG-CF model (
The criteria above shows that the ZIG-CF model is best. We then select the ZIG-CF model as our working model taking into account the PSBF and criteria in Table 4. For both models, we observe that, with the exception of covariates age (
In order to detect possible influential observations and verify the model robustness in the presence of these observations, we calculate the Monte Carlo estimates of the measure of divergence
The Bayesian estimate of the deterministic proportion of zero-risk,
Figure 4 shows the survival function stratified by nodule category (1 to 4) for patients between the ages 32 of 66 years, which correspond to quantiles of 10% and 90% of ages. From these plots, we can observe that the survival probability is seen to decrease more rapidly for older patients and the highest nodule category.
Finally, we analyze the role of the covariates on the proportion of zero-risk (cured fraction)
In this paper, we proposed a new model for accommodating long-term survival data obtained from a discrete frailty model. We have assumed a ZIPS distribution for the frailty variable, that allows accommodating possible overdispersion present in data. Another advantage of our model is that we can classify the proportion of individuals with zero-risk in two components (immune and cured), one is due to inherent characteristics of individuals (deterministic factors) and the other due to random factors, which is not possible in commonly used cure rate models. Moreover, the proposed ZIPS-CF model includes as particular cases some models of literature, such as the models proposed by Chen
One important component in this study is to use an appropriate MCMC method. The estimation of the parameters of the proposed model and corresponding inference is conducted by a MCMC algorithm in a Bayesian framework. This is advantageous because we can make statistical inferences based on the posterior quantities of the model, where it is often harder to obtain the standard error of estimated parameters through alternative methods. A widely used algorithm is the Gibbs sampler, which samples iteratively from posterior conditional distributions. However, the Metropolis algorithm should be employed when these conditional distributions are not standard distributions. Thus, we have performed Metropolis-Hasting steps within the Gibbs sampler to obtain samples of the parameters of interest. We also have developed case influence diagnostics for the joint posterior distribution based on the measure of divergence
Long-term survival function (
Model | |||
---|---|---|---|
ZIP-CF | |||
ZIG-CF | |||
ZIL-CF |
ZIP-CF = zero-inflated Poisson model with a cure fraction; ZIG-CF = zero-inflated geometric model with a cure fraction; ZIL-CF = zero-inflated logarithmic model with a cure fraction.
Posterior summaries for the parameters of the ZIP-CF model based on 1,000 simulated data sets
Scenario 1 | Scenario 2 | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
100 | True values | 0.50 | 2.00 | 2.00 | 2.00 | −2.00 | 2.00 | 2.00 | 2.00 | 2.00 | −2.00 |
post. mean | 0.55 | 2.09 | 2.25 | 2.21 | −1.96 | 2.27 | 2.22 | 2.49 | 2.29 | −2.02 | |
post. sd. | 0.17 | 0.25 | 1.02 | 0.60 | 0.48 | 0.65 | 0.39 | 0.97 | 0.74 | 0.76 | |
bias | 0.05 | 0.09 | 0.25 | 0.21 | 0.04 | 0.27 | 0.22 | 0.49 | 0.29 | −0.02 | |
RMSE | 0.18 | 0.27 | 1.05 | 0.63 | 0.48 | 0.70 | 0.45 | 1.08 | 0.80 | 0.76 | |
CP | 0.97 | 0.93 | 0.96 | 0.97 | 0.94 | 0.96 | 0.91 | 0.99 | 0.98 | 0.94 | |
200 | True values | 0.50 | 2.00 | 2.00 | 2.00 | −2.00 | 2.00 | 2.00 | 2.00 | 2.00 | −2.00 |
post. mean | 0.54 | 2.05 | 2.16 | 2.12 | −1.99 | 2.19 | 2.08 | 2.23 | 2.22 | −1.96 | |
post. sd. | 0.12 | 0.17 | 0.78 | 0.40 | 0.32 | 0.47 | 0.26 | 0.96 | 0.58 | 0.53 | |
bias | 0.04 | 0.05 | 0.16 | 0.12 | 0.01 | 0.19 | 0.08 | 0.23 | 0.22 | 0.04 | |
RMSE | 0.13 | 0.18 | 0.79 | 0.42 | 0.32 | 0.50 | 0.27 | 0.99 | 0.62 | 0.53 | |
CP | 0.95 | 0.94 | 0.94 | 0.97 | 0.94 | 0.97 | 0.94 | 0.98 | 0.98 | 0.93 | |
400 | True values | 0.50 | 2.00 | 2.00 | 2.00 | −2.00 | 2.00 | 2.00 | 2.00 | 2.00 | −2.00 |
post. mean | 0.53 | 2.02 | 2.06 | 2.06 | −1.99 | 2.13 | 2.05 | 2.03 | 2.22 | −1.96 | |
post. sd. | 0.08 | 0.12 | 0.51 | 0.25 | 0.23 | 0.35 | 0.17 | 0.78 | 0.44 | 0.33 | |
bias | 0.03 | 0.02 | 0.06 | 0.06 | 0.01 | 0.13 | 0.05 | 0.03 | 0.22 | 0.04 | |
RMSE | 0.09 | 0.13 | 0.51 | 0.26 | 0.23 | 0.38 | 0.17 | 0.78 | 0.49 | 0.33 | |
CP | 0.97 | 0.94 | 0.95 | 0.97 | 0.92 | 0.95 | 0.95 | 0.92 | 0.97 | 0.94 |
ZIP-CF = zero-inflated Poisson model with a cure fraction; post. mean = posterior mean of the parameter estimates; post. sd. = the posterior standard deviation; RMSE = root mean squared error; CP = coverage probabilities.
Percentage of times that
100 | 200 | 300 | 400 | |
---|---|---|---|---|
0.0 | 12.1 | 11.2 | 10.9 | 10.2 |
0.5 | 84.4 | 98.8 | 99.0 | 100.0 |
2.0 | 85.2 | 97.0 | 100.0 | 100.0 |
Bayesian criteria for the two fitted models, namely ZIP-CF and ZIG-CF
Criteria | Model | |||
---|---|---|---|---|
ZIP-CF | ZIG-CF | P-CF | G-CF | |
DIC | 1032.037 | 1029.056 | 1038.546 | 1045.527 |
EAIC | 1042.295 | 1039.814 | 1047.360 | 1057.103 |
EBIC | 1082.625 | 1080.145 | 1083.658 | 1093.401 |
ZIP-CF = zero-inflated Poisson model with a cure fraction; ZIG-CF = zero-inflated geometric model with a cure fraction; DIC = deviance information criterion; EAIC = expected Akaike information criterion; EBIC = expected Bayesian information criterion.
Posterior summaries of the parameters for the fitted ZIG-CF model for the cutaneous melanoma data
Parameter | Estimates | Standard deviation | 90% HPD interval | |
---|---|---|---|---|
Mean | Median | |||
0.382 | 0.381 | 0.174 | (0.099; 0.664) | |
2.056 | 2.053 | 0.157 | (1.808; 2.317) | |
0.069 | 0.066 | 0.029 | (0.018; 0.113) | |
−1.263 | −1.341 | 0.744 | (−2.447; −0.160) | |
0.017 | 0.017 | 0.009 | (0.001; 0.031) | |
0.622 | 0.621 | 0.117 | (0.427; 0.808) |
ZIG-CF = zero-inflated geometric model with a cure fraction; HPD = highest posterior density.
Values of
Model | Divergence measures | |||
---|---|---|---|---|
ZIP-CF | 0.262 | 0.561 | 0.287 | 0.608 |
ZIG-CF | 0.161 | 0.341 | 0.224 | 0.341 |
ZIP-CF = zero-inflated Poisson model with a cure fraction; ZIG-CF = zero-inflated geometric model with a cure fraction.
Bayesian criteria for the ZIG-CF and G-CF models.
Criteria | Model | |
---|---|---|
ZIG-CF | G-CF | |
DIC | 1023.458 | 1026.301 |
EAIC | 1030.344 | 1031.338 |
EBIC | 1054.543 | 1051.504 |
ZIG-CF = zero-inflated geometric model with a cure fraction; DIC = deviance information criterion; EAIC = expected Akaike information criterion; EBIC = expected Bayesian information criterion.
Posterior summary of the cured fraction stratified by nodule category and patient age
Age | Nodule category | Estimate | Standard deviation | 90% HPD interval | |
---|---|---|---|---|---|
Mean | Median | ||||
32 | 1 | 0.6573 | 0.6661 | 0.0732 | (0.5429; 0.7787) |
2 | 0.5549 | 0.5569 | 0.0553 | (0.4686; 0.6470) | |
3 | 0.4644 | 0.4650 | 0.0421 | (0.3999; 0.5366) | |
4 | 0.3957 | 0.3965 | 0.0459 | (0.3169; 0.4656) | |
66 | 1 | 0.5652 | 0.5667 | 0.0582 | (0.4795; 0.6718) |
2 | 0.4718 | 0.4708 | 0.0432 | (0.3991; 0.5414) | |
3 | 0.3995 | 0.3998 | 0.0437 | (0.3287; 0.4712) | |
4 | 0.3497 | 0.3515 | 0.0557 | (0.2604; 0.4378) |
HPD = highest posterior density.