Bivariate distributions play a fundamental role in survival and reliability studies. We consider a regression model for bivariate survival times under right censoring based on the bivariate Kumaraswamy Weibull (Cordeiro
Statistical applications for the time of occurrence of an event of interest generally use the exponential, Weibull, gamma, log-normal and log-logistic distributions. However, there has been growing interest in new distributions to model skewness, kurtosis and different types of hazard rates. Among them, we cite the exponentiated Weibull (Mudholkar
In practice, regressor variables associated with the response variable of each observation are usually available; hence, statistical analysis should consider models that account for all the information in the observations. For these reasons, this work extends the well-known KwW distribution to include covariates through the scale parameter, leading to the BKwW regression model. The inferential part is conducted using the asymptotic distribution of the maximum likelihood estimators (MLEs) subject to restrictions on the parameters. To implement this method, we use the adjusted barrier function (Lange, 1999). Results based on small samples may be difficult to justify, so we also explore a Bayesian method as an alternative to the classical analysis. Markov chain Monte Carlo (MCMC) methods are used to develop a Bayesian analysis for the regression model.
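To fix ideas, the barrier approach to constrained maximization can be sketched with a toy univariate Weibull likelihood (a stand-in, not the BKwW likelihood itself): positivity constraints on the shape and scale parameters are enforced by a logarithmic barrier whose weight is driven to zero. The simple log-barrier form and all parameter values below are illustrative assumptions, not the exact adjusted barrier of Lange (1999).

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
t = 2.0 * rng.weibull(1.5, size=200)          # toy data: Weibull(shape=1.5, scale=2)

def negloglik(theta):
    a, b = theta                              # Weibull shape and scale
    return -np.sum(np.log(a / b) + (a - 1) * np.log(t / b) - (t / b)**a)

def barrier_obj(theta, mu):
    a, b = theta
    if a <= 0 or b <= 0:                      # infeasible point
        return np.inf
    # the log-barrier keeps the search inside the constraint region a, b > 0
    return negloglik(theta) - mu * (np.log(a) + np.log(b))

theta = np.array([1.0, 1.0])
for mu in (1.0, 0.1, 0.01, 0.001):            # drive the barrier weight to zero
    theta = minimize(barrier_obj, theta, args=(mu,), method="Nelder-Mead").x

a_hat, b_hat = theta                          # approximate constrained MLEs
```

Shrinking the barrier weight over successive optimizations lets the solution approach the constraint boundary if the data demand it, while every intermediate iterate remains feasible.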
After fitting the data, it is important to check model assumptions and conduct a robustness study to detect influential or extreme observations that can cause distortions in the results of the analysis. Influence diagnostics is an important step in the analysis of a data set, since it provides an indication of bad model fit or influential observations. Cook (1986) proposed a diagnostic approach named local influence to assess the effect of small perturbations in the model and/or data on parameter estimates. Several authors have applied the local influence method in more general regression models than the normal regression model. Some authors have also investigated the assessment of local influence in survival analysis models. For instance, Ortega
The article is organized as follows. In Section 2, we present the BKwW regression model and some properties of the marginal distributions. In Section 3, we obtain the MLEs under parameter constraints and derive several diagnostic measures based on the normal curvatures of local influence under various perturbation schemes. In Section 4, we consider a Bayesian approach and influence diagnostics for the BKwW regression model. In Section 5, we conduct various simulation studies to evaluate the behavior of the estimators in the BKwW regression model. In Section 6, we present a reanalysis of the dataset from patients of a renal insufficiency study reported by McGilchrist and Aisbett (1991). Finally, Section 7 provides some concluding remarks.
In practice, most studies involve covariates related to the survival times. Regression models can be formulated in various ways. In survival analysis, the class of parametric regression models and the Cox regression model are well known. However, we adopt a reparameterization of the BKwW (Cordeiro
where
Let
where
with probability density function (pdf) given by
where the functions
To illustrate some possible shapes of the pdf, we set the parameter values
The corresponding marginal pdfs
and
respectively, for
If |
holds, where the binomial coefficient is defined for any real number. Let
where
The quantile function of
where
The survival function associated with (
where the functions
The marginal survival function of
Let (
The density function
The log-likelihood function (
where
The MLEs under constraints on the parameters in
Under standard regularity conditions, the asymptotic distribution of (
The approximate multivariate normal
Besides estimation, hypothesis testing is another key issue. Let
The first tool to perform sensitivity analysis is global influence, starting from case deletion (Cook, 1977), which is a common approach to study the effect of dropping the
Accordingly, a quantity with subscript “(
Let
So, we can assess the values of GD
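As a rough illustration of case-deletion diagnostics, the sketch below computes a generalized Cook-type distance for a toy linear model by refitting with each case removed and measuring the shift in the estimates through a quadratic form in the (unscaled) information matrix. The toy model and variable names are illustrative assumptions, not the BKwW fit.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
y[0] += 8.0                                   # plant an influential case

X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]   # full-data fit
info = X.T @ X                                # (unscaled) information matrix

gd = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    beta_i = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]  # case-deleted fit
    d = beta - beta_i
    gd[i] = d @ info @ d                      # generalized Cook-type distance

most_influential = int(np.argmax(gd))
```

Plotting `gd` against the case index, as done for the GD measure in the article, makes the planted case stand out immediately.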
Since regression models are sensitive to the underlying model assumptions, performing sensitivity analysis in general is strongly advisable. Another approach is suggested by Cook (1986), where instead of removing observations, weights are given to them. Local influence calculation can be conducted for model (
Consider the vector of weights
where
In this section, we consider the Bayesian method as an alternative analysis to incorporate previous knowledge of the parameters through informative prior density functions.
Let (
We consider that
where
Further, we assume the following prior distributions
By combining the likelihood function (exponential of the log-likelihood (
To implement the Metropolis-Hastings algorithm, we proceed as follows.
Step 1: Start with any point
Step 2: Generate a point
Step 3: Update
Step 4: Repeat Steps 2 and 3, increasing the stage indicator, until the process has reached a stationary distribution.
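Under the simplifying assumption of a one-dimensional standard normal target (a toy stand-in for the actual log-posterior of the BKwW model), the steps above can be sketched as a random-walk Metropolis-Hastings chain:

```python
import numpy as np

rng = np.random.default_rng(2)

def log_target(theta):
    # stand-in for the log-posterior: a standard normal log-density (toy)
    return -0.5 * theta**2

theta = 0.0                                   # Step 1: arbitrary starting point
draws = []
for _ in range(5000):
    proposal = theta + rng.normal(scale=1.0)  # Step 2: symmetric random-walk proposal
    log_ratio = log_target(proposal) - log_target(theta)
    if np.log(rng.uniform()) < log_ratio:     # Step 3: accept with MH probability
        theta = proposal
    draws.append(theta)                       # Step 4: iterate until stationarity

draws = np.asarray(draws)
```

Because the random-walk proposal is symmetric, the Hastings correction cancels and the acceptance probability reduces to the ratio of target densities.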
All computations are performed in the R software (R Development Core Team, 2016). Throughout, after a burn-in of 20,000 samples, we use every tenth sample from the subsequent 60,000 MCMC posterior samples to reduce autocorrelation and improve convergence, thus obtaining an effective sample of size 6,000 on which the posterior inference is based. We monitor the convergence of the Metropolis-Hastings algorithm using the method proposed by Geweke (1992) as well as trace plots.
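The burn-in and thinning arithmetic can be checked with a short sketch (the chain here is a placeholder index sequence, not real draws):

```python
# Burn-in and thinning as described in the text: discard the first 20,000
# draws, then keep every tenth of the subsequent 60,000 draws.
total = 20_000 + 60_000
chain = list(range(total))     # stand-in for one parameter's MCMC trace
kept = chain[20_000::10]       # drop burn-in, thin by 10
n_effective = len(kept)        # 6,000 retained draws
```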
A variety of methodologies can be applied for comparing several competing models for a given dataset and selecting those which provide the best fits to the data. We consider some of the Bayesian model selection criteria, namely, the deviance information criterion (DIC) proposed by Spiegelhalter
Similarly, the EAIC and EBIC criteria can be estimated by
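These criteria can be estimated directly from MCMC output using the standard definitions DIC = D̄ + pD with pD = D̄ − D(θ̄), EAIC = D̄ + 2p and EBIC = D̄ + p log n, where D̄ is the posterior mean deviance, θ̄ the posterior mean, p the number of parameters, and n the sample size. The sketch below uses a toy normal model with known variance (an assumption for illustration; the BKwW deviance would replace the toy one):

```python
import numpy as np

rng = np.random.default_rng(3)
y = rng.normal(1.0, 1.0, size=100)            # toy data, known unit variance

# stand-in posterior draws for the mean (conjugate normal posterior)
mu_draws = rng.normal(y.mean(), 1 / np.sqrt(len(y)), size=4000)

def deviance(mu):
    # deviance = -2 * log-likelihood of the toy normal model
    return -2 * np.sum(-0.5 * np.log(2 * np.pi) - 0.5 * (y - mu)**2)

D_bar = np.mean([deviance(m) for m in mu_draws])  # posterior mean deviance
D_hat = deviance(mu_draws.mean())                 # deviance at posterior mean
p, n = 1, len(y)                                  # parameters, sample size

DIC  = D_bar + (D_bar - D_hat)                    # D_bar + p_D
EAIC = D_bar + 2 * p
EBIC = D_bar + p * np.log(n)
```

For this toy model the effective number of parameters pD is close to 1, as expected for a single well-identified mean.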
Another criterion, the LPML, is derived from the conditional predictive ordinate (CPO) statistics. Let denote the data with the deleted
The CPO
For model comparisons, we use the log pseudo marginal likelihood (LPML) defined by
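A common Monte Carlo estimate of CPOi is the harmonic mean of the ith observation's likelihood over the posterior draws, and the LPML is then the sum of the log-CPOs. The toy normal model below is an illustrative stand-in for the BKwW likelihood:

```python
import numpy as np

rng = np.random.default_rng(4)
y = rng.normal(0.0, 1.0, size=60)             # toy data

# stand-in posterior draws for the mean of a normal model with known variance
mu_draws = rng.normal(y.mean(), 1 / np.sqrt(len(y)), size=4000)

# per-observation likelihoods across MCMC draws: shape (draws, observations)
lik = np.exp(-0.5 * (y[None, :] - mu_draws[:, None])**2) / np.sqrt(2 * np.pi)

cpo = 1.0 / np.mean(1.0 / lik, axis=0)        # harmonic-mean CPO estimate
lpml = np.sum(np.log(cpo))                    # log pseudo marginal likelihood
```

Larger LPML indicates better predictive performance, which is why the perturbed datasets in Table 4 show lower LPML values than the clean data.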
It is well known that regression models may be sensitive to underlying model assumptions, so a sensitivity analysis is strongly advisable. Cook (1986) uses this idea to motivate his influence analysis and suggests that more confidence should be placed in a model that is relatively stable under small modifications. In order to investigate whether some of the observations are influential for the Bayesian analysis, we consider Bayesian case-deletion influence diagnostic measures for the joint posterior distribution based on the
Let
where
Let
where
Note that
where
It is not difficult to see for the divergence measures considered that
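For the Kullback-Leibler case, a standard identity expresses the case-deletion divergence through the CPO: K(π, π(−i)) = −log CPOi + Eπ[log f(yi | θ)], so it can be estimated from the same MCMC draws used for the LPML. The sketch below uses a toy normal model with one planted outlier (an illustrative assumption, not the BKwW fit):

```python
import numpy as np

rng = np.random.default_rng(5)
y = rng.normal(0.0, 1.0, size=60)
y[0] += 5.0                                   # plant one outlying case

# stand-in posterior draws for the mean of a normal model with known variance
mu_draws = rng.normal(y.mean(), 1 / np.sqrt(len(y)), size=4000)

# per-observation log-likelihoods across draws: shape (draws, observations)
loglik = -0.5 * np.log(2 * np.pi) - 0.5 * (y[None, :] - mu_draws[:, None])**2

cpo = 1.0 / np.mean(np.exp(-loglik), axis=0)  # harmonic-mean CPO estimate
kl = -np.log(cpo) + loglik.mean(axis=0)       # K-L case-deletion divergence

flagged = int(np.argmax(kl))                  # the planted outlier stands out
```

By Jensen's inequality the estimated divergences are nonnegative, and the planted outlier receives by far the largest value, mirroring the behavior of case 132 in Table 5.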
A simulation study is conducted to evaluate the parameter estimates for the proposed model. The results are obtained from 1,000 Monte Carlo simulations using the R software. In each replication, a random sample of size
The samples denoted by (
Step 0: Set up the sample size
Step 1: Generate the covariate
Step 2: Use the quantile function given in (
where
Step 3: Compare
Step 4: Generate the covariate
Step 5: Next,
Step 6: Compare
Step 7: Do
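The data-generating steps above can be sketched via inverse-transform sampling. Since the BKwW quantile function is not reproduced here, the sketch uses a Weibull marginal with the covariate entering the scale through a log link as a stand-in; all parameter values and names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200

x = rng.binomial(1, 0.5, size=n)              # Step 1: binary covariate
beta0, beta1, shape = 0.5, -0.8, 1.5          # illustrative parameter values
scale = np.exp(beta0 + beta1 * x)             # covariate enters the scale (log link)

u = rng.uniform(size=n)                       # Step 2: inverse-transform sampling
t = scale * (-np.log(1 - u))**(1 / shape)     # Weibull quantile function

c = rng.uniform(0, 5, size=n)                 # censoring times
y = np.minimum(t, c)                          # Step 3: observed time
delta = (t <= c).astype(int)                  # censoring indicator
```

Repeating this generation in each Monte Carlo replication and refitting the model yields the empirical means, biases and MSEs reported in the tables.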
The simulation study is performed for
Tables 1 and 2 provide the averages (Mean), biases and mean square errors (MSEs) of the MLEs and Bayesian estimates of the parameters in the BKwW regression model, respectively. We note that the MSE values decrease when the sample size increases, in agreement with first-order asymptotics.
A goal of this study is to show the need for robust models that deal with the presence of outliers in the data. We generate a sample of size 200 with fixed parameters
We select cases 35, 95, and 132 for perturbation. The perturbation scheme is structured as follows. To create influential observations artificially in the dataset, we choose one, two or three of these selected cases. For each case, we perturb one or both lifetimes as follows
In this study, we consider eight setups. Dataset a: original dataset without outliers; Dataset b: data with outlier 35; Dataset c: data with outlier 95; Dataset d: data with outlier 132; Dataset e: data with outliers 35 and 95; Dataset f: data with outliers 35 and 132; Dataset g: data with outliers 95 and 132; and Dataset h: data with outliers 35, 95, and 132. The MCMC computations are performed similarly to those in the previous subsection, using Geweke's convergence diagnostic (Geweke, 1992) as well as trace plots to monitor the convergence of the samples.
Table 3 reveals that the posterior inferences about the parameters are sensitive to the perturbation of the selected case(s), except the
We consider the sample from the posterior distributions of the parameters of the BKwW regression model to calculate the
Figures 2
In this section, we perform a statistical analysis of the renal insufficiency study originally presented by McGilchrist and Aisbett (1991) using the BKwW regression model. These data were recently analyzed by Barriga
We perform an exploratory analysis, mainly considering the variables referring to the occurrence times of the two events of interest. The estimated Kendall correlation coefficient is
Based on the results of Sections 2 and 3, Table 6 lists the MLEs of the parameters for the BKwW regression model, their standard errors and the corresponding p-values.
After modeling, it is important to verify whether there are observations that influence the fit of the BKwW regression model. To investigate this, we calculate global influence measures, likelihood distance (LD
We apply the local influence theory developed in Section 3.2, where case-weight perturbation is used, and obtain the value of the maximum curvature
The influence of perturbations on the observed survival times is now analyzed (response variable perturbation). The value of the maximum curvature is
We conclude that the diagnostic analysis (global and local influence) detects two cases as potentially influential observations: ♯21 and ♯29. Observation ♯21 corresponds to a male with a longer time to the first occurrence of infection and a median time to the occurrence of infection 2. Observation ♯29 corresponds to a male with a shorter time to the occurrence of infection and a short time to the occurrence of infection 2. In order to reveal the impact of these two observations on the parameter estimates, we refit the model under some scenarios. First, we eliminate each of the two observations individually. Next, we remove all potentially influential observations from the original dataset.
Table 7 gives the relative changes (in percentages) of the estimates defined by RC
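Although the exact RC formula is truncated above, the common definition RC = 100(θ̂j − θ̂j(i))/θ̂j reproduces the bracketed figures in Table 7 to rounding accuracy; for example, for the first three coefficients of the full fit versus the fit without one observation:

```python
import numpy as np

# first three coefficients of Table 7: full-data fit vs. fit without one case
theta_full    = np.array([2.5204, 0.7075, 1.7274])
theta_dropped = np.array([2.4777, 1.1871, 1.9168])

# relative change in percent (a common definition for this diagnostic)
rc = 100 * (theta_full - theta_dropped) / theta_full
# rc is approximately [1.69, -67.79, -10.96], matching the bracketed
# figures [1.692], [-67.7746], [-10.9615] up to rounding of the estimates
```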
The figures in Table 7 reveal substantial changes in the estimated parameter values as well as in the set of significant parameters. Despite this sensitivity of the model, the inclusion or exclusion of these points does not change the interpretation of the results, since there is no sign change in the coefficient of the sex variable, which indicates that men tend to suffer infections sooner than women.
Table 8 provides the means, standard deviations (SDs) and 95% Bayesian credible intervals (CI (95%)) for the estimates of the parameters in the BKwW regression model fitted to the original dataset and when two observations (♯15 and ♯21) are dropped.
The sample considered for the posterior distributions of the parameters of the BKwW regression model and the
Also, we obtain the DIC, EAIC, EBIC, and LPML values to compare the impact of the influential points. Table 10 presents the values for the complete data, without observation ♯15, without observation ♯21, and without both cases. By dropping observation ♯21, we obtain a better fit than by dropping observation ♯15.
We check the quality of the fitted regression model by means of plots of the Kaplan-Meier survival function and the estimated marginal survival functions for the BKwW regression model. Figure 11 shows that the proposed model fits well, since its fitted survival function follows the Kaplan-Meier curve closely.
We propose the BKwW regression model as an extension of the bivariate Weibull distribution. We estimate the model parameters using the maximum likelihood methodology subject to linear restrictions in the parameters. We also perform a sensitivity analysis to assess the robustness of the results. We carry out the estimation method using the R software. Several functions are implemented as well as new functions with the partial derivatives of the log-likelihood function.
The optimization process is sensitive to the choice of the initial values used in the algorithms. To choose the initial regression parameters, we obtain the estimates considering marginal models for each response variable, while we choose the values
Expressions for
and
Mean estimates, biases and MSEs of the MLEs of the parameters in the bivariate Kumaraswamy Weibull regression model
Parameter (true value) | |||||||||
---|---|---|---|---|---|---|---|---|---|
Mean | Bias | MSE | Mean | Bias | MSE | Mean | Bias | MSE | |
0.8291 | 0.0791 | 0.0153 | 0.8265 | 0.0765 | 0.0105 | 0.8237 | 0.0737 | 0.0085 | |
1.4484 | 0.4484 | 0.2878 | 1.3876 | 0.3876 | 0.1908 | 1.3754 | 0.3754 | 0.1655 | |
4.4953 | 0.4953 | 0.3299 | 4.5588 | 0.5588 | 0.3568 | 4.5528 | 0.5528 | 0.3350 | |
0.7097 | −0.0403 | 0.1078 | 0.6589 | −0.0911 | 0.0639 | 0.6473 | −0.1027 | 0.0484 | |
1.5893 | 0.3393 | 0.2878 | 1.5172 | 0.2672 | 0.1908 | 1.5064 | 0.2564 | 0.1655 | |
2.8722 | −0.3778 | 0.1935 | 2.8918 | −0.3582 | 0.1549 | 2.9043 | −0.3457 | 0.1355 | |
1.7191 | −0.0309 | 0.0742 | 1.7039 | −0.0461 | 0.0386 | 1.6916 | −0.0584 | 0.0274 | |
0.5705 | −0.4295 | 0.2116 | 0.5710 | −0.4290 | 0.1977 | 0.5669 | −0.4331 | 0.1963 | |
0.9049 | 0.1549 | 0.0707 | 0.9256 | 0.1756 | 0.0517 | 0.9127 | 0.1627 | 0.0413 |
MSE = mean square error; MLE = maximum likelihood estimator.
Mean estimates, biases and MSEs of the Bayesian estimates of the parameters in the bivariate Kumaraswamy Weibull regression model
Parameter (true value) | |||||||||
---|---|---|---|---|---|---|---|---|---|
Mean | Bias | MSE | Mean | Bias | MSE | Mean | Bias | MSE | |
0.9027 | 0.1527 | 0.0433 | 0.8615 | 0.1115 | 0.0263 | 0.8466 | 0.0966 | 0.0187 | |
1.4323 | 0.4323 | 0.3582 | 1.3811 | 0.3811 | 0.2231 | 1.3689 | 0.3689 | 0.1859 | |
4.5309 | 0.5309 | 0.4626 | 4.5699 | 0.5699 | 0.4057 | 4.5582 | 0.5582 | 0.3619 | |
0.7189 | −0.0311 | 0.2439 | 0.6658 | −0.0842 | 0.1183 | 0.6542 | −0.0958 | 0.0815 | |
1.5642 | 0.3142 | 0.3582 | 1.5061 | 0.2561 | 0.2231 | 1.4923 | 0.2423 | 0.1859 | |
2.8727 | −0.3773 | 0.2455 | 2.8966 | −0.3534 | 0.1716 | 2.9022 | −0.3478 | 0.1515 | |
1.7407 | −0.0093 | 0.1619 | 1.7200 | −0.0300 | 0.0731 | 1.6987 | −0.0513 | 0.0493 | |
0.5730 | −0.4270 | 0.2515 | 0.5732 | −0.4268 | 0.2140 | 0.5645 | −0.4355 | 0.2084 | |
0.9153 | 0.1653 | 0.2987 | 0.9190 | 0.1690 | 0.1058 | 0.8973 | 0.1473 | 0.0559 |
MSE = mean square error.
Posterior means and standard deviations (SDs) for the bivariate Kumaraswamy Weibull regression model according to different perturbation schemes
Dataset | Perturbed case | a | b | |||||||
---|---|---|---|---|---|---|---|---|---|---|
Mean (SD) | Mean (SD) | Mean (SD) | Mean (SD) | Mean (SD) | Mean (SD) | Mean (SD) | Mean (SD) | Mean (SD) | ||
a | None | 0.6508 | 3.5380 | −2.2298 | −3.6836 | 1.3794 | −2.6428 | −3.3939 | 0.8366 | 0.6921 |
(0.0578) | (0.3078) | (0.0458) | (0.0525) | (0.1332) | (0.1139) | (0.1386) | (0.1470) | (0.0917) | ||
b | 35 | 0.6238 | 3.0476 | −2.2270 | −3.7769 | 1.2228 | −2.6994 | −3.5492 | 0.9812 | 0.6296 |
(0.0636) | (0.2550) | (0.0557) | (0.0542) | (0.1147) | (0.1314) | (0.1447) | (0.1800) | (0.0881) | ||
c | 95 | 0.6244 | 3.0751 | −2.2256 | −3.7723 | 1.2334 | −2.6925 | −3.5439 | 0.9677 | 0.6288
(0.0610) | (0.2511) | (0.0532) | (0.0544) | (0.1124) | (0.1282) | (0.1432) | (0.1709) | (0.0879) | ||
d | 132 | 0.4392 | 2.5259 | −2.2664 | −3.6404 | 0.8930 | −2.6394 | −3.3198 | 1.5083 | 1.1779 |
(0.0487) | (0.2528) | (0.0592) | (0.0606) | (0.0827) | (0.1666) | (0.1710) | (0.2647) | (0.1709) | ||
e | {35, 95} | 0.6121 | 2.8289 | −2.2190 | −3.8397 | 1.1567 | −2.7151 | −3.6568 | 1.0416 | 0.5838
(0.0640) | (0.2238) | (0.0606) | (0.0565) | (0.1060) | (0.1367) | (0.1509) | (0.1908) | (0.0829) | ||
f | {35, 132} | 0.3922 | 2.1309 | −2.2725 | −3.7082 | 0.7735 | −2.7021 | −3.4645 | 1.8419 | 1.2165 |
(0.0460) | (0.2052) | (0.0712) | (0.0615) | (0.0700) | (0.1956) | (0.1741) | (0.3199) | (0.1852) | ||
g | {95, 132} | 0.3958 | 2.1574 | −2.2769 | −3.7204 | 0.7797 | −2.7047 | −3.5108 | 1.8169 | 1.1387 |
(0.0458) | (0.2071) | (0.0718) | (0.0586) | (0.0694) | (0.1908) | (0.1730) | (0.3232) | (0.1637) | ||
h | {35, 95, 132} | 0.3803 | 2.0017 | −2.2713 | −3.7651 | 0.7451 | −2.7274 | −3.5925 | 1.9430 | 1.1302 |
(0.0456) | (0.1878) | (0.0799) | (0.0641) | (0.0672) | (0.2074) | (0.1747) | (0.3573) | (0.1907) |
Bayesian criteria for each perturbed version by fitting the bivariate Kumaraswamy Weibull regression model according to different perturbation schemes
Data names | Bayesian criteria | |||
---|---|---|---|---|
EAIC | EBIC | DIC | LPML | |
a | −2740.684 | −2710.999 | −2750.117 | 1374.758 |
b | −2717.665 | −2687.981 | −2726.491 | 1361.560 |
c | −2718.284 | −2688.599 | −2727.412 | 1362.171 |
d | −2693.735 | −2664.050 | −2703.276 | 1344.087 |
e | −2700.677 | −2670.993 | −2709.585 | 1353.458 |
f | −2666.795 | −2637.110 | −2675.774 | 1330.980 |
g | −2665.915 | −2636.230 | −2675.220 | 1332.170 |
h | −2650.958 | −2621.273 | −2660.200 | 1324.159 |
EAIC = expected Akaike information criterion; EBIC = expected Bayesian information criterion; DIC = deviance information criterion; LPML = log pseudo marginal likelihood.
Case influence diagnostics for the simulated data
Data names | Case number | |||
---|---|---|---|---|
a | 35 | 0.0048 | 0.0096 | 0.0387 |
95 | 0.0013 | 0.0025 | 0.0198 | |
132 | 0.3832 | 0.7914 | 0.3426 | |
b | 35 | 2.0466 | 4.3206 | 0.7223 |
c | 95 | 1.7511 | 3.6743 | 0.6782 |
d | 132 | 8.2482 | 15.1319 | 0.9833 |
e | 35 | 0.7872 | 1.6328 | 0.4880 |
95 | 1.0060 | 2.0804 | 0.5427 | |
f | 35 | 1.1026 | 2.4538 | 0.5679 |
132 | 6.7799 | 13.475 | 0.9618 | |
g | 95 | 1.1436 | 2.4212 | 0.5735 |
132 | 5.3909 | 9.9707 | 0.9179 | |
h | 35 | 0.5931 | 1.3437 | 0.4240 |
95 | 1.1146 | 3.1001 | 0.5759 | |
132 | 5.3933 | 10.4104 | 0.9240 |
MLEs for the bivariate Kumaraswamy Weibull regression model based on the renal insufficiency data
Parameter | Estimate | Standard error | p-value
---|---|---|---|
2.5204 | 1.6159 | 0.1188 | |
0.7075 | 0.4981 | 0.1555 | |
1.7274 | 1.2462 | 0.1657 | |
1.9164 | 0.3659 | <0.0001 | |
1.6826 | 0.8100 | - | |
1.4891 | 0.6823 | - | |
2.9146 | 1.1558 | - | |
0.5782 | 0.1739 | - | |
0.9999 | 0.3743 | - |
MLEs = maximum likelihood estimators.
Relative changes (in brackets), estimates, and the corresponding p-values (in parentheses)
Set | ||||
---|---|---|---|---|
[−] | [1.692] | [6.9591] | [0.0000] | |
2.5204 | 2.4777 | 2.3450 | 2.5204 | |
(0.1188) | (0.0846) | (0.1649) | (0.0213) | |
[−] | [−67.7746] | [25.0070] | [0.0000] | |
0.7075 | 1.1871 | 0.5306 | 0.7075 | |
(0.1555) | (0.0291) | (0.2141) | (0.1854) | |
[−] | [−10.9615] | [29.7890] | [0.0000] | |
1.7274 | 1.9168 | 1.2128 | 1.7274 | |
(0.1657) | (0.1098) | (0.4268) | (0.0421) | |
[−] | [−1.9063] | [−0.8373] | [0.0000] | |
1.9164 | 1.9529 | 1.9324 | 1.9164 | |
(0.0000) | (0.0000) | (0.0000) | (0.0000) | |
[−] | [9.1365] | [−4.1007] | [0.0000] | |
1.6826 | 1.5289 | 1.7516 | 1.6826 | |
[−] | [0.3287] | [−14.7761] | [0.0000] | |
1.4891 | 1.4842 | 1.7091 | 1.4891 | |
[−] | [6.1817] | [−41.5784] | [0.0000] | |
2.9146 | 2.7344 | 4.1264 | 2.9146 | |
[−] | [−31.6078] | [5.2288] | [0.0000] | |
0.5782 | 0.7609 | 0.5479 | 0.5782 | |
[−] | [0.0000] | [0.0000] | [0.0000] | |
0.9999 | 1.0000 | 1.0000 | 1.0000 |
Posterior means, standard deviations (SDs) and 95% credible intervals for the bivariate Kumaraswamy Weibull regression model fitted to the renal insufficiency data
Parameter | Complete data set | Dropped observations 15 and 21 | ||||||
---|---|---|---|---|---|---|---|---|
Mean | SD | CI (95%) | Mean | SD | CI (95%) | |||
0.9989 | 0.0142 | 0.9999 | 1.0000 | 0.9992 | 0.0106 | 0.9998 | 1.0000 | |
0.9572 | 0.1867 | 0.6394 | 1.3806 | 1.1916 | 0.2685 | 0.7447 | 1.7832 | |
4.1414 | 0.4844 | 3.1348 | 5.0698 | 3.8239 | 0.4435 | 2.9181 | 4.6951 | |
0.7127 | 0.4501 | −0.1110 | 1.6324 | 0.9632 | 0.4367 | 0.1139 | 1.8319 | |
1.0956 | 0.1828 | 0.7657 | 1.4866 | 1.1331 | 0.2079 | 0.7687 | 1.5826 | |
3.1853 | 0.3614 | 2.4386 | 3.8682 | 3.0973 | 0.3728 | 2.3768 | 3.8408 | |
1.8701 | 0.3513 | 1.1885 | 2.5791 | 1.9809 | 0.3921 | 1.1779 | 2.7060 | |
0.8438 | 0.2722 | 0.4398 | 1.4704 | 0.8618 | 0.2862 | 0.4328 | 1.5458 | |
0.6876 | 0.1340 | 0.4161 | 0.9312 | 0.8132 | 0.1433 | 0.5117 | 1.0495 |
SD = standard deviation; CI = credible interval.
Case influence diagnostics for the renal insufficiency data
Case number | |||
---|---|---|---|
15 | 0.5722 | 1.3869 | 0.4380 |
21 | 1.7783 | 4.1576 | 0.7093 |
Bayesian criteria
Dropped observation | Criterion | |||
---|---|---|---|---|
EAIC | EBIC | DIC | LPML | |
None | 683.4028 | 698.1411 | 673.4582 | −337.3232 |
15 | 664.0772 | 678.5755 | 654.2425 | −327.3600 |
21 | 647.4933 | 661.9915 | 637.0273 | −318.9128 |
15 and 21 | 628.4176 | 642.6693 | 618.2174 | −309.4217 |
EAIC = expected Akaike information criterion; EBIC = expected Bayesian information criterion; DIC = deviance information criterion; LPML = log pseudo marginal likelihood.