This study compares two generalized Lindley distributions and assesses consistency between theoretical and analytical results. Data (complete and censored) assumed to follow the Lindley distribution are generated and analyzed using two generalized Lindley distributions, and maximum likelihood estimates of the parameters of the generalized distributions are obtained. Size and power of tests of hypotheses on the parameters are assessed using asymptotic properties of the maximum likelihood estimates. Results suggest that some of the tests of hypotheses based on the considered generalized distributions are essentially α-level, whereas others possibly are not; the power of tests of hypotheses on the Lindley distribution parameter differs between the two distributions.
The Lindley distribution (Lindley, 1958) was introduced in connection with fiducial distributions and Bayes' theorem, but it saw little use in the analysis of time-to-event data until the past decade. Its mathematical properties and real data applications were explored by Ghitany et al. (2008). Since then, in attempts to enhance its flexibility, many authors have considered generalizations of the Lindley distribution (see, for example, Zakerzadeh and Dolati (2009) and Nadarajah et al. (2011)).
Generalization entails the addition of parameter(s) to the Lindley distribution and specification of conditions under which the generalized distribution reduces to the Lindley distribution. Theoretical results suggest that, under certain conditions, a generalized Lindley distribution reduces to the Lindley distribution with the same scale parameter. But these results provide no practical guideline on how the generalization affects the parameter estimate of the resulting Lindley distribution. Additionally, the results provide no possible implications of generalization on size and power of tests on parameter(s).
Generalized distributions have more parameters than their parent distributions. Distributions with more parameters, however, have fewer residual degrees of freedom than those with fewer parameters. Such models (models are used interchangeably with distributions in this study) are often compared using approaches such as the Akaike information criterion (AIC; Akaike, 1974) and the Bayesian information criterion (BIC; Schwarz, 1978), which take this estimation penalty into consideration. The idea behind these approaches is to compare non-nested models based on their log-likelihood, with a penalty for the number of parameters in the respective models. In some settings, selecting the best model based on AIC and BIC can be challenging because the two criteria are not always in agreement (Rizopoulos, 2012).
Tests of hypotheses on parameters are commonly accomplished through methods that draw on either likelihood ratio theory or asymptotic properties of parameter estimates. It may be worth the effort to investigate how generalization of the Lindley model affects the estimate of the parameter, because different approaches to generalization may have their own associated uncertainty (entropy) and variability. Results from this investigation provided us with a tool for evaluating two generalized Lindley distributions and for assessing consistency between their theoretical and analytical results.
This study is motivated by the need to select between the presented generalized Lindley models to fit a data set. To the best of our knowledge, no study has assessed the performance of the presented generalized Lindley models based on the properties of their parameter estimates, nor has any study considered these models with censored observations.
In this article, using a Monte Carlo simulation technique, we (1) assess size and power of tests of hypotheses on parameters when modeling Lindley-distributed data using two generalized Lindley distributions with censored and non-censored data; and (2) contrast results of size and power of tests from the two generalized Lindley distributions.
Denote by θ the scale parameter; the Lindley probability density function (pdf) is
The corresponding cumulative distribution and survival functions are
and
respectively. Suppose X_{1}, X_{2}, …, X_{n} is a random sample from the Lindley distribution. The maximum likelihood estimator (MLE) and the method of moments estimator of the parameter θ coincide; the common estimator is
See Ghitany et al. (2008) for detailed mathematical characteristics of the Lindley distribution.
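As a point of reference, the quantities named above can be sketched in a few lines of Python. The formulas used are the standard Lindley forms, f(x; θ) = θ²(1 + x)e^{−θx}/(1 + θ) and S(x; θ) = (1 + θ + θx)e^{−θx}/(1 + θ), with the estimator taken as the positive root of x̄θ² + (x̄ − 1)θ − 2 = 0. The function names are illustrative only and are not part of the original analysis.

```python
import math

def lindley_pdf(x, theta):
    """Lindley density: theta^2 / (1 + theta) * (1 + x) * exp(-theta * x), x > 0."""
    return theta ** 2 / (1.0 + theta) * (1.0 + x) * math.exp(-theta * x)

def lindley_sf(x, theta):
    """Survival function: (1 + theta + theta * x) / (1 + theta) * exp(-theta * x)."""
    return (1.0 + theta + theta * x) / (1.0 + theta) * math.exp(-theta * x)

def lindley_cdf(x, theta):
    """Cumulative distribution function, F(x) = 1 - S(x)."""
    return 1.0 - lindley_sf(x, theta)

def lindley_mle(sample):
    """Closed-form estimator of theta from the sample mean (MLE = method of moments)."""
    xbar = sum(sample) / len(sample)
    return (-(xbar - 1.0) + math.sqrt((xbar - 1.0) ** 2 + 8.0 * xbar)) / (2.0 * xbar)

# The Lindley mean is (theta + 2) / (theta * (theta + 1)); for theta = 1.5 this is 14/15,
# so a sample whose mean equals 14/15 returns an estimate of 1.5 (up to rounding).
theta_hat = lindley_mle([14.0 / 15.0])
```

Because the estimator depends on the data only through the sample mean, no iterative optimization is needed for the one-parameter Lindley model; iteration becomes necessary only for the generalized models.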
In attempts to enhance flexibility of the Lindley distribution, several authors considered different generalizations of the Lindley model. Zakerzadeh and Dolati (2009) considered the pdf
For ψ = γ = 1, this pdf reduces to the Lindley pdf.
Nadarajah et al. (2011) considered the pdf
The corresponding distribution and survival functions of
and
respectively. For φ = 1, the pdf reduces to the Lindley pdf.
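The reduction of both generalized models to the Lindley model can be checked numerically. The sketch below assumes the forms usually attributed to the cited papers: an exponentiated Lindley density φ f(x) F(x)^{φ−1} for Nadarajah et al. (2011), and θ²(θx)^{ψ−1}(ψ + γx)e^{−θx}/((γ + θ)Γ(ψ + 1)) for Zakerzadeh and Dolati (2009). These expressions are assumptions restated here for illustration and should be checked against the original papers; the function names are hypothetical.

```python
import math

def lindley_pdf(x, theta):
    """Parent Lindley density."""
    return theta ** 2 / (1.0 + theta) * (1.0 + x) * math.exp(-theta * x)

def lindley_cdf(x, theta):
    """Parent Lindley distribution function."""
    return 1.0 - (1.0 + theta + theta * x) / (1.0 + theta) * math.exp(-theta * x)

def model_a_pdf(x, theta, phi):
    """ASSUMED form for Nadarajah et al. (2011): phi * f(x) * F(x)^(phi - 1)."""
    return phi * lindley_pdf(x, theta) * lindley_cdf(x, theta) ** (phi - 1.0)

def model_b_pdf(x, theta, psi, gamma):
    """ASSUMED form for Zakerzadeh and Dolati (2009), shape psi and mixing parameter gamma."""
    kernel = theta ** 2 * (theta * x) ** (psi - 1.0) * (psi + gamma * x) * math.exp(-theta * x)
    return kernel / ((gamma + theta) * math.gamma(psi + 1.0))

# Largest absolute gap between each generalized density and the Lindley density
# when the extra parameters are set to 1 (expected to be zero up to rounding).
grid = (0.1, 0.5, 1.0, 3.0)
gap_a = max(abs(model_a_pdf(x, 1.5, 1.0) - lindley_pdf(x, 1.5)) for x in grid)
gap_b = max(abs(model_b_pdf(x, 1.5, 1.0, 1.0) - lindley_pdf(x, 1.5)) for x in grid)
```

Under these assumed forms, the hypotheses φ = 1 and ψ = γ = 1 are exactly the conditions under which each model collapses to the Lindley model with the same θ.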
Let the observed and censored event times be t_{i} (i = 1, 2, …, d) and T_{k} (k = 1, 2, …, c = n − d), respectively. The log-likelihood for such a sample is
where f(x; θ, η) = h(x; θ, η)S(x; θ, η) denotes the density function, and S(x; θ, η) the survival function. The MLEs of the parameters may be found iteratively. The asymptotic covariance matrix of the estimators is approximately
where η = (η_{1}, η_{2}, …, η_{p}), and t = p + 1.
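The structure of the censored log-likelihood, in which observed events contribute log f and censored times contribute log S, can be sketched as follows. For brevity the sketch uses the one-parameter Lindley model rather than either generalized model, and it approximates the observed information by a central finite difference; both choices, the data values, and the function names are assumptions made for illustration.

```python
import math

def lindley_logpdf(x, theta):
    """log f(x; theta) for the Lindley density."""
    return 2.0 * math.log(theta) - math.log(1.0 + theta) + math.log(1.0 + x) - theta * x

def lindley_logsf(x, theta):
    """log S(x; theta) for the Lindley survival function."""
    return math.log(1.0 + theta + theta * x) - math.log(1.0 + theta) - theta * x

def censored_loglik(theta, event_times, censored_times):
    """Observed events contribute log f; right-censored times contribute log S."""
    ll = sum(lindley_logpdf(t, theta) for t in event_times)
    ll += sum(lindley_logsf(c, theta) for c in censored_times)
    return ll

def observed_information(theta, event_times, censored_times, h=1e-4):
    """Negative second derivative of the log-likelihood via a central difference."""
    up = censored_loglik(theta + h, event_times, censored_times)
    mid = censored_loglik(theta, event_times, censored_times)
    down = censored_loglik(theta - h, event_times, censored_times)
    return -(up - 2.0 * mid + down) / h ** 2

events = [0.21, 0.48, 0.75, 1.10, 1.62]  # hypothetical observed event times
censored = [2.00, 2.00]                  # hypothetical censoring times
info = observed_information(1.5, events, censored)  # in practice, evaluate at the MLE
approx_variance = 1.0 / info             # scalar analogue of the inverse information matrix
```

With several parameters the same idea applies, except that the second derivatives form a t × t matrix whose inverse gives the approximate covariance matrix.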
For density in
and
Let V̂(Θ̂) be the estimated covariance matrix of Θ̂. The statistic, based on asymptotic properties of Θ̂, for testing the H_{0} in
where L is a matrix of known coefficients. Under H_{0}, T is asymptotically distributed as chi-square with r degrees of freedom, written T ∼ χ^{2}_{r}.
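A minimal sketch of such a test for a single linear hypothesis (r = 1) is given below. It assumes the usual Wald form T = (LΘ̂ − c)² / (L V̂ Lᵀ), with L a single row and c the hypothesized value; the estimates and covariance matrix are hypothetical numbers chosen only to illustrate the arithmetic. Joint hypotheses such as ψ = γ = 1 (r = 2) require the matrix form of the same statistic and are not covered here.

```python
import math

def wald_statistic(estimates, cov, contrast, null_value):
    """Wald chi-square for one linear hypothesis: contrast . estimates = null_value."""
    diff = sum(l * e for l, e in zip(contrast, estimates)) - null_value
    # Variance of the linear combination: contrast * cov * contrast^T
    var = sum(contrast[i] * cov[i][j] * contrast[j]
              for i in range(len(contrast)) for j in range(len(contrast)))
    return diff ** 2 / var

def chi2_1df_pvalue(t):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(t / 2.0))

theta_eta_hat = [1.62, 1.10]           # hypothetical estimates of (theta, eta)
cov_hat = [[0.04, 0.01], [0.01, 0.09]]  # hypothetical estimated covariance matrix
T = wald_statistic(theta_eta_hat, cov_hat, contrast=[1.0, 0.0], null_value=1.5)
p_value = chi2_1df_pvalue(T)
reject_at_5pct = T > 3.841458820694124  # 0.95 quantile of chi-square with 1 df
```

Here the contrast [1, 0] picks out θ, so the test reduces to the squared standardized distance between the estimate of θ and its hypothesized value.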
The data sets used in assessing power are the same as those used in assessing size. In assessing power, however, we formed new hypotheses by adding 20% to the parameter values used in generating the data sets. In the application setting that motivated the present study, any generalized model selected should be able to detect at least a 20% change in the value of the Lindley parameter to which the generalized models reduce, while keeping the false positive rate within the nominal 5% level; we considered a change of at least 20% to be a meaningful minimum detectable difference.
Failure to reject the hypothesis in
Data are simulated under the assumption that they follow the Lindley distribution with parameter θ, using the method described by Peace (1976), Peace and Flora (1978), Bender et al. (2005), Qian et al. (2010) and Okwuokenye (2012). Briefly, suppose U_{i} (i = 1, 2, …, n) are random numbers; times-to-event (t_{i}) may be obtained by solving the following equation of the cumulative hazard H(t_{i}) for t_{i}:
where h_{0} is the hazard rate function, and U_{i} is a uniform random variable. The t_{i} for the Lindley model are obtained by solving for t_{i} in the non-linear equation that results from replacing h_{0} in
Times-to-event are generated by specifying the Lindley parameter (θ = 1.5) to mirror the data set to be analyzed by the authors. It is assumed that a sample of N independent, observable survival times (t_{1}, t_{2}, …, t_{N}) is available on a population; of those, d (d ≤ N) are the observed times of the event, and the remaining k = N − d are censored at the end of the study. The distribution of the 20% censored data is assumed to follow the Lindley distribution.
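The inversion step can be sketched as follows, assuming the equation being solved is H(t_{i}) = −log(U_{i}), the form given by Bender et al. (2005), with H(t) = −log S(t) for the Lindley model. Bisection is used here as a simple root finder; the original work may have used a different solver, and the function names are illustrative. Censoring is omitted because the mechanism is not specified beyond the 20% rate.

```python
import math
import random

def lindley_cumhaz(t, theta):
    """Cumulative hazard H(t) = -log S(t) for the Lindley distribution."""
    return theta * t - math.log((1.0 + theta + theta * t) / (1.0 + theta))

def draw_lindley_time(theta, u, tol=1e-10):
    """Solve H(t) = -log(u) for t by bisection; H is increasing in t."""
    target = -math.log(u)
    lo, hi = 0.0, 1.0
    while lindley_cumhaz(hi, theta) < target:  # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if lindley_cumhaz(mid, theta) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def simulate_lindley(n, theta, rng):
    """Generate n event times; 1 - rng.random() lies in (0, 1], so log(u) is defined."""
    return [draw_lindley_time(theta, 1.0 - rng.random()) for _ in range(n)]

rng = random.Random(2011)                  # arbitrary seed for reproducibility
times = simulate_lindley(1000, 1.5, rng)   # theta = 1.5 as in the text
```

For θ = 1.5 the theoretical mean is (θ + 2)/(θ(θ + 1)) = 14/15, which gives a quick sanity check on the generated sample.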
In assessing size and power of tests of hypotheses, m = 5,000 independent data sets are generated for each sample size (n). A quasi-Newton method, with a 10^{−8} convergence criterion, is used to obtain the maximum likelihood estimates prior to assessing the size and power of tests. Simulations and analyses are performed using SAS/STAT® software version 9.3 of the SAS system for Windows. Copyright © 2011 SAS Institute Inc. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc., Cary, NC, USA.
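The Monte Carlo logic (generate m data sets, fit, test, and count rejections) can be illustrated with a simplified stand-in. The sketch below fits the one-parameter Lindley model, whose MLE is available in closed form, instead of the generalized models fitted by quasi-Newton optimization in the study, so its rejection rates are not expected to match the reported tables. For speed it draws Lindley variates from the exponential/gamma mixture representation rather than by inversion. All function names and the seed are assumptions.

```python
import math
import random

def draw_lindley(theta, rng):
    """Mixture draw: Exp(theta) with prob. theta/(1+theta), otherwise Gamma(2, theta)."""
    x = rng.expovariate(theta)
    if rng.random() >= theta / (1.0 + theta):
        x += rng.expovariate(theta)  # sum of two exponentials is Gamma(2, theta)
    return x

def lindley_mle(sample):
    """Closed-form MLE of theta for the one-parameter Lindley model."""
    xbar = sum(sample) / len(sample)
    return (-(xbar - 1.0) + math.sqrt((xbar - 1.0) ** 2 + 8.0 * xbar)) / (2.0 * xbar)

def wald_rejects(sample, theta_null, crit=3.841458820694124):
    """Wald chi-square test (1 df, alpha = 0.05) of H0: theta = theta_null."""
    n = len(sample)
    est = lindley_mle(sample)
    # Inverse Fisher information of the Lindley model, evaluated at the MLE
    var = est ** 2 * (1.0 + est) ** 2 / (n * (est ** 2 + 4.0 * est + 2.0))
    return (est - theta_null) ** 2 / var > crit

def rejection_rate(theta_true, theta_null, n, m, seed=1):
    """Percentage of m simulated data sets of size n in which H0 is rejected."""
    rng = random.Random(seed)
    rejections = sum(
        wald_rejects([draw_lindley(theta_true, rng) for _ in range(n)], theta_null)
        for _ in range(m)
    )
    return 100.0 * rejections / m

size_pct = rejection_rate(1.5, 1.5, n=100, m=1000)         # H0 true: estimates size
power_pct = rejection_rate(1.5, 1.5 * 1.2, n=100, m=1000)  # H0 shifted by 20%: estimates power
```

Size is estimated by testing the generating value θ = 1.5, and power by testing the value shifted by 20% (θ = 1.8) on data generated in the same way.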
Noted in Section 3 are the parameter values for assessing size of tests based on the two generalized Lindley distributions. The θ value for assessing size of tests is the parameter value used in generating
Tables 1
This study assesses size and power of tests of hypotheses on parameters based on two generalized Lindley distributions, arising from the maximum likelihood estimates of those parameters. For complete data, tests based on the two distributions, for the hypotheses H_{0} : θ = θ_{0} are α-level tests, as are those based on
where Z ~ N(0, 1). For n = 500 and m = 5,000, the estimated size is 12.34%; accordingly, z = 15.78 (p < 0.0001).
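The reported value can be reproduced with a one-sample z statistic for a proportion. The sketch below assumes the standard error is computed from the estimated size, z = (α̂ − α)/√(α̂(1 − α̂)/m); this form is inferred from the reported numbers, since it returns z = 15.78 for α̂ = 0.1234 and m = 5,000. The function name is illustrative.

```python
import math

def size_z_statistic(size_pct, m, alpha=0.05):
    """z statistic comparing an estimated size (in percent) with the nominal level."""
    p_hat = size_pct / 100.0
    se = math.sqrt(p_hat * (1.0 - p_hat) / m)  # standard error from the estimated size
    return (p_hat - alpha) / se

z = size_z_statistic(12.34, 5000)
p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
print(round(z, 2))  # prints 15.78
```

The same function can be applied to any entry of the size tables to judge whether it departs from the nominal 5% by more than Monte Carlo error.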
For power of tests of the hypotheses H_{0} : θ = θ_{0},
Of note is that convergence problems were encountered for some replicates when fitting the model due to Zakerzadeh and Dolati (2009) to the generated Lindley-distributed data sets. For sample sizes of 25 and 50, 4,998 out of 5,000 replicates converged. Accordingly, the reported estimates of size and power of tests for these cases are based on the replicates in which optimization converged.
For data with 20% censored observations, the assessment of size and power of tests based on the model in
For model in
Results from this study suggest that size of tests of hypotheses on parameter(s) are α-level for the generalized Lindley model in
Size of tests on parameter(s) based on generalized Lindley distributions using complete data
H_{0} : θ = θ_{0}

| # Replicates | Model | n = 25 | n = 50 | n = 100 | n = 250 | n = 500 | n = 1000 |
|---|---|---|---|---|---|---|---|
| 5,000 | Model A | 5.32 | 5.22 | 4.68 | 5.10 | 5.28 | 5.10 |
| | Model B | 1.90 | 2.08 | 1.80 | 2.66 | 3.50 | 3.50 |

Note: Values of α̂_{s} = 100 × T_{s·}/m (s = A, B), where T_{s·} is the number of times the tests rejected H_{0} out of m replicates for different sample sizes with complete data. Tests are based on Wald chi-square and are run at α = 0.05; A and B denote models due to Nadarajah et al. (2011) and Zakerzadeh & Dolati (2009), respectively.
Power of tests on parameter(s) based on generalized Lindley distributions using complete data
H_{0} : θ = θ_{0} + Δ

| # Replicates | Model | n = 25 | n = 50 | n = 100 | n = 250 | n = 500 | n = 1000 |
|---|---|---|---|---|---|---|---|
| 5,000 | Model A | 15.40 | 24.28 | 43.12 | 78.68 | 96.94 | 99.98 |
| | Model B | 6.88 | 9.46 | 15.98 | 32.24 | 58.92 | 90.08 |

Note: Values of α̂_{s} = 100 × T_{s·}/m (s = A, B), where T_{s·} is the number of times the tests rejected H_{0} out of m replicates for different sample sizes with complete data. Tests are based on Wald chi-square and are run at α = 0.05; A and B denote models due to Nadarajah et al. (2011) and Zakerzadeh & Dolati (2009), respectively.
Size of tests on parameter(s) based on generalized Lindley distributions using complete data
Model A H_{0} : φ = 1; Model B H_{0} : ψ = γ = 1

| # Replicates | Model | n = 25 | n = 50 | n = 100 | n = 250 | n = 500 | n = 1000 |
|---|---|---|---|---|---|---|---|
| 5,000 | Model A | 3.90 | 4.20 | 4.52 | 4.66 | 4.80 | 4.74 |
| | Model B | 7.12 | 11.24 | 16.04 | 16.70 | 12.34 | 7.18 |

Note: Values of α̂_{s} = 100 × T_{s·}/m (s = A, B), where T_{s·} is the number of times the tests rejected H_{0} out of m replicates for different sample sizes with complete data. Tests are based on Wald chi-square and are run at α = 0.05; A and B denote models due to Nadarajah et al. (2011) and Zakerzadeh & Dolati (2009), respectively.
Size and power of tests on parameter(s) based on a generalized Lindley distribution due to Nadarajah et al. (2011) using 20% censored data
H_{0C} : θ = θ_{0}; H_{0D} : θ = θ_{0} + Δ

| # Replicates | Hypotheses | n = 25 | n = 50 | n = 100 | n = 250 | n = 500 | n = 1000 |
|---|---|---|---|---|---|---|---|
| 5,000 | C | 5.00 | 4.94 | 4.44 | 4.92 | 5.12 | 5.16 |
| | D | 15.40 | 24.28 | 43.12 | 78.68 | 96.94 | 100.00 |

Note: Values of α̂_{s} = 100 × T_{s·}/m, where T_{s·} is the number of times the tests rejected H_{0} out of m replicates for different sample sizes. Tests are based on Wald chi-square and are run at α = 0.05. C and D represent size and power, respectively.