A Comparison of Size and Power of Tests of Hypotheses on Parameters Based on Two Generalized Lindley Distributions
Commun. Stat. Appl. Methods 2015;22:233-239
Published online May 31, 2015
© 2015 Korean Statistical Society.

Macaulay Okwuokenye (1, a) and Karl E. Peace (b)

(a) Biogen Idec, USA; (b) Jiann-Ping Hsu College of Public Health, Georgia Southern University, USA
Correspondence to: Macaulay Okwuokenye
Senior Biostatistician, Biogen Idec, MA. E-mail: Macaulay.Okwuokenye@biogenidec.com
Received December 30, 2015; Revised February 2, 2015; Accepted February 2, 2015.
 Abstract

This study compares two generalized Lindley distributions and assesses consistency between their theoretical and analytical results. Data (complete and censored) assumed to follow the Lindley distribution are generated and analyzed using two generalized Lindley distributions, and maximum likelihood estimates of the parameters of the generalized distributions are obtained. Size and power of tests of hypotheses on the parameters are assessed, drawing on asymptotic properties of the maximum likelihood estimates. Results suggest that some of the tests of hypotheses based on the considered generalized distributions are essentially α-level tests whereas others possibly are not, and that the power of tests of hypotheses on the Lindley distribution parameter differs between the two distributions.

Keywords : Lindley distribution, size and power of tests of hypotheses on time-to-event parameters, generalized Lindley distribution, observed and censored times, survival function, death density function
1. Introduction

The Lindley distribution (Lindley, 1958) was introduced in connection with fiducial distributions and Bayes' theorem, but it saw little use in the analysis of time-to-event data until the past decade. Its mathematical properties and real-data applications were explored by Ghitany et al. (2008). Since then, in attempts to enhance its flexibility, several authors have considered generalizations of the Lindley distribution (see, for example, Zakerzadeh and Dolati (2009) and Nadarajah et al. (2011)).

Generalization entails adding parameter(s) to the Lindley distribution and specifying conditions under which the generalized distribution reduces to the Lindley distribution. Theoretical results suggest that, under certain conditions, a generalized Lindley distribution reduces to the Lindley distribution with the same scale parameter. However, these results provide no practical guidance on how generalization affects the estimate of the parameter of the resulting Lindley distribution, nor do they address the possible implications of generalization for the size and power of tests on the parameter(s).

Generalized distributions have more parameters than their parent distributions, and distributions with more parameters have fewer residual degrees of freedom than those with fewer parameters. Such models (the terms model and distribution are used interchangeably in this study) are often compared using approaches, such as the Akaike information criterion (AIC; Akaike, 1974) and the Bayesian information criterion (BIC; Schwarz, 1978), that take this estimation penalty into account. The idea behind these approaches is to compare non-nested models on their log-likelihoods while penalizing the number of parameters in each model. In some settings, selecting the best model based on AIC and BIC can be challenging because the two criteria do not always agree (Rizopoulos, 2012).
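As a minimal sketch of the penalty described above, assuming the usual definitions AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L (k parameters, n observations), the Python snippet below compares a one-parameter fit with a two-parameter fit. The log-likelihood values are invented for illustration and do not come from this study.

```python
import math


def aic(log_lik: float, k: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (smaller is better)."""
    return 2 * k - 2 * log_lik


def bic(log_lik: float, k: int, n: int) -> float:
    """Bayesian (Schwarz) information criterion: k ln(n) - 2 ln L (smaller is better)."""
    return k * math.log(n) - 2 * log_lik


n = 100
# Hypothetical maximized log-likelihoods, for illustration only.
fits = {"Lindley (k=1)": (-120.0, 1), "generalized (k=2)": (-118.5, 2)}
for name, (ll, k) in fits.items():
    print(f"{name}: AIC = {aic(ll, k):.2f}, BIC = {bic(ll, k, n):.2f}")
```

With these made-up numbers, AIC slightly favors the two-parameter fit (241.00 versus 242.00) while BIC favors the one-parameter fit (244.61 versus 246.21), which is the kind of disagreement that can make model selection difficult.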

Tests of hypotheses on parameters are commonly carried out using methods that draw on either likelihood ratio theory or the asymptotic properties of parameter estimates. It may be worth investigating how generalization of the Lindley model affects the estimate of the parameter, because different approaches to generalization may carry their own uncertainty (entropy) and variability. Results from this investigation provide a tool for evaluating two generalized Lindley distributions and for assessing consistency between their theoretical and analytical results.

This study is motivated by the need to choose between the presented generalized Lindley models when fitting a data set. We are not aware of any study that has assessed the performance of these generalized Lindley models based on the properties of their parameter estimates, nor of any study that has considered these models for censored observations.

In this article, using a Monte Carlo simulation technique, we (1) assess size and power of tests of hypotheses on parameters when modeling Lindley-distributed data using two generalized Lindley distributions with censored and non-censored data; and (2) contrast results of size and power of tests from the two generalized Lindley distributions.

2. The Lindley Model and Two of Its Generalizations

2.1. The Lindley model

Denote by θ the scale parameter; the Lindley probability density function (pdf) is

$$ f(x;\theta) = \frac{\theta^{2}}{1+\theta}\,(1+x)\,e^{-\theta x}, \qquad x,\ \theta > 0. \tag{2.1} $$

The corresponding cumulative distribution and survival functions are

$$ F(x;\theta) = 1 - \frac{1+\theta+\theta x}{1+\theta}\,e^{-\theta x}, \qquad x,\ \theta > 0, $$

and

$$ S(x;\theta) = \frac{1+\theta+\theta x}{1+\theta}\,e^{-\theta x}, \qquad x,\ \theta > 0, $$

respectively. Suppose X_1, X_2, …, X_n is a random sample from the Lindley distribution. The maximum likelihood estimate (MLE) and the method-of-moments estimate of θ coincide and are given by

$$ \hat\theta = \frac{-(\bar X - 1) + \sqrt{(\bar X - 1)^{2} + 8\bar X}}{2\bar X}, \qquad \bar X > 0. $$

See Ghitany et al. (2008) for detailed mathematical characteristics of the Lindley distribution.
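The Lindley functions and the closed-form estimator above can be sketched in Python as follows. This is an illustrative implementation, not the authors' SAS code; the function names are hypothetical, and the mean E[X] = (θ + 2)/(θ(θ + 1)) used for the moment-matching check is the standard Lindley mean (Ghitany et al., 2008).

```python
import math


def lindley_pdf(x: float, theta: float) -> float:
    """Lindley density, Equation (2.1)."""
    return theta**2 / (1 + theta) * (1 + x) * math.exp(-theta * x)


def lindley_sf(x: float, theta: float) -> float:
    """Lindley survival function S(x; theta)."""
    return (1 + theta + theta * x) / (1 + theta) * math.exp(-theta * x)


def lindley_cdf(x: float, theta: float) -> float:
    """Lindley distribution function F(x; theta) = 1 - S(x; theta)."""
    return 1.0 - lindley_sf(x, theta)


def lindley_mean(theta: float) -> float:
    """E[X] = (theta + 2) / (theta * (theta + 1))."""
    return (theta + 2) / (theta * (theta + 1))


def lindley_mle(sample: list[float]) -> float:
    """Closed-form estimate of theta (the MLE equals the method-of-moments estimate)."""
    xbar = sum(sample) / len(sample)
    if xbar <= 0:
        raise ValueError("the sample mean must be positive")
    return (-(xbar - 1) + math.sqrt((xbar - 1) ** 2 + 8 * xbar)) / (2 * xbar)


sample = [0.4, 1.2, 0.7, 2.1, 0.3]
theta_hat = lindley_mle(sample)
# Moment matching: the fitted mean reproduces the sample mean.
print(theta_hat, lindley_mean(theta_hat), sum(sample) / len(sample))
```

Because the estimator is the positive root of X̄θ² + (X̄ − 1)θ − 2 = 0, which is the moment equation E[X] = X̄ rearranged, `lindley_mean(theta_hat)` returns the sample mean (0.94 here) up to rounding.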

2.2. Generalized Lindley models

In attempts to enhance flexibility of the Lindley distribution, several authors considered different generalizations of the Lindley model. Zakerzadeh and Dolati (2009) considered the pdf

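$$ f(x;\theta,\alpha,\beta) = \frac{\theta^{2}(\theta x)^{\alpha-1}(\alpha+\beta x)\,e^{-\theta x}}{(\beta+\theta)\,\Gamma(\alpha+1)}, \qquad \theta,\ \alpha,\ x > 0,\ \beta \ge 0. \tag{2.2} $$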

For α = β = 1, Equation (2.2) reduces to the Lindley pdf in Equation (2.1). The authors do not provide the distribution function for Equation (2.2), and they note that its survival function can be written in terms of the incomplete gamma function only when α is an integer; the hazard rate function h(x; θ, α, β) has no closed form. Nonetheless, Equation (2.2) has an increasing hazard rate function for α ≥ 1, a bathtub-shaped hazard for α < 1 and β > 0, and a decreasing hazard for α ≤ 1 and β = 0. See Zakerzadeh and Dolati (2009) for detailed mathematical properties of Equation (2.2). The authors argued that Equation (2.2) is the pdf of a mixture of two random variables distributed as gamma(α, θ) and gamma(α + 1, θ).
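A short Python check of Equation (2.2), under two assumptions of ours: gamma(a, θ) is read in the shape/rate parameterization, and the mixture weights are θ/(θ + β) on gamma(α, θ) and β/(θ + β) on gamma(α + 1, θ), which follow from splitting the factor (α + βx) in Equation (2.2). The function names are hypothetical.

```python
import math


def zd_pdf(x: float, theta: float, alpha: float, beta: float) -> float:
    """Generalized Lindley density of Zakerzadeh and Dolati (2009), Equation (2.2)."""
    return (theta**2 * (theta * x) ** (alpha - 1) * (alpha + beta * x)
            * math.exp(-theta * x) / ((beta + theta) * math.gamma(alpha + 1)))


def gamma_pdf(x: float, shape: float, rate: float) -> float:
    """Gamma density in the shape/rate parameterization (assumed here)."""
    return rate**shape * x ** (shape - 1) * math.exp(-rate * x) / math.gamma(shape)


def zd_pdf_mixture(x: float, theta: float, alpha: float, beta: float) -> float:
    """Equation (2.2) rewritten as a two-component gamma mixture."""
    w = theta / (theta + beta)  # weight on gamma(alpha, theta)
    return w * gamma_pdf(x, alpha, theta) + (1 - w) * gamma_pdf(x, alpha + 1, theta)


# alpha = beta = 1 should recover the Lindley density of Equation (2.1).
x, theta = 0.8, 1.5
print(zd_pdf(x, theta, 1.0, 1.0), theta**2 / (1 + theta) * (1 + x) * math.exp(-theta * x))
```

The mixture form is also a convenient way to simulate from Equation (2.2): draw the component with probability θ/(θ + β), then draw from that gamma distribution.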

Nadarajah et al. (2011) considered the pdf

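$$ f(x;\theta,\alpha) = \frac{\alpha\theta^{2}}{1+\theta}\,(1+x)\left[1 - \frac{1+\theta+\theta x}{1+\theta}\,e^{-\theta x}\right]^{\alpha-1} e^{-\theta x}, \qquad x,\ \theta,\ \alpha > 0. \tag{2.3} $$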

The corresponding distribution and survival functions of Equation (2.3) are

$$ F(x;\theta,\alpha) = \left[1 - \frac{1+\theta+\theta x}{1+\theta}\,e^{-\theta x}\right]^{\alpha}, \qquad x,\ \theta,\ \alpha > 0, $$

and

$$ S(x;\theta,\alpha) = 1 - \left[1 - \frac{1+\theta+\theta x}{1+\theta}\,e^{-\theta x}\right]^{\alpha}, \qquad x,\ \theta,\ \alpha > 0, $$

respectively. For α = 1, Equation (2.3) reduces to the Lindley pdf in Equation (2.1). Equation (2.3) accommodates monotonically increasing, monotonically decreasing, and bathtub-shaped hazard rate functions (Nadarajah et al., 2011). See Nadarajah et al. (2011) for detailed mathematical properties of Equation (2.3).
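Equation (2.3) is the Lindley distribution function raised to the power α, so it can be coded, and sampled by inversion, from the Lindley CDF alone. The Python sketch below relies on that reading; the `gl_` prefix is a hypothetical naming choice, and bisection is one simple root finder, not necessarily the one used by the authors.

```python
import math


def lindley_cdf(x: float, theta: float) -> float:
    """Lindley distribution function G(x; theta)."""
    return 1.0 - (1 + theta + theta * x) / (1 + theta) * math.exp(-theta * x)


def gl_cdf(x: float, theta: float, alpha: float) -> float:
    """F(x; theta, alpha) = G(x; theta) ** alpha (Nadarajah et al., 2011)."""
    return lindley_cdf(x, theta) ** alpha


def gl_sf(x: float, theta: float, alpha: float) -> float:
    """S(x; theta, alpha) = 1 - F(x; theta, alpha)."""
    return 1.0 - gl_cdf(x, theta, alpha)


def gl_pdf(x: float, theta: float, alpha: float) -> float:
    """Equation (2.3): alpha * g(x) * G(x) ** (alpha - 1), with g the Lindley pdf."""
    g = theta**2 / (1 + theta) * (1 + x) * math.exp(-theta * x)
    return alpha * g * lindley_cdf(x, theta) ** (alpha - 1)


def gl_quantile(u: float, theta: float, alpha: float, tol: float = 1e-12) -> float:
    """Inverse-CDF draw for 0 < u < 1: solve G(x; theta) = u ** (1 / alpha) by bisection."""
    target = u ** (1.0 / alpha)
    lo, hi = 0.0, 1.0
    while lindley_cdf(hi, theta) < target:  # expand the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if lindley_cdf(mid, theta) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Feeding uniform random numbers into `gl_quantile` gives draws from Equation (2.3); with α = 1 it reduces to inverting the Lindley CDF.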

3. Tests of Hypotheses

Let the observed and censored event times be t_i (i = 1, 2, …, d) and T_k (k = 1, 2, …, c = n − d), respectively. The log-likelihood for such a sample is

$$ \ln L = \sum_{i=1}^{d} \ln f(t_i;\theta,\alpha) + \sum_{k=1}^{c} \ln S(T_k;\theta,\alpha), $$

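where f(x; θ, α) = h(x; θ, α) S(x; θ, α) denotes the density function, h(x; θ, α) the hazard function, and S(x; θ, α) the survival function. The MLEs of the parameters may be found iteratively. The asymptotic covariance matrix of the estimators is approximately

$$ -\left[\frac{\partial^{2}\ln L}{\partial\lambda_j\,\partial\lambda_l}\right]^{-1}_{t\times t}, $$

where λ = (ψ_1, ψ_2, …, ψ_p, θ) collects the model parameters and t = p + 1.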
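For the model in Equation (2.3), the censored log-likelihood and a finite-difference approximation to the asymptotic covariance matrix might be sketched as below. This is an illustration only: the study used a quasi-Newton optimizer in SAS, whereas this sketch merely evaluates ln L and its Hessian at a supplied point (in practice, the MLE). The function names, the step size h, and the example data are hypothetical.

```python
import math


def gl_logpdf(x: float, theta: float, alpha: float) -> float:
    """ln f(x; theta, alpha) for Equation (2.3)."""
    G = 1.0 - (1 + theta + theta * x) / (1 + theta) * math.exp(-theta * x)
    return (math.log(alpha) + 2 * math.log(theta) - math.log(1 + theta)
            + math.log(1 + x) + (alpha - 1) * math.log(G) - theta * x)


def gl_logsf(x: float, theta: float, alpha: float) -> float:
    """ln S(x; theta, alpha) = ln(1 - G(x; theta) ** alpha)."""
    G = 1.0 - (1 + theta + theta * x) / (1 + theta) * math.exp(-theta * x)
    return math.log1p(-(G**alpha))


def censored_loglik(observed, censored, theta, alpha):
    """ln L = sum of ln f over event times t_i + sum of ln S over censoring times T_k."""
    return (sum(gl_logpdf(t, theta, alpha) for t in observed)
            + sum(gl_logsf(T, theta, alpha) for T in censored))


def hessian_2d(fun, a, b, h=1e-4):
    """Central finite-difference Hessian of fun(a, b)."""
    faa = (fun(a + h, b) - 2 * fun(a, b) + fun(a - h, b)) / h**2
    fbb = (fun(a, b + h) - 2 * fun(a, b) + fun(a, b - h)) / h**2
    fab = (fun(a + h, b + h) - fun(a + h, b - h)
           - fun(a - h, b + h) + fun(a - h, b - h)) / (4 * h**2)
    return [[faa, fab], [fab, fbb]]


def asymptotic_cov(hess):
    """Approximate covariance matrix: -(Hessian)^(-1) for a 2 x 2 matrix."""
    (p, q), (r, s) = hess
    det = p * s - q * r
    return [[-s / det, q / det], [r / det, -p / det]]


OBSERVED = [0.2, 0.5, 0.9, 1.3, 2.0, 0.7]  # hypothetical event times t_i
CENSORED = [1.8, 2.4]                      # hypothetical censoring times T_k


def loglik(alpha, theta):
    """ln L ordered as lambda = (alpha, theta)."""
    return censored_loglik(OBSERVED, CENSORED, theta, alpha)


print(loglik(1.0, 1.5), hessian_2d(loglik, 1.0, 1.5))
```

At α = 1 the log-likelihood collapses to the Lindley one, and with complete data the (θ, θ) entry of the Hessian should approach −n(2/θ² − 1/(1 + θ)²), which gives a convenient sanity check.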

For the density in Equation (2.2), ψ = (α, β); for the density in Equation (2.3), ψ = α, so that λ = (ψ, θ). The tests of hypotheses considered in this study are

$$ H_0:\ \psi = \psi_0, \tag{3.1} $$

and

$$ H_0:\ \theta = \theta_0. \tag{3.2} $$

Let V̂(λ̂) be the estimated covariance matrix of λ̂. The statistic, based on the asymptotic properties of λ̂, for testing H0 in Equation (3.1) is

$$ T = (L\hat\lambda - \psi_0)^{\top}\left[L\,\hat V(\hat\lambda)\,L^{\top}\right]^{-1}(L\hat\lambda - \psi_0), \tag{3.3} $$

where L is a matrix of known coefficients. Under H0, T is asymptotically distributed as chi-square with r degrees of freedom, written T ~ χ²_(r), where r is the rank of L. For the hypothesis in Equation (3.2), the test statistic in Equation (3.3) reduces to

$$ T_0 = (\hat\theta - \theta_0)^{2}\left[\hat V(\hat\theta)\right]^{-1}, \qquad T_0 \sim \chi^{2}_{(1)}. $$
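The quadratic form in Equation (3.3) can be illustrated in plain Python. To stay dependency-free, this sketch handles only r = 1 or r = 2 restrictions (which covers H0: α = 1, H0: α = β = 1, and H0: θ = θ0 in this study) and uses the closed-form chi-square tail probabilities for 1 and 2 degrees of freedom; the estimates and covariance matrix in the example are invented, and the helper names are hypothetical.

```python
import math


def chi2_sf(t: float, df: int) -> float:
    """Upper-tail chi-square probability; closed forms for df = 1 and df = 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(t / 2.0))
    if df == 2:
        return math.exp(-t / 2.0)
    raise ValueError("this sketch handles only df = 1 or df = 2")


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(row) for row in zip(*A)]


def inverse_small(M):
    """Inverse of a 1 x 1 or 2 x 2 matrix."""
    if len(M) == 1:
        return [[1.0 / M[0][0]]]
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def wald_test(L, lam_hat, V, psi0):
    """T = (L lam - psi0)' [L V L']^(-1) (L lam - psi0) and its chi-square(r) p-value."""
    diff = [[sum(l * x for l, x in zip(row, lam_hat)) - c0] for row, c0 in zip(L, psi0)]
    middle = inverse_small(matmul(matmul(L, V), transpose(L)))
    T = matmul(matmul(transpose(diff), middle), diff)[0][0]
    return T, chi2_sf(T, df=len(L))  # df = rank of L when L has full row rank


# Model B: lambda = (alpha, beta, theta); test H0: alpha = beta = 1.
lam_hat = [1.2, 0.9, 1.5]
V = [[0.04, 0.0, 0.0], [0.0, 0.01, 0.0], [0.0, 0.0, 0.02]]  # hypothetical covariance
print(wald_test([[1, 0, 0], [0, 1, 0]], lam_hat, V, [1.0, 1.0]))
```

Here T is 0.2²/0.04 + 0.1²/0.01 = 2 and the p-value is exp(−1), about 0.368; choosing L = [[0, 0, 1]] and ψ0 = [θ0] instead gives the single-parameter statistic T0.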

The data sets used in assessing power are the same as those used in assessing size. In assessing power, however, we formulated new hypotheses by adding 20% to the parameter values used in generating the data sets. In the application setting that motivated the present study, the aim is to ensure that whichever generalized model is selected can detect at least a 20% change in the value of the Lindley parameter to which it reduces, while keeping the false positive rate within the nominal 5%; we considered a change of at least 20% to be a meaningful minimum detectable difference.

Failure to reject the hypothesis in Equation (3.1) implies that the generalized distribution under consideration reduces to the Lindley distribution. Similarly, failure to reject the hypothesis in Equation (3.2) implies that generalization has no significant effect on the parameter of the Lindley distribution.

3.1. Generating Lindley times-to-event data

Data are simulated under the assumption that they follow the Lindley distribution with parameter θ, using the method described by Peace (1976), Peace and Flora (1978), Bender et al. (2005), Qian et al. (2010), and Okwuokenye (2012). Briefly, if U_i (i = 1, 2, …, n) are uniform random numbers, the times-to-event t_i may be obtained by solving the following equation in the cumulative hazard H(t_i) for t_i:

$$ H(t_i) = \int_0^{t_i} h_0(w)\,dw = -\ln(1 - u_i), \tag{3.4} $$

where h0 is the hazard rate function, and Ui is a uniform random variable. The ti for the Lindley model are obtained by solving for ti in the non-linear equation that results from replacing h0 in Equation (3.4) with the Lindley hazard rate function, given by

$$ h(t;\theta) = \frac{\theta^{2}(1+t)}{\theta + 1 + \theta t}, \qquad \theta > 0,\ t \ge 0. $$
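The inversion in Equation (3.4) can be sketched in Python. For the Lindley hazard, the cumulative hazard has the closed form H(t) = −ln S(t; θ) = θt − ln(1 + θt/(1 + θ)), so each t_i solves H(t_i) = −ln(1 − u_i). The sketch uses bisection, which is our assumption rather than the solver used in the cited references, and the function names are hypothetical.

```python
import math
import random


def lindley_cum_hazard(t: float, theta: float) -> float:
    """H(t) = integral of the Lindley hazard from 0 to t = -ln S(t; theta)."""
    return theta * t - math.log1p(theta * t / (1 + theta))


def lindley_time_from_uniform(u: float, theta: float, tol: float = 1e-12) -> float:
    """Solve H(t) = -ln(1 - u) for t by bisection (H is increasing in t)."""
    target = -math.log1p(-u)
    lo, hi = 0.0, 1.0
    while lindley_cum_hazard(hi, theta) < target:  # expand the bracket
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if lindley_cum_hazard(mid, theta) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def simulate_lindley(n: int, theta: float, seed=None) -> list[float]:
    """n Lindley times-to-event generated from uniform random numbers."""
    rng = random.Random(seed)
    return [lindley_time_from_uniform(rng.random(), theta) for _ in range(n)]


times = simulate_lindley(5000, theta=1.5, seed=2015)
print(sum(times) / len(times), (1.5 + 2) / (1.5 * 2.5))  # sample vs theoretical mean
```

With θ = 1.5, as in this study, the sample mean should be close to the theoretical Lindley mean (θ + 2)/(θ(θ + 1)), about 0.933.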

Times-to-event are generated by specifying the Lindley parameter (θ = 1.5) to mirror the data set to be analyzed by the authors. It is assumed that a sample of N independent, observable survival times (t_1, t_2, …, t_N) is available from a population; of these, d (d ≤ N) are observed event times, and the remaining k = N − d are censored at the end of the study. The 20% of observations that are censored are also assumed to follow the Lindley distribution.

In assessing size and power of tests of hypotheses, m = 5,000 independent data sets are generated for each sample size (n). A quasi-Newton method with a 10⁻⁸ convergence criterion is used to obtain the maximum likelihood estimates prior to assessing the size and power of tests. Simulations and analyses are performed using SAS/STAT® software, version 9.3 of the SAS System for Windows. Copyright © 2011 SAS Institute Inc. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc., Cary, NC, USA.
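The size and power computation can be illustrated with a deliberately simplified Python sketch. It is not the study's procedure: it fits the one-parameter Lindley model (closed-form MLE, observed information n(2/θ̂² − 1/(1 + θ̂)²)) rather than either generalized model, draws Lindley variates from their exponential/gamma(2) mixture representation instead of the cumulative-hazard method, and uses fewer replicates, so its rejection rates will not reproduce Tables 1-4. It does show how the proportion of replicates rejecting H0 (T/m) is obtained from repeated Wald tests; all names are hypothetical.

```python
import math
import random

CHI2_1_CRIT = 3.841458820694124  # upper 5% point of chi-square with 1 df


def draw_lindley(rng: random.Random, theta: float) -> float:
    """Lindley variate as a mixture: Exp(theta) w.p. theta/(1+theta), else Gamma(2, rate theta)."""
    if rng.random() < theta / (1 + theta):
        return rng.expovariate(theta)
    return rng.gammavariate(2.0, 1.0 / theta)  # gammavariate takes (shape, scale)


def lindley_mle(xs: list[float]) -> float:
    xbar = sum(xs) / len(xs)
    return (-(xbar - 1) + math.sqrt((xbar - 1) ** 2 + 8 * xbar)) / (2 * xbar)


def wald_rejects(xs: list[float], theta0: float) -> bool:
    """Wald chi-square test of H0: theta = theta0 at the 5% level (Lindley model only)."""
    th = lindley_mle(xs)
    info = len(xs) * (2 / th**2 - 1 / (1 + th) ** 2)  # -d2 lnL / d theta2 at th
    return (th - theta0) ** 2 * info > CHI2_1_CRIT


def rejection_rate(n: int, m: int, theta_true: float, theta0: float, seed: int = 2015) -> float:
    """Proportion of m simulated data sets of size n in which H0 is rejected."""
    rng = random.Random(seed)
    rejections = sum(
        wald_rejects([draw_lindley(rng, theta_true) for _ in range(n)], theta0)
        for _ in range(m)
    )
    return rejections / m


print("size :", rejection_rate(n=100, m=2000, theta_true=1.5, theta0=1.5))
print("power:", rejection_rate(n=250, m=500, theta_true=1.5, theta0=1.8))
```

Testing at θ0 = 1.5 estimates size; testing the same kind of data at θ0 = 1.8 (the 20% shift) estimates power.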

4. Results and Discussion

4.1. Results

The parameter values for assessing the size of tests based on the two generalized Lindley distributions are given in Section 3. The θ0 value for assessing the size of tests is the parameter value used in generating the t_i's from the Lindley distribution. The θ0 value used in assessing the power of tests is 1.8, reflecting a 20% addition to the original Lindley parameter value. The proportion α̂_s = T_s/m (s = A, B) denotes the estimated size of test, or false positive rate, where T_s is the number of times the tests rejected H0 out of m replicates (tests from the models in Equations (2.3) and (2.2) are indexed by A and B, respectively).

Tables 1-3 show the size and power of tests of hypotheses in percent (i.e., 100 × α̂_s) for the tests of hypotheses described in Section 3, for non-censored data. Table 4 shows the size and power of tests based on the model in Equation (2.3) for data containing 20% censoring.

4.2. Discussion

This study assesses the size and power of tests of hypotheses on parameters based on two generalized Lindley distributions, using the maximum likelihood estimates of those parameters. For complete data, tests of the hypothesis H0: θ = θ0 based on either distribution are α-level tests, as are tests based on Equation (2.3) of the hypothesis H0: α = 1. Tests based on Equation (2.2) of the hypothesis H0: α = β = 1 are essentially α-level tests with the possible exception of small samples: the size of the Wald chi-square test of this null hypothesis is at least 11.5% for 50 < n ≤ 500. Whether any of the above cases is an α-level test may itself be tested through the hypothesis H0: α = 0.05, with test statistic

$$ Z = \frac{\hat\alpha - 0.05}{\sqrt{\hat\alpha(1-\hat\alpha)/m}}, $$

where α̂ is the estimated size and Z is approximately N(0, 1) under this null hypothesis. For n = 500 and m = 5,000, the estimated size is 12.34%; accordingly, z = 15.78 (p < 0.0001).
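The z computation above is easy to reproduce. This sketch assumes a two-sided p-value computed from the normal approximation; the source reports only p < 0.0001, which holds for either a one- or a two-sided test here. The function name is hypothetical.

```python
import math


def size_z_test(alpha_hat: float, m: int, nominal: float = 0.05):
    """z statistic and two-sided normal p-value for H0: true size = nominal."""
    se = math.sqrt(alpha_hat * (1 - alpha_hat) / m)
    z = (alpha_hat - nominal) / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided


z, p = size_z_test(0.1234, 5000)  # Model B, n = 500 (Table 3)
print(f"z = {z:.2f}, p = {p:.1e}")
```

The printed z agrees with the 15.78 reported in the text.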

For power of tests of the hypothesis H0: θ = θ0, the model in Equation (2.3) consistently appears to have higher power than the model in Equation (2.2) for n ≤ 1000. The power of tests based on the former is approximately 80% for n = 250 and 100% for n = 1000.

Of note, convergence problems were encountered for some replicates when fitting the model due to Zakerzadeh and Dolati (2009) to the generated Lindley-distributed data sets. For sample sizes of 25 and 50, 4,998 of the 5,000 replicates converged. Accordingly, the reported estimates of size and power of tests for these cases are based on the replicates in which optimization converged.

For data with 20% censored observations, the size and power of tests based on the model in Equation (2.2) for the hypothesis H0: θ = θ0 were not assessed because the survival function of that model is not available in closed form.

For the model in Equation (2.3), tests of the hypothesis H0: θ = θ0 are essentially α-level tests when Lindley-distributed data contain 20% censored observations; the model also has sufficient power to detect a 20% difference in the Lindley parameter value for n ≥ 250 (Table 4).

5. Conclusion

Results from this study suggest that, when modeling non-censored Lindley-distributed data, tests of hypotheses on the parameter(s) are α-level for the generalized Lindley model in Equation (2.3), but not for the generalized model in Equation (2.2) with small samples. The former appears to have higher power than the latter for small sample sizes in the cases investigated. For data containing 20% censored observations, the generalized model in Equation (2.3) gives α-level tests and has adequate power when n ≥ 250.

TABLES

Table 1

Size of tests on parameter(s) based on generalized Lindley distributions using complete data

H0: θ = θ0

Replicates: 5,000 per sample size

Model      n = 25   n = 50   n = 100  n = 250  n = 500  n = 1000
Model A    5.32     5.22     4.68     5.10     5.28     5.10
Model B    1.90     2.08     1.80     2.66     3.50     3.50

Note: Values are 100 × α̂_s = 100 × T_s/m (s = A, B), where T_s is the number of times the tests rejected H0 out of m replicates for different sample sizes with complete data. Tests are based on the Wald chi-square statistic and are run at α = 0.05; A and B denote the models due to Nadarajah et al. (2011) and Zakerzadeh & Dolati (2009), respectively.


Table 2

Power of tests on parameter(s) based on generalized Lindley distributions using complete data

H0: θ = θ0 + 0.2θ0 (i.e., θ = 1.8)

Replicates: 5,000 per sample size

Model      n = 25   n = 50   n = 100  n = 250  n = 500  n = 1000
Model A    15.40    24.28    43.12    78.68    96.94    99.98
Model B    6.88     9.46     15.98    32.24    58.92    90.08

Note: Values are 100 × α̂_s = 100 × T_s/m (s = A, B), where T_s is the number of times the tests rejected H0 out of m replicates for different sample sizes with complete data. Tests are based on the Wald chi-square statistic and are run at α = 0.05; A and B denote the models due to Nadarajah et al. (2011) and Zakerzadeh & Dolati (2009), respectively.


Table 3

Size of tests on parameter(s) based on generalized Lindley distributions using complete data

Model A: H0: α = 1; Model B: H0: α = β = 1

Replicates: 5,000 per sample size

Model      n = 25   n = 50   n = 100  n = 250  n = 500  n = 1000
Model A    3.90     4.20     4.52     4.66     4.80     4.74
Model B    7.12     11.24    16.04    16.70    12.34    7.18

Note: Values are 100 × α̂_s = 100 × T_s/m (s = A, B), where T_s is the number of times the tests rejected H0 out of m replicates for different sample sizes with complete data. Tests are based on the Wald chi-square statistic and are run at α = 0.05; A and B denote the models due to Nadarajah et al. (2011) and Zakerzadeh & Dolati (2009), respectively.


Table 4

Size and power of tests on parameter(s) based on a generalized Lindley distribution due to Nadarajah et al. (2011) using 20% censored data

H0C: θ = θ0; H0D: θ = θ0 + 0.2θ0 (i.e., θ = 1.8)

Replicates: 5,000 per sample size

Hypothesis   n = 25   n = 50   n = 100  n = 250  n = 500  n = 1000
C (size)     5.00     4.94     4.44     4.92     5.12     5.16
D (power)    15.40    24.28    43.12    78.68    96.94    100.00

Note: Values are 100 × α̂_s = 100 × T_s/m, where T_s is the number of times the tests rejected H0 out of m replicates for different sample sizes. Tests are based on the Wald chi-square statistic and are run at α = 0.05. C and D represent size and power, respectively.


References
  1. Akaike, H (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control. 19, 716-723.
  2. Bender, R, Augustin, M, and Blettner, M (2005). Generating survival times to simulate Cox's proportional hazards models. Statistics in Medicine. 24, 1713-1723.
  3. Ghitany, ME, Atieh, B, and Nadarajah, S (2008). Lindley distribution and its application. Mathematics and Computers in Simulation. 78, 493-506.
  4. Lindley, DV (1958). Fiducial distributions and Bayes theorem. Journal of the Royal Statistical Society Series B. 20, 102-107.
  5. Nadarajah, S, Bakouch, HS, and Tahmasbi, R (2011). A generalized Lindley distribution. Sankhya B. 73, 331-359.
  6. Okwuokenye, M (2012). Size and power of tests of hypotheses on parameters when modelling time-to-event data with the Lindley distribution. Doctoral dissertation. Georgia Southern University. Statesboro, GA.
  7. Peace, KE (1976). Maximum likelihood estimation and efficiency assessments of tests of hypotheses on survival parameters. Doctoral dissertation. Medical College of Virginia, Virginia Commonwealth University. Richmond, VA.
  8. Peace, KE, and Flora, RE (1978). Size and power assessment of tests of hypotheses on survival parameters. Journal of the American Statistical Association. 73, 129-132.
  9. Qian, J, Li, J, and Chen, P (2010). Generating survival data in the simulation studies of Cox model. Proceedings of 2010 Third International Conference on Information and Computing (ICIC), Wuxi, China, pp. 93-96.
  10. Rizopoulos, D (2012). Joint Models for Longitudinal and Time-to-Event Data: With Applications in R. Boca Raton: CRC Press.
  11. Schwarz, G (1978). Estimating the dimension of a model. Annals of Statistics. 6, 461-464.
  12. Zakerzadeh, H, and Dolati, A (2009). Generalized Lindley distribution. Journal of Mathematical Extension. 3, 13-25.