Tests of equality of several variances with the likelihood ratio principle
Communications for Statistical Applications and Methods 2018;25:329-339
Published online July 31, 2018
© 2018 Korean Statistical Society.

Hyo-Il Park

Department of Statistics, Cheongju University, Korea
Correspondence to: Department of Statistics, Cheongju University, 298 Dae-Sung Ro, Cheongwon-gu, Cheongju-si, Chungcheongbuk-do, 28503, Korea. E-mail: hipark@cju.ac.kr
Received January 10, 2018; Revised March 21, 2018; Accepted May 8, 2018.

In this study, we propose tests for the equality of several variances under the normality assumption. First, we propose a likelihood ratio test whose null distribution is obtained by applying the permutation principle. Then, using the p-values of the pairwise tests between variances together with combination functions, we propose combination tests, and again apply the permutation principle to obtain the overall p-values. For completeness, we also review well-known test statistics and modify one of them using the p-values. We illustrate the proposed tests with numerical and simulated data and compare their efficiency with that of the reviewed tests through a simulation study based on empirical p-values. Finally, we discuss some interesting features related to resampling methods and tests for equality among several variances.

Keywords : combination function, likelihood ratio test, normal distribution, permutation principle
1. Introduction

In statistics, the mean or location parameter has been the main object of statistical inference in both parametric and nonparametric approaches. The variance or scale parameter has usually been treated as a nuisance parameter, yet it plays a non-negligible role in data analysis, since one often needs to check whether an assumption of equal variances is valid. For example, when one analyzes data with the two-sample t-test, one assumes that the two variances are equal even though their common value is unknown. Many statistical software procedures therefore report whether the assumption of equal variances is appropriate. Tests for equality between two variances have also been fully developed and widely applied in both parametric and nonparametric approaches. For comparisons among the means of three or more samples, one may find some results about the equality of variances in the ANOVA procedure in SAS.

As a matter of fact, inference about one or more variances has been reported far less often than inference about means. One reason may be that it is difficult to construct a suitable statistic for testing equality among more than two variances, to interpret the form of the likelihood ratio (LR) function, or to derive the null distribution of the corresponding LR function, even under the normality assumption. Apart from the Bartlett test (Bartlett, 1937), there does not seem to exist either an LR procedure or an asymptotic procedure related to the LR approach in the literature. Only several heuristic, ad hoc or modified procedures have been published and used by practitioners. The Bartlett test (Bartlett, 1937) was the first in the literature to test equality of more than two variances. Its statistic is a function of the LR statistic, modified by quantities related to the sample sizes so that a null distribution can be obtained, although only asymptotically. Hartley (1950) proposed a procedure based on the ratio of the maximum to the minimum of the individual sample variances. Hartley’s test is easy to perform but is sensitive to departures from normality (Hand and Nagaraja, 2003). Levene (1960) then considered a procedure that transforms the data by subtracting the sample mean from each observation and taking absolute values. Brown and Forsythe (1974) modified Levene’s test by using sample medians instead of sample means. Since then several modified tests have been proposed. All the null distributions for the tests reviewed up to this point are asymptotic ones. O’Brien (1979, 1981) further modified Levene’s statistic. The main idea behind the O’Brien test is to transform the original scores so that the transformed scores reflect the variation of the original scores. Recently, Gokpinar and Gokpinar (2017), Chang et al. (2017) and Jayalath et al. (2017) considered improvements of the tests reviewed above in terms of power and of achieving the significance level. In particular, Gokpinar and Gokpinar (2017) and Jayalath et al. (2017) applied the bootstrap method.

The LR test requires an exact specification of the population distributions. In particular, for LR tests about means and variances, one almost always assumes normality and can use the well-known null distributions of the corresponding LR statistics. When it is difficult to derive the exact null distribution theoretically, it is common to obtain a limiting or asymptotic distribution based on LR arguments. Sometimes even the derivation of an asymptotic distribution is not possible, or a simulation study reveals a serious discrepancy between the asymptotic distribution and the unknown distribution of the LR statistic. In such cases one may apply the Monte-Carlo method (Park, 2018).

With the rapid development of computing power and software, the distributions of test statistics have come to depend heavily on resampling methods such as the bootstrap and permutation methods. The only difference between the two can be summarized as resampling from the original data set with replacement versus without replacement. However, it is known that the results of the two methods may be quite different (Good, 2000). If both methods can be applied for testing, the permutation principle has been recommended (Good, 2000; Pesarin, 2001), since the permutation principle estimates the unknown distribution whereas the bootstrap method estimates the parameter. The permutation principle is usually applied with the Monte-Carlo approach. Oden (1991) and Boos and Zhang (2000) studied the optimal number of iterations for the permutation principle. For more discussion of resampling methods, see Westfall and Young (1993) and Good (2000).

In this paper, we propose procedures for testing the equality of several variances simultaneously. The rest of the paper is organized as follows. In Section 2, we first derive the LR function under the normality assumption. We then propose the LR test, which is intuitively easy to use and requires minimal computation, by applying the permutation principle to obtain the p-value. By examining the LR statistic in detail, we also observe that it consists of LR statistics for testing equality between two variances. Based on this fact, we consider combination functions that combine the individual tests for equality between two variances, and again apply the permutation principle to obtain the overall p-values of the proposed combination tests. In Section 3, we briefly review well-known test statistics and, in the spirit of our proposed tests, modify one of them by using the individual p-values instead of the sample variances in order to reduce its sensitivity to departures from normality. In Section 4, we illustrate our procedures with numerical and simulated data and compare the efficiency of the proposed tests with that of other well-known ones. Finally, in Section 5, we discuss some interesting features related to tests for equality of variances and to resampling methods, and briefly state future research topics.

2. Tests of equality of several variances

Suppose that we have $K$ independent samples $X_{i1}, \ldots, X_{in_i}$ from populations with distributions $N(\mu_i, \sigma_i^2)$, $i = 1, \ldots, K$. Our main interest is to test

$$H_0: \sigma_1^2 = \cdots = \sigma_K^2 = \sigma^2$$

against $H_1$: not $H_0$. In order to derive the LR test, we first introduce the following notation. For each $i$, $i = 1, \ldots, K$, let

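$$\bar{X}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} X_{ij} \quad\text{and}\quad S_i^2 = \frac{1}{n_i}\sum_{j=1}^{n_i}\left(X_{ij}-\bar{X}_i\right)^2.$$

We note that the $\bar{X}_i$’s and $S_i^2$’s are the maximum likelihood estimates of the $\mu_i$’s and $\sigma_i^2$’s, respectively, under $H_0 \cup H_1$, but the $S_i^2$’s are not unbiased. Also we have

$$S_p^2 = \frac{1}{n}\sum_{i=1}^{K}\sum_{j=1}^{n_i}\left(X_{ij}-\bar{X}_i\right)^2 = \frac{1}{n}\sum_{i=1}^{K} n_i S_i^2,$$

where $n = \sum_{i=1}^{K} n_i$. We note that $S_p^2$ is the maximum likelihood estimate of $\sigma^2$ under $H_0$ but is not unbiased either. Then the LR function $LR(\sigma_1^2, \ldots, \sigma_K^2; X)$ is

$$LR(\sigma_1^2, \ldots, \sigma_K^2; X) = \frac{\sup\left\{\prod_{i=1}^{K}\prod_{j=1}^{n_i} f(X_{ij}; \sigma_i^2 \mid H_0 \cup H_1)\right\}}{\sup\left\{\prod_{i=1}^{K}\prod_{j=1}^{n_i} f(X_{ij}; \sigma^2 \mid H_0)\right\}} = \frac{(S_p^2)^{n/2}}{\prod_{i=1}^{K}(S_i^2)^{n_i/2}} = \prod_{i=1}^{K}\left(\frac{S_p^2}{S_i^2}\right)^{n_i/2}, \tag{2.1}$$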

where $f$ is the probability density function of the normal distribution. One may then reject $H_0$ in favor of $H_1$ for large values of $LR(\sigma_1^2, \ldots, \sigma_K^2; X)$. In order to complete the LR test, we need the null distribution of $LR(\sigma_1^2, \ldots, \sigma_K^2; X)$, but it is difficult to derive since the numerator and denominator in (2.1) are not independent. One may circumvent this difficulty via the limiting distribution of $2\log LR(\sigma_1^2, \ldots, \sigma_K^2; X)$, where $\log$ denotes the natural logarithm here and in the sequel. However, a preliminary simulation study showed that the discrepancy between the limiting distribution of $2\log LR(\sigma_1^2, \ldots, \sigma_K^2; X)$ and its unknown distribution was serious, since the nominal significance level could not be achieved even for the normal distribution. For this reason, we consider using the permutation method to obtain the null distribution, $G$ say, of $LR(\sigma_1^2, \ldots, \sigma_K^2; X)$ or $2\log LR(\sigma_1^2, \ldots, \sigma_K^2; X)$. With the permutation distribution, the LR test is intuitively clear and easy to use. Before we propose further test procedures, we briefly describe how the permutation principle with the Monte-Carlo approach is applied to multiple samples. See Westfall and Young (1993) for more explanations and applications.

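  (I) Combine the $K$ samples into one sample, $X_1, \ldots, X_n$, say.

  (II) With a suitable random configuration scheme, rearrange $X_1, \ldots, X_n$ as $X_1^p, \ldots, X_n^p$, say, with $X_i^p \ne X_j^p$ whenever $1 \le i \ne j \le n$.

  (III) Allocate $X_1^p, \ldots, X_{n_1}^p$ as the first sample, $X_{n_1+1}^p, \ldots, X_{n_1+n_2}^p$ as the second sample, and so on up to $X_{n_1+\cdots+n_{K-1}+1}^p, \ldots, X_n^p$ as the $K$th sample.

  (IV) Calculate the value of $LR(\sigma_1^2, \ldots, \sigma_K^2; X^p)$ or $2\log LR(\sigma_1^2, \ldots, \sigma_K^2; X^p)$.

  (V) Iterate steps (II) to (IV) $M$ times and obtain the empirical distribution function $\hat{G}_M$ of $LR(\sigma_1^2, \ldots, \sigma_K^2; X)$ or $2\log LR(\sigma_1^2, \ldots, \sigma_K^2; X)$ from the $M$ values of $LR(\sigma_1^2, \ldots, \sigma_K^2; X^p)$ or $2\log LR(\sigma_1^2, \ldots, \sigma_K^2; X^p)$.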
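The five steps above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the SAS/IML code used for the results in this paper: the function names `two_log_lr` and `permutation_pvalue` are hypothetical, the statistic is 2 log LR computed from (2.1), the raw observations are pooled exactly as in the steps, and the p-value is taken as the proportion of permuted statistics at least as large as the observed one.

```python
import math
import random


def two_log_lr(samples):
    """2 log LR from (2.1): sum over i of n_i * log(S_p^2 / S_i^2).

    S_i^2 and S_p^2 are the maximum likelihood (divide-by-n) estimates.
    Assumes every sample has a strictly positive variance.
    """
    n = sum(len(s) for s in samples)
    mle_vars = []
    for s in samples:
        mean = sum(s) / len(s)
        mle_vars.append(sum((x - mean) ** 2 for x in s) / len(s))
    pooled = sum(len(s) * v for s, v in zip(samples, mle_vars)) / n
    return sum(len(s) * math.log(pooled / v) for s, v in zip(samples, mle_vars))


def permutation_pvalue(samples, n_perm=2000, seed=0):
    """Monte-Carlo permutation p-value for 2 log LR (reject for large values)."""
    rng = random.Random(seed)          # seed is an illustrative choice
    sizes = [len(s) for s in samples]
    observed = two_log_lr(samples)
    pooled = [x for s in samples for x in s]   # step (I): combine the samples
    count = 0
    for _ in range(n_perm):                    # step (V): iterate M = n_perm times
        rng.shuffle(pooled)                    # step (II): random rearrangement
        permuted, start = [], 0
        for size in sizes:                     # step (III): reallocate to K samples
            permuted.append(pooled[start:start + size])
            start += size
        if two_log_lr(permuted) >= observed:   # step (IV): recompute the statistic
            count += 1
    return count / n_perm
```

For example, `permutation_pvalue([[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]], n_perm=500)` returns a value between 0 and 1, and the same seed reproduces the same p-value.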

We may then complete the LR test by obtaining the p-value with the permutation steps stated above. On the other hand, we note that the LR function (2.1) is the product of the following terms: for each $i$, $i = 1, \ldots, K$,

$$\left(\frac{S_p^2}{S_i^2}\right)^{n_i/2} = \left\{\frac{n_i}{n}\sum_{j=1}^{K}\frac{n_j S_j^2}{n_i S_i^2}\right\}^{n_i/2}.$$
In other words, the LR function (2.1) consists of components of the form $(n_i S_i^2)/(n_j S_j^2)$ and $(n_j S_j^2)/(n_i S_i^2)$, which are used for testing $H_0^{ij}: \sigma_i^2 = \sigma_j^2$ against $H_1^{ij}: \sigma_i^2 \ne \sigma_j^2$ for $1 \le i \ne j \le K$. This fact leads us to consider another type of procedure for testing $H_0: \sigma_1^2 = \cdots = \sigma_K^2 = \sigma^2$ by applying combination functions. There are $K(K-1)/2$ sub-null hypotheses $H_0^{ij}: \sigma_i^2 = \sigma_j^2$, each tested using $(n_i S_i^2)/(n_j S_j^2)$ or $(n_j S_j^2)/(n_i S_i^2)$. One may reject $H_0^{ij}$ in favor of $H_1^{ij}$ for large or small values of $(n_i S_i^2)/(n_j S_j^2)$ or $(n_j S_j^2)/(n_i S_i^2)$. However, we note that $(n_i S_i^2)/(n_j S_j^2)$ and $(n_j S_j^2)/(n_i S_i^2)$ are reciprocals of each other. We also note that the $(n_i S_i^2)/(n_j S_j^2)$’s do not all have the same distribution, since their distributions have different degrees of freedom (df) even though they are all members of the F-distribution family. Therefore one cannot combine them directly. One way to avoid this difficulty is to use the individual p-values of the individual tests of $H_0^{ij}: \sigma_i^2 = \sigma_j^2$. For this we first need the following well-known result. Without loss of generality, from now on we assume that $1 \le i < j \le K$.

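Lemma 1

Under $H_0: \sigma_1^2 = \cdots = \sigma_K^2 = \sigma^2$, for each $i$ and $j$, $1 \le i < j \le K$,

$$F_{ij} = \frac{n_i S_i^2/(n_i - 1)}{n_j S_j^2/(n_j - 1)}$$

is distributed as $F$ with $n_i - 1$ and $n_j - 1$ dfs.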
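For completeness, the lemma follows from the standard normal-theory argument sketched below, where $F_{ij}$ denotes the ratio of the unbiased sample variances of samples $i$ and $j$; the chi-square notation is introduced only for this illustration. Under $H_0$ and normality, the scaled sums of squares of the two independent samples are independent chi-square variables, so that

$$\frac{n_i S_i^2}{\sigma^2} = \frac{1}{\sigma^2}\sum_{l=1}^{n_i}\left(X_{il}-\bar{X}_i\right)^2 \sim \chi^2_{n_i-1}, \qquad \frac{n_j S_j^2}{\sigma^2} \sim \chi^2_{n_j-1},$$

$$F_{ij} = \frac{\left(n_i S_i^2/\sigma^2\right)/(n_i-1)}{\left(n_j S_j^2/\sigma^2\right)/(n_j-1)} = \frac{n_i S_i^2/(n_i-1)}{n_j S_j^2/(n_j-1)} \sim F(n_i-1,\, n_j-1).$$

The common $\sigma^2$ cancels in the ratio, which is why the null distribution of $F_{ij}$ does not depend on the unknown variance.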

Let $\Lambda_{ij}$ be the p-value for testing $H_0^{ij}: \sigma_i^2 = \sigma_j^2$ based on $F_{ij}$. We then combine the $K(K-1)/2$ p-values by choosing a suitable combination function. The best-known and most widely used combination functions are as follows. Pesarin (2001) presented a concise review and investigated the properties of the combination functions extensively.

  • Fisher combination (FC) function $FC = -2\sum_{1 \le i < j \le K}\log(1-\Lambda_{ij})$.

  • Liptak combination (LC) function $LC = \sum_{1 \le i < j \le K}\Psi^{-1}(\Lambda_{ij})$,

    where $\Psi$ is a distribution function and $\Psi^{-1}$ its inverse.

  • Tippett combination (TC) function $TC = \min\{\Lambda_{ij};\, 1 \le i < j \le K\}$.

The testing rule is then to reject $H_0: \sigma_1^2 = \cdots = \sigma_K^2 = \sigma^2$ for small values of each combination function. To complete the test of $H_0$, we have to derive the null distributions of the combination functions. However, since the components are not independent, it is difficult to obtain the null distributions theoretically. Therefore we use the permutation principle to obtain the null distributions of the combination functions.
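As a minimal sketch of the three combination functions above, the following Python code evaluates FC, LC and TC from a list of pairwise p-values. The function names are hypothetical, the p-values are assumed to have been computed beforehand from the pairwise F tests, and $\Psi$ is assumed here to be the standard normal distribution function, which is one common choice and is not fixed by the text.

```python
import math
from statistics import NormalDist


def fisher_combination(pvalues):
    """FC = -2 * sum(log(1 - p)), in the form given in the text.

    Small p-values give FC close to 0, so H0 is rejected for small FC.
    Requires every p-value to be strictly less than 1.
    """
    return -2.0 * sum(math.log(1.0 - p) for p in pvalues)


def liptak_combination(pvalues):
    """LC = sum(Psi^{-1}(p)), with Psi ASSUMED to be the standard normal cdf.

    Requires every p-value to lie strictly between 0 and 1.
    """
    inverse_cdf = NormalDist().inv_cdf
    return sum(inverse_cdf(p) for p in pvalues)


def tippett_combination(pvalues):
    """TC = smallest of the pairwise p-values."""
    return min(pvalues)


# Hypothetical pairwise p-values for K = 3 samples (3 pairs), for illustration only.
example = [0.20, 0.05, 0.70]
fc = fisher_combination(example)
lc = liptak_combination(example)
tc = tippett_combination(example)
```

Each combined value is only a test statistic; its own p-value would still have to come from the permutation distribution, obtained by recomputing the pairwise p-values and the combination function on every permuted data set.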

We may then test $H_0: \sigma_1^2 = \cdots = \sigma_K^2 = \sigma^2$ by choosing a suitable test from the proposed ones and applying the permutation steps stated previously to estimate the null distribution of the chosen combination function. In the next section, we briefly review some well-known and widely used test statistics in order to compare their structure with that of the proposed ones.

3. Other test statistics

In this section, we review well-known test statistics and modify one of them using the $\Lambda_{ij}$’s, the p-values for testing $H_0^{ij}: \sigma_i^2 = \sigma_j^2$ for $i \ne j$ introduced in the previous section. First, we note that Bartlett (1937) proposed a test essentially based on the LR function (2.1). With a slight abuse of notation, in this section we let the $S_i^2$’s and $S_p^2$ denote the unbiased estimates of the $\sigma_i^2$’s and $\sigma^2$ under $H_0 \cup H_1$ and $H_0$, respectively. Then the Bartlett statistic $BA$ can be defined as

$$BA = \frac{(n-K)\log(S_p^2) - \sum_{i=1}^{K}(n_i-1)\log(S_i^2)}{1 + \frac{1}{3(K-1)}\left(\sum_{i=1}^{K}\frac{1}{n_i-1} - \frac{1}{n-K}\right)} = \frac{\sum_{i=1}^{K}(n_i-1)\log\left(\frac{S_p^2}{S_i^2}\right)}{1 + \frac{1}{3(K-1)}\left(\sum_{i=1}^{K}\frac{1}{n_i-1} - \frac{1}{n-K}\right)}.$$
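The Bartlett statistic above can be computed directly from the raw samples. The following Python sketch is an illustration only: the function name `bartlett_statistic` is hypothetical, the variances are the unbiased (divide-by-$(n_i-1)$) estimates used in this section, and every sample is assumed to have at least two observations and a positive variance.

```python
import math


def bartlett_statistic(samples):
    """Bartlett statistic BA with unbiased variance estimates.

    Numerator:   sum_i (n_i - 1) * log(S_p^2 / S_i^2)
    Denominator: 1 + (sum_i 1/(n_i - 1) - 1/(n - K)) / (3 * (K - 1))
    """
    k = len(samples)
    n = sum(len(s) for s in samples)
    variances = []
    for s in samples:
        mean = sum(s) / len(s)
        variances.append(sum((x - mean) ** 2 for x in s) / (len(s) - 1))
    # Pooled unbiased estimate of the common variance under H0.
    pooled = sum((len(s) - 1) * v for s, v in zip(samples, variances)) / (n - k)
    numerator = sum(
        (len(s) - 1) * math.log(pooled / v) for s, v in zip(samples, variances)
    )
    correction = 1.0 + (
        sum(1.0 / (len(s) - 1) for s in samples) - 1.0 / (n - k)
    ) / (3.0 * (k - 1))
    return numerator / correction
```

The value is referred to the chi-square distribution with $K - 1$ df when the asymptotic Bartlett test is used.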

We note that one may obtain the numerator of $BA$ from (2.1) by taking the logarithm, multiplying by 2 and subtracting 1 from $n_i$ for each $i$. The purpose of modifying $2\log LR(\sigma_1^2, \ldots, \sigma_K^2; X)$ into $BA$ was to improve the approximation to the chi-square distribution (Bartlett, 1937). This in turn implies that the limiting distribution of $2\log LR(\sigma_1^2, \ldots, \sigma_K^2; X)$ cannot be a chi-square with $K - 1$ df. In relation to the sample variances, Hartley (1950) also proposed a test using the following statistic,

$$HA = \frac{\max\{S_i^2;\, 1 \le i \le K\}}{\min\{S_i^2;\, 1 \le i \le K\}},$$

the ratio of the largest sample variance to the smallest sample variance. The testing rule is to reject $H_0$ for large values of $HA$. Pearson and Hartley (1970) provided critical values of $HA$ for some selected sample sizes and significance levels. The test based on $HA$ is also known to be very sensitive to departures from the normality assumption. However, we note that $HA$ resembles the Tippett combination function. For this reason, we may modify the statistic $HA$ using the $\Lambda_{ij}$’s as

$$THA = \frac{\min\{\Lambda_{ij};\, 1 \le i < j \le K\}}{\max\{\Lambda_{ij};\, 1 \le i < j \le K\}}.$$
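To make the parallel between $HA$ and $THA$ concrete, the following Python sketch computes both. It is an illustration under stated assumptions: the function names are hypothetical, $HA$ uses the unbiased sample variances of this section, and the pairwise p-values passed to the second function are assumed to have been computed beforehand and to be strictly positive.

```python
def unbiased_variance(sample):
    """Sample variance with divisor n_i - 1."""
    mean = sum(sample) / len(sample)
    return sum((x - mean) ** 2 for x in sample) / (len(sample) - 1)


def hartley_ha(samples):
    """HA: largest sample variance divided by the smallest one."""
    variances = [unbiased_variance(s) for s in samples]
    return max(variances) / min(variances)


def modified_hartley_tha(pairwise_pvalues):
    """THA: smallest pairwise p-value divided by the largest one."""
    return min(pairwise_pvalues) / max(pairwise_pvalues)


# Illustration with made-up inputs.
ha = hartley_ha([[1, 2, 3, 4], [2, 4, 6, 8]])       # variances 5/3 and 20/3
tha = modified_hartley_tha([0.20, 0.50, 0.04])      # 0.04 / 0.50
```

$HA$ is at least 1 and grows with heterogeneity of the variances, whereas $THA$ lies between 0 and 1 and shrinks when at least one pairwise comparison is strongly significant relative to the others.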

The testing rule is then to reject $H_0$ for small values of $THA$. The null distribution of $THA$ can be obtained by the permutation principle. We will use $THA$ instead of $HA$ in the examples and in the simulation study. In order to avoid sensitivity to departures from the normality assumption, Levene (1960) proposed a test using the following statistic,

$$W = \frac{(n-K)\sum_{i=1}^{K} n_i\left(\bar{Z}_i - \bar{Z}\right)^2}{(K-1)\sum_{i=1}^{K}\sum_{j=1}^{n_i}\left(Z_{ij} - \bar{Z}_i\right)^2},$$

where $Z_{ij} = |X_{ij} - \bar{X}_i|$. Also $\bar{Z}_i$ and $\bar{Z}$ are the means of $Z_{i1}, \ldots, Z_{in_i}$ and of all the $Z_{ij}$’s, respectively. The testing rule is to reject $H_0$ for large values of $W$, and the limiting distribution of $W$ is known to be the F-distribution with $K - 1$ and $n - K$ df. The Levene test has been popular and widely used in various modified forms. In particular, Brown and Forsythe (1974) modified the Levene statistic by using the median instead of the mean in computing the spread within each group. Brown and Forsythe (1974) performed several simulation studies with different choices of center, such as the trimmed mean and the median.
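The Levene statistic and its Brown and Forsythe variant differ only in the center used to form the absolute deviations, which the following Python sketch makes explicit. The function name `levene_w` and the `center` argument are hypothetical, $W$ is taken in its usual one-way ANOVA form on the absolute deviations, and the within-group sum of squares is assumed to be positive.

```python
from statistics import mean, median


def levene_w(samples, center=mean):
    """Levene-type statistic W on absolute deviations Z_ij = |X_ij - center_i|.

    center=mean gives the Levene (1960) statistic;
    center=median gives the Brown and Forsythe (1974) modification.
    """
    k = len(samples)
    n = sum(len(s) for s in samples)
    z = []
    for s in samples:
        c = center(s)
        z.append([abs(x - c) for x in s])
    z_means = [mean(zi) for zi in z]                 # group means of Z
    z_grand = sum(sum(zi) for zi in z) / n           # overall mean of Z
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    return (n - k) / (k - 1) * between / within


w_mean = levene_w([[1, 2, 3, 4], [2, 4, 6, 8]])               # Levene
w_median = levene_w([[1, 2, 9], [1, 2, 3]], center=median)    # Brown-Forsythe
```

With the mean as center, the first call gives $W = 2.4$; the second call shows how an outlying value (9) enters through deviations from the median instead of the mean.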

It is then of interest to compare the efficiency of the proposed and reviewed tests for testing $H_0: \sigma_1^2 = \cdots = \sigma_K^2 = \sigma^2$, which we do in the next section. For Hartley’s test, we use $THA$ rather than $HA$, since the test based on $HA$ is known to be sensitive to departures from normality. We begin the next section by illustrating our proposed tests with two examples, one with real data and one with simulated data.

4. Examples and simulation results

In this section, we begin by illustrating our test procedures with a numerical example on the amounts of fat absorbed by doughnuts during cooking, reported from the Iowa Agricultural Experiment Station (Lowe, 1935) and summarized in Table 1. For each of four fats, six batches of doughnuts were prepared. The data in Table 1 are the grams of fat absorbed per batch. Snedecor and Cochran (1989) originally tested $H_0: \sigma_1^2 = \cdots = \sigma_4^2 = \sigma^2$ against $H_1$: not $H_0$ by applying the Bartlett test and obtained an insignificant result. We consider the seven tests proposed and reviewed so far. For the Hartley test, we use $THA$ rather than $HA$ as the test statistic in order to reduce the sensitivity to departures from normality. The distributions used for the Bartlett and Levene tests are chi-square with 3 df and F with 3 and 20 df, respectively. For the remaining tests, we applied the permutation principle with the Monte-Carlo approach to obtain the p-values, with 10,000 repetitions, using the PC version of SAS/IML. The respective p-values for this example are summarized in Table 2 and none of them indicates a significant departure from equality of the variances.

Snedecor and Cochran (1989) also considered the simulated data tabulated in Table 3 for testing $H_0: \sigma_1^2 = \cdots = \sigma_4^2 = \sigma^2$ against $H_1$: not $H_0$, in order to observe the behavior of the Bartlett and Levene tests when the assumption of normality is violated. In Table 3, four independent samples with $n_i = 7$ for all $i$ were drawn from the t distribution with 3 df (a symmetrical long-tailed distribution) with the number 7 added. Snedecor and Cochran (1989) applied both the Bartlett and Levene tests and observed that the Bartlett test yielded a significant result, which is absurd since all the data were drawn from the same distribution. We also applied the seven tests used in the previous example and obtained the p-values summarized in Table 4. We note that, except for the Bartlett test, all the tests give insignificant p-values. We will observe this absurdity of the Bartlett test once again in the simulation study that follows.

We compare the efficiency of the seven tests considered so far through a simulation study by obtaining empirical p-values. For this, we considered the three-sample case and four different distributions, namely the normal, Laplace, logistic and uniform distributions, which are all symmetric. We generated random numbers with unit variance in all cases and varied the standard deviation from 1 to 2 in increments of 0.2 for the third sample only. We considered two cases for the sample sizes, $n_1 = n_2 = n_3 = 15$ and $n_1 = 15$, $n_2 = 20$, $n_3 = 25$. For each case we carried out 10,000 simulations and, within each simulation, applied the permutation principle with 2,000 repetitions to obtain the null distribution. The nominal significance level was 0.05 in all cases. All the computations were carried out using the PC version of SAS/IML. The empirical p-values obtained are summarized in Tables 5-8. First of all, we note that, except for the normal distribution, the Bartlett test hardly achieves its nominal significance level, a phenomenon that has already been pointed out by many statisticians. Among the other tests, the LR test performs best in terms of empirical power and is convenient to use since it only requires the computation of the LR function (2.1) or $2\log LR(\sigma_1^2, \ldots, \sigma_K^2; X)$ without any further consideration. Only the Tippett and Hartley tests compete with the LR test in terms of power, but they require the three individual tests of $H_0^{ij}: \sigma_i^2 = \sigma_j^2$, $1 \le i < j \le 3$. As one may expect, the Tippett test and the Hartley test with $THA$ yielded very similar results for all distributions and sample sizes. The LR test is also well protected against departures from normality. In general, we note that the Fisher test showed relatively low performance.
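The simulation design described above can be sketched in Python for the permutation LR test under normality. This is a scaled-down illustration, not the SAS/IML program used for Tables 5-8: the function names are hypothetical, the default numbers of simulations and permutations are far smaller than the 10,000 and 2,000 used in the study, and only the normal case is generated.

```python
import math
import random


def two_log_lr(samples):
    """2 log LR from (2.1), using divide-by-n variance estimates."""
    n = sum(len(s) for s in samples)
    variances = []
    for s in samples:
        m = sum(s) / len(s)
        variances.append(sum((x - m) ** 2 for x in s) / len(s))
    pooled = sum(len(s) * v for s, v in zip(samples, variances)) / n
    return sum(len(s) * math.log(pooled / v) for s, v in zip(samples, variances))


def permutation_pvalue(samples, n_perm, rng):
    """Proportion of permuted statistics at least as large as the observed one."""
    sizes = [len(s) for s in samples]
    observed = two_log_lr(samples)
    pooled = [x for s in samples for x in s]
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        permuted, start = [], 0
        for size in sizes:
            permuted.append(pooled[start:start + size])
            start += size
        if two_log_lr(permuted) >= observed:
            count += 1
    return count / n_perm


def empirical_rejection_rate(sizes, sds, n_sim=200, n_perm=200, level=0.05, seed=1):
    """Fraction of simulated normal data sets for which H0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        # Zero-mean normal samples; only the standard deviations differ.
        samples = [
            [rng.gauss(0.0, sd) for _ in range(size)]
            for size, sd in zip(sizes, sds)
        ]
        if permutation_pvalue(samples, n_perm, rng) <= level:
            rejections += 1
    return rejections / n_sim
```

Calling `empirical_rejection_rate((15, 15, 15), (1.0, 1.0, 1.0))` estimates the size of the test, and replacing the third standard deviation by 1.2, 1.4, ..., 2.0 traces out an estimated power curve of the kind reported in the tables; with such small default counts the estimates are noisy and are not expected to match the tabulated values.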

5. Some concluding remarks

In general, research on variance or scale parameters has lagged behind that on mean or location parameters. One reason may be the lack of demand from applications. However, it cannot be denied either that the distributions of the LR functions and related statistics have not been fully developed until now. This has already been confirmed by Park (2018) in a study of test procedures for the covariance matrix, even for the one-sample problem. To be candid about the difficulty of deriving the distribution of the LR statistic, we found during the preliminary simulation study that the limiting distribution of $2\log LR(\sigma_1^2, \ldots, \sigma_K^2; X)$ did not work properly and simply failed, since the size of the test could hardly be achieved at any given nominal significance level. Bartlett (1937) must have noticed this, since he proposed the modified form $BA$ of $2\log LR(\sigma_1^2, \ldots, \sigma_K^2; X)$, adjusted for the chi-square distribution with $K - 1$ df, which was successful for the normal distribution but not for the others. Even though our approach cannot be called ground-breaking, one should seriously consider adopting an alternative approach. Our results may therefore be far from complete, but they can serve as a bridge across the unfinished battleground of tests for variance and scale parameters.

Nowadays simultaneous test procedures for the mean and variance, or for location and scale parameters, have appeared frequently in statistical journals (Park, 2015, 2017). However, the lack of results for variance and scale parameters could be an obstacle to this research. Therefore one may take advantage of the present results to further research on simultaneous tests or on tests among variances.


Table 1

Doughnut fat absorption data

Amount of fat absorbed

Fat 1    Fat 2    Fat 3    Fat 4

Table 2

p-values for doughnut data


LR = likelihood ratio; FC = Fisher combination; LC = Liptak combination; TC = Tippett combination; THA = Tippett and Hartley test; BA = Bartlett statistic; W = Levene test statistic.

Table 3

Simulated data

Data for class


Table 4

p-values for simulated data


LR = likelihood ratio; FC = Fisher combination; LC = Liptak combination; TC = Tippett combination; THA = Tippett and Hartley test; BA = Bartlett statistic; W = Levene test statistic.

Table 5

Empirical p-values for normal distribution

Test  (n1, n2, n3)   (σ1, σ2, σ3)
                     (1, 1, 1)  (1, 1, 1.2)  (1, 1, 1.4)  (1, 1, 1.6)  (1, 1, 1.8)  (1, 1, 2)
TC    (15, 15, 15)   0.0470     0.1084       0.2583       0.4499       0.6270       0.7575
TC    (15, 20, 25)   0.0482     0.1129       0.3138       0.5571       0.7475       0.8660

LR = likelihood ratio; FC = Fisher combination; LC = Liptak combination; TC = Tippett combination; THA = Tippett and Hartley test; BA = Bartlett statistic; W = Levene test statistic.

Table 6

Empirical p-values for Laplace distribution

Test  (n1, n2, n3)   (σ1, σ2, σ3)
                     (1, 1, 1)  (1, 1, 1.2)  (1, 1, 1.4)  (1, 1, 1.6)  (1, 1, 1.8)  (1, 1, 2)
TC    (15, 15, 15)   0.0478     0.0874       0.1764       0.2910       0.4140       0.5251
TC    (15, 20, 25)   0.0502     0.0841       0.1927       0.3453       0.4931       0.6192

LR = likelihood ratio; FC = Fisher combination; LC = Liptak combination; TC = Tippett combination; THA = Tippett and Hartley test; BA = Bartlett statistic; W = Levene test statistic.

Table 7

Empirical p-values for logistic distribution

Test  (n1, n2, n3)   (σ1, σ2, σ3)
                     (1, 1, 1)  (1, 1, 1.2)  (1, 1, 1.4)  (1, 1, 1.6)  (1, 1, 1.8)  (1, 1, 2)
TC    (15, 15, 15)   0.0487     0.0974       0.2202       0.3755       0.5274       0.6521
TC    (15, 20, 25)   0.0492     0.0992       0.2604       0.4578       0.6296       0.7620

LR = likelihood ratio; FC = Fisher combination; LC = Liptak combination; TC = Tippett combination; THA = Tippett and Hartley test; BA = Bartlett statistic; W = Levene test statistic.

Table 8

Empirical p-values for uniform distribution

Test  (n1, n2, n3)   (σ1, σ2, σ3)
                     (1, 1, 1)  (1, 1, 1.2)  (1, 1, 1.4)  (1, 1, 1.6)  (1, 1, 1.8)  (1, 1, 2)
TC    (15, 15, 15)   0.0468     0.1544       0.4393       0.7213       0.8814       0.9528
TC    (15, 20, 25)   0.0488     0.1995       0.5765       0.8660       0.9665       0.9929

LR = likelihood ratio; FC = Fisher combination; LC = Liptak combination; TC = Tippett combination; THA = Tippett and Hartley test; BA = Bartlett statistic; W = Levene test statistic.

  1. Bartlett, MS (1937). Properties of sufficiency and statistical tests. Proceedings of the Royal Society of London, Series A. 160, 268-282.
  2. Boos, DD, and Zhang, J (2000). Monte Carlo evaluation of resampling-based hypothesis tests. Journal of the American Statistical Association. 95, 486-492.
  3. Brown, MB, and Forsythe, AB (1974). Robust tests for the equality of variances. Journal of the American Statistical Association. 69, 364-367.
  4. Chang, CH, Pal, N, and Lin, JJ (2017). A revisit to test the equality of variances of several populations. Communications in Statistics-Simulation and Computation. 46, 6360-6384.
  5. Gokpinar, E, and Gokpinar, F (2017). Testing equality of variances for several normal populations. Communications in Statistics-Simulation and Computation. 46, 38-52.
  6. Good, P (2000). Permutation Tests, A Practical Guide to Resampling Methods for Testing Hypotheses. New York: Springer
  7. Hand, HA, and Nagaraja, HN (2003). Order Statistics. New York: Wiley
  8. Hartley, HO (1950). The Use of Range in Analysis of Variance. Biometrika. 37, 271-280.
  9. Jayalath, KP, Ng, HKT, Manage, AB, and Riggs, KE (2017). Improved tests for homogeneity of variances. Communications in Statistics-Theory and Methods. 46, 7423-7446.
  10. Levene, H (1960). Robust tests for equality of variances. Contributions to Probability and Statistics: Essays in Honor of Harold Hotelling, Olkin, Ingram, and Hotelling, Harold, ed: Stanford University Press, pp. 278-292
  11. Lowe, B (1935). Data from the Iowa Agricultural Experiment Station
  12. O’Brien, RG (1979). A general ANOVA method for robust tests of additive models for variance. Journal of the American Statistical Association. 74, 877-880.
  13. O’Brien, RG (1981). A simple test for variance effects in experimental designs. Psychological Bulletin. 89, 570-574.
  14. Oden, NL (1991). Allocation of effort in Monte Carlo simulation for power of permutation tests. Journal of the American Statistical Association. 86, 1074-1076.
  15. Park, HI (2015). Simultaneous test for the mean and variance with an application to the statistical process control. Journal of Statistical Theory and Practice. 9, 868-881.
  16. Park, HI (2017). On tests detecting difference in means and variances simultaneously under normality. Communications in Statistics-Theory and Methods. 46, 10025-10035.
  17. Park, HI (2018). A note on the test for the covariance matrix under normality. Communications for Statistical Applications and Methods. 25, 71-78.
  18. Pearson, ES, and Hartley, HO (1970). Biometrika Tables for Statisticians
  19. Pesarin, F (2001). Multivariate Permutation Tests with Applications in Biostatistics. West Sussex, England: Wiley
  20. Piepho, HP (1996). A Monte Carlo test for variance homogeneity in linear models. Biometrical Journal. 38, 461-473.
  21. Snedecor, GW, and Cochran, WG (1989). Statistical Methods: Iowa State University Press
  22. Westfall, PH, and Young, SS (1993). Resampling-Based Multiple Testing, Examples and Methods for p-Value Adjustment. New York: Wiley