A comparative study of the Gini coefficient estimators based on the regression approach
Communications for Statistical Applications and Methods 2017;24:339-351
Published online July 31, 2017
© 2017 Korean Statistical Society.

Shahryar Mirzaei^a, Gholam Reza Mohtashami Borzadaran^{1,b}, Mohammad Amini^b, and Hadi Jabbari^b

^a Department of Statistics, Payame Noor University, Iran; ^b Department of Statistics, Ferdowsi University of Mashhad, Iran
Correspondence to: Department of Statistics, Ordered and Spatial Data Center of Excellence, Faculty of Mathematical Sciences, Ferdowsi University of Mashhad, Mashhad, Iran. E-mail: grmohtashami@um.ac.ir
Received December 3, 2016; Revised May 1, 2017; Accepted June 28, 2017.

Resampling approaches were the first techniques employed to compute a variance for the Gini coefficient; however, many authors have shown that the Gini coefficient and its variance can also be obtained from a regression model. Despite the simplicity of the regression approach for computing a standard error for the Gini coefficient, the use of the proposed regression model has been challenged in economics. In this paper, we therefore present a comparative study of the regression approach and resampling techniques. The regression method is shown to overestimate the standard error of the Gini index. The simulations show that the Gini estimator based on the modified regression model is also consistent and asymptotically normal, with less divergence from the normal distribution than the resampling techniques.

Keywords : bootstrap technique, Gini coefficient, jackknife method, Lorenz curve, modified regression model, resampling techniques
1. Introduction

Measures of inequality are used to analyze income, welfare, and poverty issues. They can also help measure the level of social stratification and polarization. Among the many inequality measures, the Gini index is one of the most widely known measures of income inequality because of its easy interpretation.

The Gini coefficient is a popular measure of income inequality; however, it is usually reported without any information about its variance, whereas ideally it should be accompanied by a standard error (SE). The reason is that most of the proposed formulas for its SE are mathematically complex or require considerable numerical computation.

Various methods to compute the SE of the Gini index have been proposed in the statistics and economics literature; related references include Woo and Yoon (2001), Xu (2007), Davidson (2009), Langel and Tillé (2013), and Yitzhaki and Schechtman (2013). Many authors have proposed resampling approaches such as the jackknife and bootstrap techniques to compute the variance of the Gini estimator (Kang and Cho, 1996; Mills and Zandvakili, 1997; Moran, 2006; Palmitesta and Provasi, 2006; Yitzhaki, 1991).

Some authors have demonstrated that estimates of the Gini coefficient can be obtained from an ordinary linear regression based on the data and their ranks, which also provides an exact analytic SE. Lerman and Yitzhaki (1984) derived a convenient method to calculate the Gini coefficient using the covariance between the data and their ranks. Along these lines, Shalit (1985) proposed a regression model with the ranks 1, 2, …, n as regressor to compute the Gini index. Ogwang (2000) provided a method to compute the Gini index by ordinary least squares (OLS) regression and discussed how the regression simplifies the computation of the jackknife SE. Giles (2004) later showed that the OLS SE from this regression could be used directly as the SE of the Gini index. Modarres and Gastwirth (2006) used simulations to show that the regression method overestimates the SE of the Gini estimator. They attributed this defect to the dependence between the error terms in the proposed regression model and recommended more complex or computationally intensive methods instead. However, Ogwang (2006) and Giles (2006) disagreed with this recommendation.

Given the simplicity of the regression approach, in this paper we evaluate it and compare it with resampling techniques. In particular, we examine cases where the underlying distribution is log-logistic (LL) or exponential (Exp), as particular cases of the generalized Beta II distribution (McDonald, 1984). Section 2 introduces the Gini index as a measure of income inequality. Section 3 deals with resampling techniques, namely the bootstrap and the jackknife, for estimating the SE and confidence intervals of the Gini estimator. Section 4 discusses the shortcomings of the regression approach and studies this method for estimating the variance of the Gini estimator. Section 5 provides simulation evidence supporting the main conclusions of the paper and compares inferential properties of the Gini estimators across the methods, such as consistency, divergence from the normal distribution, and asymptotic behavior; some graphical comparisons are also given. Section 6 illustrates the results with real data on income inequality in Britain for the 1994/1995 fiscal year. Conclusions are given in the last section.

2. The Gini coefficient

The Gini coefficient is the best-known member of the family of income inequality measures. It is widely used to measure income inequality because of its clear economic interpretation. It can be defined in various ways (Xu, 2003; Yitzhaki, 1998).

The best-known definition of the Gini index of inequality is twice the area between the 45° line (equality line) and the Lorenz curve, as shown in Figure 1. It can therefore be expressed as

$$G = 2\int_0^1 \bigl(p - L(p)\bigr)\,dp = 1 - 2\int_0^1 L(p)\,dp, \tag{2.1}$$

where p = F(x), F is the cumulative distribution function (cdf) of income, and L(p) is the Lorenz function $L(p) = \frac{1}{\mu}\int_0^p F^{-1}(t)\,dt$ (Gastwirth, 1971), with μ = E(X) > 0 and $F^{-1}(p) = \inf\{x \mid F(x) \ge p\}$ for p ∈ [0, 1].

The Gini index takes values between 0 and 1: the value 0 corresponds to complete equality and 1 to complete inequality. The index can also be defined as

$$G = \frac{\Delta}{2\mu}, \tag{2.2}$$

where $\Delta = \int_0^\infty\!\int_0^\infty |x - y|\,dF(x)\,dF(y)$ is the population mean difference.

Using expression (2.1), integration by parts, and the change of variable p = F(x), one finds (Xu, 2003)

$$G = \frac{2}{\mu}\int_0^\infty x\,F(x)\,dF(x) - 1. \tag{2.3}$$

This formula gives an interpretation of the Gini coefficient in terms of a covariance:

$$G = \frac{2}{\mu}\operatorname{Cov}\bigl(X, F(X)\bigr). \tag{2.4}$$
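The step from (2.1) to the covariance form takes only a few lines. The derivation below is a standard sketch rather than a transcription of Xu (2003): it swaps the order of integration (equivalent to the integration by parts mentioned above), substitutes t = F(x), and in the last step uses E[F(X)] = 1/2, which assumes F is continuous.

```latex
\begin{aligned}
\int_0^1 L(p)\,dp
  &= \frac{1}{\mu}\int_0^1\!\!\int_0^p F^{-1}(t)\,dt\,dp
   = \frac{1}{\mu}\int_0^1 (1-t)\,F^{-1}(t)\,dt \\
  &= \frac{1}{\mu}\int_0^\infty x\,\bigl(1-F(x)\bigr)\,dF(x)
   = 1 - \frac{1}{\mu}\int_0^\infty x\,F(x)\,dF(x), \\[4pt]
G &= 1 - 2\int_0^1 L(p)\,dp
   = \frac{2}{\mu}\int_0^\infty x\,F(x)\,dF(x) - 1 \\
  &= \frac{2}{\mu}\Bigl(E\bigl[X F(X)\bigr] - \mu\,E\bigl[F(X)\bigr]\Bigr)
   = \frac{2}{\mu}\operatorname{Cov}\bigl(X, F(X)\bigr).
\end{aligned}
```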


Suppose that an independent and identically distributed (iid) sample X1, …, Xn is drawn randomly from the population, and let $\hat F_n$ denote its empirical distribution function. Using (2.2) and replacing F by $\hat F_n$, the natural plug-in estimator of G is

$$\hat G = \frac{1}{2n^2\bar X}\sum_{i=1}^{n}\sum_{j=1}^{n}|X_i - X_j|. \tag{2.5}$$

In the context of a discrete income distribution, Xu (2003) showed that the Gini index can be estimated by

$$\hat G = \frac{2}{n^2\bar Y}\sum_{i=1}^{n} i\,Y_i - \frac{n+1}{n}, \tag{2.6}$$

where 0 ≤ Y1 ≤ ⋯ ≤ Yn are the ordered income data.
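A minimal Python sketch (standard library only) of the two estimators above: the plug-in form averages |Xi − Xj| over all n² pairs and divides by 2X̄, while the ordered form computes 2/(n²Ȳ) Σ iYi − (n + 1)/n after a single sort. The function names are illustrative, not taken from the paper; the final line simply confirms numerically that the two forms agree.

```python
import random


def gini_mean_difference(x):
    """Plug-in estimator: sum of |X_i - X_j| over all n^2 ordered pairs, divided by 2 * n^2 * mean."""
    n = len(x)
    mean = sum(x) / n
    total = sum(abs(xi - xj) for xi in x for xj in x)
    return total / (2 * n * n * mean)


def gini_ordered(x):
    """Ordered-data form: 2 / (n^2 * mean) * sum(i * Y_i) - (n + 1) / n, with Y sorted ascending."""
    y = sorted(x)
    n = len(y)
    mean = sum(y) / n
    weighted = sum(i * yi for i, yi in enumerate(y, start=1))
    return 2 * weighted / (n * n * mean) - (n + 1) / n


rng = random.Random(1)
sample = [rng.expovariate(1.0) for _ in range(200)]
# The two forms are algebraically identical, so they agree up to floating-point rounding.
print(gini_mean_difference(sample), gini_ordered(sample))
```

For the toy incomes 1, 2, 3 both functions return 2/9 ≈ 0.222. The sorted form is the one worth using in practice, since it costs O(n log n) rather than O(n²).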

Davidson (2009) proposed an approximate expression for the bias of Ĝ, from which he derived the following bias-corrected estimator of the Gini coefficient, denoted G̃:

$$\tilde G = \frac{n}{n-1}\,\hat G. \tag{2.7}$$

Although G̃ is still biased, its bias is of order n^{-1}. This estimator is sometimes recommended because it is as easy to compute as the other estimators and its bias converges to 0 faster as n → ∞.
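The sketch below applies the correction G̃ = nĜ/(n − 1), which is our reading of Davidson's estimator, on top of the ordered-data formula, and compares the Monte Carlo means of the two estimators under Exp(1), whose true Gini index is 0.5. The sample size, seed, and number of replications are arbitrary illustrative choices.

```python
import random


def gini(x):
    """Ordered-data Gini estimate: 2 / (n^2 * mean) * sum(i * Y_i) - (n + 1) / n."""
    y = sorted(x)
    n = len(y)
    mean = sum(y) / n
    return 2 * sum(i * yi for i, yi in enumerate(y, start=1)) / (n * n * mean) - (n + 1) / n


def gini_bias_corrected(x):
    """Bias-corrected estimate, assumed here to be n / (n - 1) times the plug-in estimate."""
    n = len(x)
    return n / (n - 1) * gini(x)


# Compare Monte Carlo means under Exp(1), whose true Gini index is 0.5.
rng = random.Random(2017)
reps, n = 2000, 10
plain = corrected = 0.0
for _ in range(reps):
    s = [rng.expovariate(1.0) for _ in range(n)]
    plain += gini(s) / reps
    corrected += gini_bias_corrected(s) / reps
print(f"mean of G-hat: {plain:.3f}, mean of G-tilde: {corrected:.3f}")
```

With n = 10 the correction factor is 10/9, so the corrected mean should sit noticeably closer to 0.5 than the uncorrected one.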

3. Resampling techniques

Many of the variance estimators proposed for the Gini index in the statistics and economics literature involve complicated formulas: they are mathematically complex or require considerable numerical computation. To avoid these difficulties, various authors have proposed resampling techniques such as the jackknife and bootstrap methods, described below.

3.1. The jackknife method

The jackknife provides a general-purpose approach to estimating the bias and variance of an estimator. It is particularly useful when standard methods for computing bias and variance cannot be applied or are difficult to apply. Suppose that Ĝ is an estimator of the Gini coefficient G based on a sample of iid random variables X1, …, Xn. Let Ĝ(i) be the Gini estimate computed from the subsample in which the ith observation has been deleted, and let $\bar G_{(\cdot)} = \frac{1}{n}\sum_{i=1}^{n}\hat G_{(i)}$. The jackknife estimator of the Gini coefficient based on the n values Ĝ(i) is defined as

$$\hat G_J = \hat G - \hat b_J = n\hat G - (n-1)\,\bar G_{(\cdot)}, \tag{3.1}$$

where

$$\hat b_J = (n-1)\bigl(\bar G_{(\cdot)} - \hat G\bigr) \tag{3.2}$$

is the jackknife bias estimator (Yitzhaki, 1991). It follows that Ĝ_J is asymptotically unbiased (Jiang, 2010).

The jackknife method can also be used to estimate the SE of the Gini estimator. This was first noted by Yitzhaki (1991), whose SE estimate is

$$\hat\sigma_{\mathrm{Jack}} = \sqrt{\frac{n-1}{n}\sum_{i=1}^{n}\bigl(\hat G_{(i)} - \bar G_{(\cdot)}\bigr)^2}. \tag{3.3}$$
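A direct sketch of the jackknife estimate, bias, and SE in (3.1) to (3.3): it simply recomputes the ordered-data estimator n times with one observation left out, so it costs O(n² log n). It is not Yitzhaki's (1991) computational shortcut, and the function names are hypothetical.

```python
def gini(x):
    """Ordered-data Gini estimate: 2 / (n^2 * mean) * sum(i * Y_i) - (n + 1) / n."""
    y = sorted(x)
    n = len(y)
    mean = sum(y) / n
    return 2 * sum(i * yi for i, yi in enumerate(y, start=1)) / (n * n * mean) - (n + 1) / n


def jackknife_gini(x):
    """Jackknife estimate, bias and SE: returns (G_J, b_J, se_Jack)."""
    x = list(x)
    n = len(x)
    g_hat = gini(x)
    # Leave-one-out estimates G_(i), each computed from n - 1 observations.
    loo = [gini(x[:i] + x[i + 1:]) for i in range(n)]
    g_dot = sum(loo) / n
    bias = (n - 1) * (g_dot - g_hat)                                  # b_J = (n-1)(G_bar - G_hat)
    g_jack = g_hat - bias                                             # G_J = n*G_hat - (n-1)*G_bar
    se = ((n - 1) / n * sum((g - g_dot) ** 2 for g in loo)) ** 0.5   # jackknife SE
    return g_jack, bias, se


g_jack, bias, se = jackknife_gini([1, 2, 3])
print(g_jack, bias, se)
```

For the incomes 1, 2, 3 the leave-one-out estimates are 1/10, 1/4, and 1/6, which gives a bias estimate of −0.1 and Ĝ_J = 29/90 ≈ 0.322.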




3.2. The bootstrap method

The bootstrap method is an alternative approach to estimating the variance of the Gini estimator. The technique is relatively straightforward, yet analytically powerful. Although its mathematical justification can be quite sophisticated, the bootstrap requires no theoretical calculations, applies identically to any income inequality measure, and is available no matter how mathematically complicated the parameter estimate or its asymptotic SE may be (Mills and Zandvakili, 1997). The bootstrap procedure is as follows:

  • Given a sample X1, …, Xn of size n and an estimator Ĝ.

  • Draw B bootstrap samples of size n with replacement from X1, …, Xn.

  • Calculate the estimator for each bootstrap sample, obtaining B values of the estimator, denoted by $\hat G^*_1, \ldots, \hat G^*_B$.

These values are then used to estimate the variance of the original estimator: the sample variance of $\hat G^*_1, \ldots, \hat G^*_B$ serves as the bootstrap estimator of the variance of Ĝ. The bootstrap SE of Ĝ is then

$$\hat\sigma_{\mathrm{Boot}} = \sqrt{\frac{1}{B-1}\sum_{b=1}^{B}\bigl(\hat G^*_b - \bar G^*\bigr)^2}, \qquad \bar G^* = \frac{1}{B}\sum_{b=1}^{B}\hat G^*_b. \tag{3.4}$$
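The three-step procedure above can be sketched in a few lines of standard-library Python. `random.Random.choices` draws the n observations with replacement, and `statistics.stdev` uses the B − 1 divisor of the bootstrap SE formula. The function name `bootstrap_se` and the default B = 1000 are our own illustrative choices.

```python
import random
import statistics


def gini(x):
    """Ordered-data Gini estimate: 2 / (n^2 * mean) * sum(i * Y_i) - (n + 1) / n."""
    y = sorted(x)
    n = len(y)
    mean = sum(y) / n
    return 2 * sum(i * yi for i, yi in enumerate(y, start=1)) / (n * n * mean) - (n + 1) / n


def bootstrap_se(x, B=1000, seed=None):
    """Bootstrap SE: sample standard deviation of B Gini estimates from resamples of x."""
    rng = random.Random(seed)
    n = len(x)
    reps = [gini(rng.choices(x, k=n)) for _ in range(B)]  # each resample: n draws with replacement
    return statistics.stdev(reps), reps                  # stdev divides by B - 1


rng = random.Random(42)
data = [rng.expovariate(1.0) for _ in range(100)]
se, reps = bootstrap_se(data, B=500, seed=7)
print(f"G-hat = {gini(data):.4f}, bootstrap SE = {se:.4f}")
```

The replicates are returned alongside the SE so that they can be reused, for example for percentile-type intervals.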



4. The regression method

The regression approach is the simplest way of computing an SE for the Gini coefficient. The computational difficulties and mathematical complexity of the conventional variance formulas make a simpler regression-based approach attractive. Lerman and Yitzhaki (1984) first expressed the Gini coefficient through the covariance between a variable and its rank. Shalit (1985) then proposed the following regression model to compute the Gini index:

$$y_i = \alpha + \beta i + \xi_i, \qquad i = 1, 2, \ldots, n, \tag{4.1}$$

where 0 ≤ y1 ≤ y2 ≤ ⋯ ≤ yn and ξ1, ξ2, …, ξn are errors with zero mean and homogeneous variance σ². He computed the Gini estimator from the estimated slope of this model as

$$\hat G = \frac{(n^2-1)\,\hat\beta}{6n\bar y}, \tag{4.2}$$

where ȳ is the sample mean and β̂ is the OLS estimator of β. For an alternative regression interpretation of the Gini index, Ogwang (2000) showed that Ĝ can also be written as

$$\hat G = \frac{2\hat\theta}{n} - 1 - \frac{1}{n}, \tag{4.3}$$

where $\hat\theta = \sum_{i=1}^{n} i\,y_i \big/ \sum_{i=1}^{n} y_i$ is the weighted least squares estimator of θ in the regression model

$$i\sqrt{y_i} = \theta\sqrt{y_i} + v_i, \qquad i = 1, 2, \ldots, n, \tag{4.4}$$

where v1, v2, …, vn are errors with zero mean and homogeneous variance σ². Davidson (2009) proposed the following modification of the regression model (4.4):

$$\left(\frac{2i-1}{n} - 1\right)\sqrt{y_i} = \theta^*\sqrt{y_i} + \xi_i, \qquad i = 1, 2, \ldots, n. \tag{4.5}$$

In this model, the OLS estimate of the slope θ* is the Gini estimator itself, so both the Gini estimate and its SE can be read directly from the regression output.
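A minimal sketch of the three regression routes: it fits (4.5) by OLS through the origin and checks numerically that its slope, Shalit's slope formula (4.2), and Ogwang's expression (4.3) all reproduce the ordered-data estimate Ĝ. The SE uses the textbook formula for a one-parameter, no-intercept OLS fit with residual variance RSS/(n − 1). That is our assumption about how the regression SE is computed, and it is exactly the SE whose validity Section 4.1 questions. Function names are hypothetical.

```python
import math
import random


def regression_gini(x):
    """OLS fit of ((2i-1)/n - 1) sqrt(y_i) = theta sqrt(y_i) + error; returns (theta_hat, SE)."""
    y = sorted(x)
    n = len(y)
    w = [math.sqrt(v) for v in y]                                         # regressor sqrt(y_i)
    z = [((2 * i - 1) / n - 1) * wi for i, wi in enumerate(w, start=1)]  # response
    sww = sum(wi * wi for wi in w)                                        # equals sum(y_i)
    theta = sum(zi * wi for zi, wi in zip(z, w)) / sww
    rss = sum((zi - theta * wi) ** 2 for zi, wi in zip(z, w))
    se = math.sqrt(rss / (n - 1) / sww)  # one estimated parameter, so n - 1 residual df
    return theta, se


def shalit_gini(x):
    """Gini from the OLS slope of y_i on i (model with intercept): (n^2 - 1) * beta / (6 * n * ybar)."""
    y = sorted(x)
    n = len(y)
    ybar = sum(y) / n
    ibar = (n + 1) / 2
    sxy = sum((i - ibar) * (yi - ybar) for i, yi in enumerate(y, start=1))
    sxx = sum((i - ibar) ** 2 for i in range(1, n + 1))
    return (n * n - 1) * (sxy / sxx) / (6 * n * ybar)


def ogwang_gini(x):
    """Gini from theta_hat = sum(i * y_i) / sum(y_i) via 2 * theta_hat / n - 1 - 1 / n."""
    y = sorted(x)
    n = len(y)
    theta = sum(i * yi for i, yi in enumerate(y, start=1)) / sum(y)
    return 2 * theta / n - 1 - 1 / n


rng = random.Random(3)
data = [rng.expovariate(1.0) for _ in range(50)]
g_reg, se_reg = regression_gini(data)
print(g_reg, shalit_gini(data), ogwang_gini(data), se_reg)
```

All three point estimates coincide up to rounding; only the regression route also yields an SE, which is why (4.5) is convenient in standard regression software.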

4.1. Shortcomings of the regression model approach

The regression approach to estimating the SE of the Gini estimator cannot produce reliable results, because it does not account for the following potential shortcomings:

  • The regression model takes no account of the sampling design; it is valid only under random sampling.

  • The ordinary linear regression model assumes that the independent variable is fixed and measured without error; in this model, however, the independent variable is random.

  • The normality of the error terms may not be satisfied and should be tested.

  • The relationship between ordered income and its rank is convex, as demonstrated in Figure 2. Moreover, the error terms are dependent: the variance-covariance matrix of the errors is not diagonal, and its nonnegative (i, j)th element is σij/n, where

    $$\sigma_{ij} = \frac{p_i(1-p_j)}{f(\zeta_{p_i})\,f(\zeta_{p_j})}, \qquad i \le j,\; i, j = 1, 2, \ldots, n,$$

    $$\sigma_{ij} = \sigma_{ji}, \qquad i > j,$$

    $\zeta_{p_j}$ is the p_j-th population quantile, and $f(\zeta_{p_j})$ is the positive, continuous density-quantile function for 0 < p_j < 1 (David and Nagaraja, 2003).
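To see why the error covariance matrix cannot be diagonal, the sketch below evaluates σij/n from the expression above for the Exp(1) distribution, where ζ_p = −log(1 − p) and f(ζ_p) = 1 − p. The plotting positions p_i = i/(n + 1) are our assumption; the text does not specify them.

```python
import math


def exp_quantile(p):
    """Population quantile zeta_p of Exp(1): solves 1 - exp(-x) = p."""
    return -math.log(1.0 - p)


def exp_density(x):
    """Exp(1) density."""
    return math.exp(-x)


def order_stat_cov(n):
    """Matrix with entries sigma_ij / n for Exp(1), using assumed positions p_i = i / (n + 1)."""
    p = [i / (n + 1) for i in range(1, n + 1)]
    dq = [exp_density(exp_quantile(pi)) for pi in p]  # density-quantile f(zeta_p) = 1 - p here
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            lo, hi = min(i, j), max(i, j)             # sigma_ij = sigma_ji
            cov[i][j] = p[lo] * (1 - p[hi]) / (dq[lo] * dq[hi]) / n
    return cov


for row in order_stat_cov(4):
    print(["%.4f" % v for v in row])
```

For Exp(1) each entry reduces to p/(1 − p)/n with p = min(p_i, p_j), so every off-diagonal entry is strictly positive and the matrix is far from diagonal.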

It is important to note that real data typically exhibit these defects, yet the method is used in practice regardless. We must therefore be cautious when using a regression-based approach to construct an SE for the Gini coefficient, and the validity of the assumptions of the regression model should be formally tested.

4.2. Asymptotic behavior based on regression approach

If the regression model (4.5) holds, the Gini estimator can be taken to be unbiased and its asymptotic properties follow. Under the modified regression model (4.5), the slope estimate equals the Gini estimator:

$$\hat G = \hat\theta^* = \frac{\sum_{i=1}^{n}\left(\frac{2i-1}{n} - 1\right) y_i}{\sum_{i=1}^{n} y_i}. \tag{4.6}$$

The Gini estimator in (4.6) is the ratio of two linear combinations of order statistics (L-statistics); asymptotic normality and consistency therefore follow from the theory of L-statistics (Sendler, 1979). Under some regularity conditions, Ĝ is also asymptotically normally distributed by the theory of U-statistics (Xu, 2007).

5. Simulation study

To compare the Gini coefficient estimators based on the bootstrap, the jackknife, and the regression approach in (4.5), we examined situations where the underlying distribution is a particular case of the generalized Beta II (GB2) distribution. The GB2, introduced by McDonald (1984), is the most general distribution that has been proposed for fitting income data. Its density is

$$f(x) = \frac{a\,x^{ap-1}}{b^{ap}\,B(p,q)\,\bigl[1 + (x/b)^a\bigr]^{p+q}},$$

with x > 0 and a, b, p, q > 0, where B(p, q) is the Beta function given by

$$B(p,q) = \frac{\Gamma(p)\,\Gamma(q)}{\Gamma(p+q)} = \int_0^\infty \frac{t^{p-1}}{(1+t)^{p+q}}\,dt, \qquad p, q > 0.$$

The GB2 has the advantage that many densities are obtained as particular cases, so it constitutes a convenient framework for discussion. Because of the complexity of its mathematical expression, we concentrate on the Exp distribution with cdf

$$F(x) = 1 - e^{-x}, \qquad x > 0, \tag{5.1}$$

and the LL distribution with cdf

$$F(x) = 1 - \frac{1}{1 + x^a}, \qquad x > 0, \tag{5.2}$$

where a > 0 is a shape parameter (the second moment of X exists when a > 2).
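Samples from both distributions can be drawn by inverting the cdf: x = −log(1 − u) for the Exp cdf and x = (u/(1 − u))^{1/a} for the LL cdf, with u uniform on [0, 1). The sketch below does this with the standard library and checks the sample Gini index against the known population values: 0.5 for Exp(1), as stated in the text, and 1/a for the LL distribution (a standard result for the log-logistic family). Function names are illustrative.

```python
import math
import random


def draw_exp(n, rng):
    """Inverse-cdf draws from F(x) = 1 - exp(-x): x = -log(1 - u)."""
    return [-math.log(1.0 - rng.random()) for _ in range(n)]


def draw_ll(n, a, rng):
    """Inverse-cdf draws from F(x) = 1 - 1/(1 + x^a): x = (u / (1 - u))^(1/a)."""
    draws = []
    for _ in range(n):
        u = rng.random()  # u lies in [0, 1), so 1 - u is never zero
        draws.append((u / (1.0 - u)) ** (1.0 / a))
    return draws


def gini(x):
    """Ordered-data Gini estimate: 2 / (n^2 * mean) * sum(i * Y_i) - (n + 1) / n."""
    y = sorted(x)
    n = len(y)
    mean = sum(y) / n
    return 2 * sum(i * yi for i, yi in enumerate(y, start=1)) / (n * n * mean) - (n + 1) / n


rng = random.Random(5)
print(gini(draw_exp(20000, rng)))    # expected to be close to 0.5
print(gini(draw_ll(20000, 5, rng)))  # expected to be close to 1/a = 0.2
```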

In each stage, the simulations were undertaken by drawing 10,000 independent samples of size n = 10, 20, 50, 100, 500, 1,000 from the underlying distributions.

5.1. Variance inflation of Gini estimator in regression approach

It is important to have a correct method for computing the SE of the Gini coefficient: in a hypothesis test, a difference may be significant when the SE from one method is used but not significant when the SE from another method is used. Table 1 reports the SE of the Gini estimator under the regression model and the resampling techniques for the Exp distribution and the LL distribution with parameter a = 5 as a benchmark. These values show that the regression method underestimates the true value of the Gini index and overestimates the SE of the Gini estimator relative to the resampling techniques.

The three methods (bootstrap, jackknife, and regression) provide broadly comparable results, but whereas the bootstrap and jackknife SE estimates are close to the true values, those of the suggested regression model are noticeably inflated. The weakness of the regression-based approach is essentially a finite-sample matter, and its importance should diminish as n → ∞. Figure 3 shows the corresponding results for the Exp distribution.

Table 2 reports the bias, SE, and mean square error (MSE) of the Gini estimator from the regression model under the LL distribution with a = 3, 10, 20. These properties clearly depend on both the sample size and the value of a.
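Because the slope of (4.5) equals Ĝ exactly, a Table 2-style summary can be sketched by simulating Ĝ directly. The function `mc_summary` below is hypothetical; it uses 1/a as the true Gini index of the LL distribution (a standard result) and defaults to the paper's 10,000 replications, while the usage line uses fewer to keep the run short. The SE here is the Monte Carlo standard deviation of the estimates, so MSE = bias² + SE² holds exactly.

```python
import random
import statistics


def gini(x):
    """Ordered-data Gini estimate (equal to the slope of model (4.5))."""
    y = sorted(x)
    n = len(y)
    mean = sum(y) / n
    return 2 * sum(i * yi for i, yi in enumerate(y, start=1)) / (n * n * mean) - (n + 1) / n


def draw_ll(n, a, rng):
    """Inverse-cdf draws from the LL cdf F(x) = 1 - 1/(1 + x^a)."""
    return [(u / (1.0 - u)) ** (1.0 / a) for u in (rng.random() for _ in range(n))]


def mc_summary(n, a, reps=10_000, seed=0):
    """Monte Carlo bias, SE and MSE of the Gini estimator under LL(a); true Gini is 1/a."""
    rng = random.Random(seed)
    true_g = 1.0 / a
    est = [gini(draw_ll(n, a, rng)) for _ in range(reps)]
    bias = sum(est) / reps - true_g
    se = statistics.pstdev(est)                       # spread of the simulated estimates
    mse = sum((g - true_g) ** 2 for g in est) / reps
    return bias, se, mse


for a in (3, 10, 20):
    print(a, mc_summary(n=50, a=a, reps=2000))
```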

5.2. Consistency of Gini estimator in regression model

Table 3 presents the MSE of the Gini estimator from the regression model under the Exp distribution and the LL distribution with a = 3 as a benchmark. The results were obtained by drawing 10,000 independent samples of size n = 10, 20, 50, 100, 500, 1,000. The simulations show that the Gini estimator based on the slope of the modified regression model is consistent.

5.3. Asymptotic behavior

First, to check whether the asymptotic normality assumption yields a good approximation, we drew 10,000 independent samples of size n = 100 from the Exp distribution with cdf (5.1) and computed the Gini index for each. Normality tests indicate that the sampling distribution of the Gini estimator agrees closely with the normal distribution.

To compare the normality of the jackknife and regression estimators, Figure 4 shows the empirical distributions for n = 100 of the statistics τ_Reg = (Ĝ − G0)/σ̂_Reg and τ_Jack = (Ĝ_J − G0)/σ̂_Jack. Here σ̂_Reg is the SE estimator from regression (4.5), σ̂_Jack is the jackknife SE estimator from (3.3), and Ĝ_J is given by (3.1). Note that G0, the true value of the Gini index, is 0.5 for the Exp distribution.

Both statistics have distributions close to the standard normal distribution, but the jackknife statistic is closer to normality than the regression statistic.

5.4. Gini regression estimate deviation from the normal distribution

In this section, we examine the divergence of the regression-based Gini estimator from the normal distribution using the Kolmogorov distance under the LL distribution with parameter a. Figure 5 shows the Kolmogorov distances of the sampling distribution from the normal distribution as the parameter value and the sample size vary. The concave shape of the curves indicates that the parameter value has the opposite effect.
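The Kolmogorov distance of a simulated statistic from N(0, 1) is sup_t |F_m(t) − Φ(t)|, which for a sorted sample t_1 ≤ ⋯ ≤ t_m reduces to max_i max(i/m − Φ(t_i), Φ(t_i) − (i − 1)/m). The sketch below applies it to τ_Reg = (Ĝ − G0)/σ̂_Reg under the LL distribution with G0 = 1/a. The regression SE follows the usual no-intercept OLS formula (an assumption, as before), and all function names are hypothetical.

```python
import math
import random


def regression_gini(x):
    """Slope and OLS SE of ((2i-1)/n - 1) sqrt(y_i) = theta sqrt(y_i) + error (n - 1 df)."""
    y = sorted(x)
    n = len(y)
    s = sum(y)
    c = [(2 * i - 1) / n - 1 for i in range(1, n + 1)]
    theta = sum(ci * yi for ci, yi in zip(c, y)) / s
    rss = sum(yi * (ci - theta) ** 2 for ci, yi in zip(c, y))  # residual_i = (c_i - theta) sqrt(y_i)
    return theta, math.sqrt(rss / (n - 1) / s)


def draw_ll(n, a, rng):
    """Inverse-cdf draws from the LL cdf F(x) = 1 - 1/(1 + x^a)."""
    return [(u / (1.0 - u)) ** (1.0 / a) for u in (rng.random() for _ in range(n))]


def normal_cdf(t):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def kolmogorov_distance(values):
    """sup_t |F_m(t) - Phi(t)| for the empirical cdf F_m of `values`."""
    t = sorted(values)
    m = len(t)
    return max(max((i + 1) / m - normal_cdf(v), normal_cdf(v) - i / m) for i, v in enumerate(t))


def tau_reg_sample(n, a, reps, seed=0):
    """Simulated values of (G_hat - 1/a) / se_Reg under LL(a)."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        g, se = regression_gini(draw_ll(n, a, rng))
        out.append((g - 1.0 / a) / se)
    return out


for a in (3, 5, 10):
    print(a, kolmogorov_distance(tau_reg_sample(100, a, reps=1000)))
```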

Here, we compare the deviation of the Gini estimates from the normal distribution across methods. Figure 6 shows the divergence from the normal distribution of the jackknife, bootstrap, and regression estimators under the Exp distribution. The deviations of these estimates are asymptotically equivalent (especially for the jackknife and bootstrap), and the jackknife-based Gini estimate performs well in terms of deviation from the normal distribution.

5.5. Comparison of confidence intervals

In this section, we compare three 95% confidence intervals for the Gini index of the Exp distribution; recall that the true value of the Gini index for this distribution is 0.5. In Table 4, the first interval uses the jackknife SE with N(0, 1) critical values, the second is the percentile-t bootstrap confidence interval (Mills and Zandvakili, 1997), and the third is based on the regression model.
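The three intervals can be sketched as follows. Several details are our assumptions rather than the paper's: the jackknife interval is centred at Ĝ_J, each bootstrap replicate in the percentile-t interval is studentized by its own jackknife SE, the bootstrap quantiles use a simple order-statistic rule, and the regression SE is the usual no-intercept OLS formula for (4.5). The constant 1.959964 is the 97.5% quantile of N(0, 1).

```python
import math
import random


def gini(x):
    """Ordered-data Gini estimate: 2 / (n^2 * mean) * sum(i * Y_i) - (n + 1) / n."""
    y = sorted(x)
    n = len(y)
    mean = sum(y) / n
    return 2 * sum(i * yi for i, yi in enumerate(y, start=1)) / (n * n * mean) - (n + 1) / n


def jackknife(x):
    """Returns (G_J, se_Jack): G_J = n*G_hat - (n-1)*G_bar, se from the leave-one-out values."""
    x = list(x)
    n = len(x)
    loo = [gini(x[:i] + x[i + 1:]) for i in range(n)]
    g_dot = sum(loo) / n
    se = math.sqrt((n - 1) / n * sum((g - g_dot) ** 2 for g in loo))
    return n * gini(x) - (n - 1) * g_dot, se


def regression_se(x):
    """OLS SE of theta in ((2i-1)/n - 1) sqrt(y_i) = theta sqrt(y_i) + error (n - 1 df)."""
    y = sorted(x)
    n = len(y)
    s = sum(y)
    c = [(2 * i - 1) / n - 1 for i in range(1, n + 1)]
    theta = sum(ci * yi for ci, yi in zip(c, y)) / s
    rss = sum(yi * (ci - theta) ** 2 for ci, yi in zip(c, y))
    return math.sqrt(rss / (n - 1) / s)


def empirical_quantile(sorted_vals, q):
    """Simple order-statistic quantile of an ascending list (no interpolation)."""
    m = len(sorted_vals)
    return sorted_vals[min(m - 1, max(0, int(q * m)))]


def three_intervals(x, B=200, seed=0, z=1.959964):
    """Jackknife-normal, percentile-t bootstrap and regression 95% intervals for the Gini index."""
    x = list(x)
    g_hat = gini(x)
    g_jack, se_jack = jackknife(x)
    jack_ci = (g_jack - z * se_jack, g_jack + z * se_jack)

    # Percentile-t: studentize each bootstrap replicate by its own jackknife SE.
    rng = random.Random(seed)
    t_stats = []
    for _ in range(B):
        xb = rng.choices(x, k=len(x))
        _, se_b = jackknife(xb)
        if se_b > 0:  # skip degenerate resamples (e.g. all values identical)
            t_stats.append((gini(xb) - g_hat) / se_b)
    t_stats.sort()
    boot_ci = (g_hat - empirical_quantile(t_stats, 0.975) * se_jack,
               g_hat - empirical_quantile(t_stats, 0.025) * se_jack)

    se_reg = regression_se(x)
    reg_ci = (g_hat - z * se_reg, g_hat + z * se_reg)
    return jack_ci, boot_ci, reg_ci


rng = random.Random(8)
x = [rng.expovariate(1.0) for _ in range(50)]
for name, ci in zip(("jackknife", "bootstrap-t", "regression"), three_intervals(x)):
    print(name, ci)
```

The nested jackknife inside the bootstrap loop makes the percentile-t interval the most expensive of the three, roughly B times the cost of a single jackknife.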

The asymptotic bootstrap and jackknife intervals are very similar, and both are much narrower than the intervals computed with the regression-based SEs.

6. Empirical illustration

We now turn to real data on income inequality in Britain for the 1994–1995 fiscal year. Estimation is based on the unit-record data used to calculate the official income statistics, derived from the Family Resources Surveys and available from the UK Data Service's website (https://www.ukdataservice.ac.uk). We first compared the SEs and the three 95% confidence intervals; Table 5 provides the results.

Based on probability and quantile plots, we fitted the GB2 distribution to the data, obtaining scale parameter b = 227.84 and shape parameters a = 2.99, p = 1, and q = 1. These are the maximum likelihood estimates of the GB2 parameters based on the incomes, and the same result has been reported in Jenkins (2009). Since p = q = 1, the fitted distribution is a Fisk (log-logistic) distribution with a = 2.99 and b = 227.84. We then drew Monte Carlo samples from this Fisk distribution and compared the divergence of the Gini estimates from the normal distribution using the Kolmogorov distance. Table 6 summarizes the results.

Table 7 presents the empirical MSEs of the Gini estimator based on resampling and regression model. It is clear that all of these methods have asymptotically equal MSE values.

Table 8 reports, for each method applied to the distribution fitted to the real data, the relative frequency with which the 95% confidence intervals contain the true value of the inequality measure (coverage probability) and the average length of the intervals (average size), based on 10,000 confidence intervals.

The results show that the coverage of the resampling confidence intervals is reasonably close to the nominal confidence level for large sample sizes. As expected, there is no substantial difference in coverage probability (CP) or average size (AS) between the two resampling techniques. The jackknife confidence interval performs best in terms of CP and AS, at the cost of providing larger confidence intervals.
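Coverage probability and average size of the kind reported in Table 8 can be approximated with a loop like the one below. It draws from the fitted Fisk distribution with a = 2.99 and b = 227.84 (values from the text) by inverting its cdf, and uses 1/a ≈ 0.334 as the true Gini index (a standard result for the log-logistic family; the index is scale-free, so b does not affect it). The sketch covers only the jackknife normal interval and the regression interval, omits the percentile-t bootstrap to keep the run time short, and defaults to 1,000 replications rather than 10,000; the regression SE formula is our assumption, as in the earlier sketches.

```python
import math
import random


def gini(x):
    """Ordered-data Gini estimate: 2 / (n^2 * mean) * sum(i * Y_i) - (n + 1) / n."""
    y = sorted(x)
    n = len(y)
    mean = sum(y) / n
    return 2 * sum(i * yi for i, yi in enumerate(y, start=1)) / (n * n * mean) - (n + 1) / n


def jackknife(x):
    """Returns (G_J, se_Jack) from the leave-one-out estimates."""
    x = list(x)
    n = len(x)
    loo = [gini(x[:i] + x[i + 1:]) for i in range(n)]
    g_dot = sum(loo) / n
    se = math.sqrt((n - 1) / n * sum((g - g_dot) ** 2 for g in loo))
    return n * gini(x) - (n - 1) * g_dot, se


def regression_se(x):
    """OLS SE of theta in ((2i-1)/n - 1) sqrt(y_i) = theta sqrt(y_i) + error (n - 1 df)."""
    y = sorted(x)
    n = len(y)
    s = sum(y)
    c = [(2 * i - 1) / n - 1 for i in range(1, n + 1)]
    theta = sum(ci * yi for ci, yi in zip(c, y)) / s
    rss = sum(yi * (ci - theta) ** 2 for ci, yi in zip(c, y))
    return math.sqrt(rss / (n - 1) / s)


def coverage(n, a=2.99, b=227.84, reps=1000, seed=0, z=1.959964):
    """CP and AS of jackknife-normal and regression intervals under Fisk(a, b); true Gini = 1/a."""
    rng = random.Random(seed)
    true_g = 1.0 / a
    stats = {"jackknife": [0, 0.0], "regression": [0, 0.0]}  # [hits, total interval length]
    for _ in range(reps):
        # Inverse cdf of F(x) = 1 - 1/(1 + (x/b)^a).
        x = [b * (u / (1.0 - u)) ** (1.0 / a) for u in (rng.random() for _ in range(n))]
        g_jack, se_jack = jackknife(x)
        g_hat, se_reg = gini(x), regression_se(x)
        intervals = {
            "jackknife": (g_jack - z * se_jack, g_jack + z * se_jack),
            "regression": (g_hat - z * se_reg, g_hat + z * se_reg),
        }
        for name, (lo, hi) in intervals.items():
            stats[name][0] += lo <= true_g <= hi
            stats[name][1] += hi - lo
    return {name: (hits / reps, length / reps) for name, (hits, length) in stats.items()}


print(coverage(n=100, reps=500))
```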

7. Conclusion

The regression approach is the simplest way to estimate the Gini coefficient and its SE; however, it can produce weaker results because it does not account for the potential shortcomings of the proposed regression model. Real data typically exhibit these defects, yet the method continues to be used despite its known shortcomings. The Gini estimator based on the regression model is consistent and asymptotically normal, with less divergence from the normal distribution than the resampling techniques. The method does not require grouping individual data to economize on computation, and the estimator can be analyzed using standard statistical software. The weakness of the method decreases as the sample size grows; we should therefore be cautious when using the regression-based approach to analyze the Gini coefficient in small samples.

Fig. 1. The area between the equality-line and the Lorenz curve.
Fig. 2. The dependency between ordered incomes under exponential distribution.
Fig. 3. Comparison of the standard errors under exponential distribution.
Fig. 4. Comparison of the empirical distributions of jackknife and regression statistics.
Fig. 5. The trend of Kolmogorov distances with respect to the values of parameter a.
Fig. 6. The divergence from N(0, 1) for jackknife, bootstrap, and regression approach.

Table 1

Comparison of the standard errors of Gini estimates



LL = loglogistic distribution; Exp = exponential distribution; Boot = bootstrap; Jack = jackknife; Reg = regression.

Table 2

Summary of regression approach under loglogistic distribution

a | n | Bias | Standard error | Mean square error



Table 3

Consistency of Gini estimator in regression model for exponential distribution



MSE = mean square error; Exp = exponential distribution; LL = loglogistic distribution.

Table 4

Confidence intervals for Gini estimates of exponential distribution

n | Jackknife | Bootstrap (percentile-t) | Regression
10 | (0.3222, 0.6363) | (0.2626, 0.6794) | (0.1596, 0.7399)
20 | (0.3735, 0.6012) | (0.3497, 0.6265) | (0.2758, 0.6747)
50 | (0.4199, 0.5682) | (0.4115, 0.5799) | (0.3662, 0.6138)
100 | (0.4437, 0.5503) | (0.4395, 0.5570) | (0.4085, 0.5822)
500 | (0.4748, 0.5239) | (0.4739, 0.5251) | (0.4605, 0.5377)
1,000 | (0.4820, 0.5174) | (0.4816, 0.5177) | (0.4723, 0.5270)

Table 5

Comparison of the Gini estimates in Britain in fiscal years 1994–1995

Method | Ĝ | Standard error (Ĝ) | 95% confidence interval
Bootstrap | 0.33185 | 0.00272 | [0.32578, 0.33715]
Jackknife | 0.33186 | 0.00190 | [0.32623, 0.33750]
Regression | 0.33185 | 0.00337 | [0.32524, 0.33846]

Table 6

Divergence of Gini estimates from N(0, 1) based on fitted distribution


Table 7

MSEs of the Gini estimates based on fitted distribution



MSE = mean square error.

Table 8

CP and AS of fitted distribution to real data



CP = coverage probability; AS = average size.

  1. David, HA, and Nagaraja, HN (2003). Order Statistics. New York: John Wiley & Sons
  2. Davidson, R (2009). Reliable inference for the Gini index. Journal of Econometrics. 150, 30-40.
  3. Gastwirth, JL (1971). A general definition of the Lorenz curve. Econometrica. 39, 1037-1039.
  4. Giles, DEA (2004). Calculating a standard error for the Gini coefficient: some further results. Oxford Bulletin of Economics and Statistics. 66, 425-433.
  5. Giles, DEA (2006). A cautionary note on estimating the standard error of the Gini index of inequality: comment. Oxford Bulletin of Economics and Statistics. 68, 395-396.
  6. Jenkins, SP (2009). Distributionally-sensitive inequality indices and the GB2 income distribution. Review of Income and Wealth. 55, 392-398.
  7. Jiang, J (2010). Large Sample Techniques for Statistics. New York: Springer Science
  8. Kang, SB, and Cho, YS (1996). Estimation of Gini index of the exponential distribution by bootstrap method. Communications for Statistical Applications and Methods. 3, 291-297.
  9. Langel, M, and Tillé, Y (2013). Variance estimation of the Gini index: revisiting a result several times published. Journal of the Royal Statistical Society Series A (Statistics in Society). 176, 521-540.
  10. Lerman, RI, and Yitzhaki, S (1984). A note on the calculation and interpretation of the Gini index. Economics Letters. 15, 363-368.
  11. McDonald, JB (1984). Some generalized functions for the size distribution of income. Econometrica. 52, 647-665.
  12. Mills, JA, and Zandvakili, S (1997). Statistical inference via bootstrapping for measures of inequality. Journal of Applied Econometrics. 12, 133-150.
  13. Modarres, R, and Gastwirth, JL (2006). A cautionary note on estimating the standard error of the Gini index of inequality. Oxford Bulletin of Economics and Statistics. 68, 385-390.
  14. Moran, TP (2006). Statistical inference for measures of inequality with a cross-national bootstrap application. Sociological Methods & Research. 34, 296-333.
  15. Ogwang, T (2000). A convenient method of computing the Gini index and its standard error. Oxford Bulletin of Economics and Statistics. 62, 123-129.
  16. Ogwang, T (2006). A cautionary note on estimating the standard error of the Gini index of inequality: comment. Oxford Bulletin of Economics and Statistics. 68, 391-393.
  17. Palmitesta, GMGP, and Provasi, C (2006). Asymptotic and bootstrap inference for the generalized Gini indices. Metron. 64, 107-124.
  18. Sendler, W (1979). On statistical inference in concentration measurement. Metrika. 26, 109-122.
  19. Shalit, H (1985). Calculating the Gini index of inequality for individual data. Oxford Bulletin of Economics and Statistics. 47, 185-189.
  20. Woo, JS, and Yoon, GE (2001). Estimations of Lorenz curve and Gini index in a Pareto distribution. Communications for Statistical Applications and Methods. 8, 249-256.
  21. Xu, K (2003). How has the literature on Gini's index evolved in the past 80 years? China Economic Quarterly. 3.
  22. Xu, K (2007). U-statistics and their asymptotic results for some inequality and poverty measures. Econometric Reviews. 26, 567-577.
  23. Yitzhaki, S (1991). Calculating jackknife variance estimators for parameters of the Gini method. Journal of Business and Economic Statistics. 9, 235-239.
  24. Yitzhaki, S (1998). More than a dozen alternative ways of spelling Gini. Research in Economic Inequality. 8, 13-30.
  25. Yitzhaki, S, and Schechtman, E (2013). The Gini Methodology: A Primer on a Statistical Methodology. New York: Springer Science