Multivariate confidence region using quantile vectors
Communications for Statistical Applications and Methods 2017;24:641-649
Published online November 30, 2017
© 2017 Korean Statistical Society.

Chong Sun Hong (1, a) and Hong Il Kim (a)

(a) Department of Statistics, Sungkyunkwan University, Korea
(1) Corresponding author: Department of Statistics, Sungkyunkwan University, 25-2 Sungkyunkwan-ro, Jongno-gu, Seoul 03063, Korea. E-mail: cshong@skku.edu
Received July 12, 2017; Revised September 20, 2017; Accepted October 16, 2017.
 Abstract

Multivariate confidence regions are conventionally defined using a chi-square distribution function under a normality assumption, and are represented as ellipses and ellipsoids for bivariate and trivariate normal distribution functions, respectively. In this work, an alternative confidence region based on multivariate quantile vectors is proposed that can be defined for the normal distribution as well as for any other distribution. The lower and upper bounds are obtained from quantile vectors, and the region between the two bounds is referred to as the quantile confidence region. The upper and lower bounds of the bivariate and trivariate quantile confidence regions take the form of curves and surfaces, respectively. The quantile confidence region is obtained for various symmetric and asymmetric distribution functions, and its coverage rate is calculated and compared with that of the conventional region. We conclude that the quantile confidence region is useful for the analysis of multivariate data, since it is found to have better coverage rates, even for asymmetric distributions.

Keywords : coverage rate, ellipse, ellipsoid, quantile vector
1. Introduction

For multivariate data with dimensionality two or more, the multivariate confidence region (MCR) of the population mean vector μ is defined using the result that the well-known statistic $(X-\mu)^{T}\Sigma^{-1}(X-\mu)$ follows a chi-squared distribution with the appropriate degrees of freedom. This holds under the assumption that the multivariate data follow the normal distribution (Chew, 1966; Fan and Zhang, 2000; Frank, 1966; Johnson and Wichern, 2002; Sun and Loader, 1994). The (1 − α) confidence regions for the bivariate distribution function have circular or elliptical shapes depending on the value of the correlation coefficient ρ, and the confidence regions for the trivariate distribution function have spherical or ellipsoidal shapes depending on the type of variance-covariance matrix.
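For the bivariate case the ellipse can be written out explicitly, because the chi-squared quantile with two degrees of freedom has a closed form. The worked equation below is a sketch under assumptions that are not stated in the text: a known variance-covariance matrix with unit variances and correlation ρ, and the symbols $x_1, x_2$ for the coordinates of $X-\mu$.

```latex
\[
(X-\mu)^{T}\Sigma^{-1}(X-\mu)
  = \frac{x_1^{2} - 2\rho\,x_1 x_2 + x_2^{2}}{1-\rho^{2}}
  \;\le\; \chi^{2}_{2,\,1-\alpha} = -2\ln\alpha ,
\qquad
\Sigma = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}.
\]
% For alpha = 0.10 the bound is -2 ln(0.10), approximately 4.605.
% With rho = 0 the region is a circle of radius sqrt(4.605), approximately 2.146.
```

As |ρ| approaches 1 the denominator $1-\rho^{2}$ shrinks, which is what makes the ellipse narrow along one diagonal.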

In the real world, most data do not satisfy the normality assumption, so it is often difficult to define and obtain the MCR of the population mean vector (see Asgharzadeh and Abdi (2011) for more detail). In this work, an alternative confidence region based on multivariate quantile vectors is proposed that can be defined for the normal distribution as well as for any other distribution function.

When the multivariate random vector Z = (Z1, …, Zk) is considered, Hong et al. (2016) proposed that for a given α ∈ (0, 1), the multivariate α quantile vector zα = (z1, …, zk) is defined as follows:

  • Every point in the multivariate α quantile vector zα has the same cumulative distribution function (cdf) value. For example, consider the graph of a bivariate cdf in which the horizontal plane has (Z1, Z2) coordinates and the vertical axis shows the cdf values. The points zα = (z1, z2) whose cdf value equals α lie in a plane parallel to the horizontal plane and form a curve; every point on this curve has the same cdf value. In the same way, trivariate quantile vectors form a surface.

  • The probability of the upper region, $R_\alpha^k$, of the multivariate α quantile vector zα is equal to 1 − α. That is,

    $$P[Z = (Z_1, \ldots, Z_k) \in R_\alpha^k] = \int_{(z_1, \ldots, z_k) \in R_\alpha^k} dF(z_1, \ldots, z_k) = 1 - \alpha,$$

    where F( ·, …, · ) is a k-variate distribution function.
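The first condition above (all points share one cdf value) can be illustrated numerically. The following is a minimal sketch, not the authors' procedure: it assumes a bivariate standard normal with ρ = 0, so that the cdf factorizes as F(z1, z2) = Φ(z1)Φ(z2), and the function names `cdf_independent` and `level_curve` are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and standard deviation 1


def cdf_independent(z1, z2):
    """Bivariate standard normal cdf under the assumption rho = 0."""
    return STD_NORMAL.cdf(z1) * STD_NORMAL.cdf(z2)


def level_curve(level, z1_values):
    """Return points (z1, z2) with F(z1, z2) = level, where a solution exists."""
    points = []
    for z1 in z1_values:
        p1 = STD_NORMAL.cdf(z1)
        if p1 <= level:
            # F(z1, z2) < Phi(z1) <= level for every finite z2: no point here.
            continue
        # Solve Phi(z1) * Phi(z2) = level for z2.
        z2 = STD_NORMAL.inv_cdf(level / p1)
        points.append((z1, z2))
    return points


# Points sharing the cdf value 0.05 (assumed level, chosen for illustration).
curve = level_curve(0.05, [-1.0, 0.0, 1.0, 2.0])
```

For correlated or non-normal distributions the cdf no longer factorizes, so the curve would have to be found by numerical root finding on the joint cdf instead.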

These lower and upper bounds of an alternative confidence region could be obtained by using the quantile vectors, and then the appropriate region between two bounds is denoted as the quantile confidence region (QCR). The existing ellipse and ellipsoid type of confidence region is now referred to as MCR.

Both the MCR and QCR are obtained for various symmetric and asymmetric distribution functions, and their coverage rates are then compared.

In Section 2, the procedure for obtaining the QCR is explained in terms of the multivariate quantile vectors proposed by Hong et al. (2016), and several QCRs are demonstrated for bivariate and trivariate standard normal distribution functions with some values of the correlation coefficient. In Section 3, the coverage rates of the MCR and QCR are obtained and compared for multivariate normal distribution functions with various values of the correlation coefficient. In Section 4, the corresponding coverage rates of the MCR and QCR are calculated and discussed for symmetric and asymmetric multivariate distribution functions with several types of variance-covariance matrix. Finally, Section 5 summarizes the results of this study and discusses further research.

2. Quantile confidence region

The 1 − α QCR is derived with the following three steps using the multivariate quantile vectors of Hong et al. (2016).

  • For a given α ∈ (0, 1), the lower and upper bounds for a k-variate random sample are obtained from the α/2 and 1 − α/2 quantile vectors, zα/2 and z1−α/2, respectively. For a bivariate distribution function, zα/2 and z1−α/2 can be obtained as in the left panel of Figure 1; the right panel of Figure 1 shows the corresponding two sets of vectors drawn on a two-dimensional plane.

  • For any positive constant Δ and any point (z1,α/2, …, zk,α/2) in the multivariate α/2 quantile vector zα/2, we find the smallest zi0,α/2 that satisfies

    $$\int_{-\infty}^{z_{1,\alpha/2}} \cdots \int_{-\infty}^{z_{i0,\alpha/2}} \cdots \int_{-\infty}^{z_{k,\alpha/2}} dF(z_1, \ldots, z_k) = \int_{-\infty}^{z_{1,\alpha/2}} \cdots \int_{-\infty}^{z_{i0,\alpha/2}+\Delta} \cdots \int_{-\infty}^{z_{k,\alpha/2}} dF(z_1, \ldots, z_k),$$

    for each i = 1, …, k. One can then determine the end point vector (z10,α/2, …, zi0,α/2, …, zk0,α/2) in zα/2. The lower bound, {ZL,α}, of the 1 − α QCR is defined as {(z1,α/2, …, zk,α/2)T }, where zi,α/2 ≤ zi0,α/2 for all i.

    Similarly, for any point (z1,1−α/2, …, zk,1−α/2) in the multivariate 1 − α/2 quantile vector z1−α/2, we obtain the smallest zi0,1−α/2 (i = 1, …, k) that satisfies

    $$\int_{-\infty}^{z_{1,1-\alpha/2}} \cdots \int_{-\infty}^{z_{i0,1-\alpha/2}} \cdots \int_{-\infty}^{z_{k,1-\alpha/2}} dF(z_1, \ldots, z_k) = \int_{-\infty}^{z_{1,1-\alpha/2}} \cdots \int_{-\infty}^{z_{i0,1-\alpha/2}+\Delta} \cdots \int_{-\infty}^{z_{k,1-\alpha/2}} dF(z_1, \ldots, z_k).$$

    For all i = 1, …, k, one can determine the end point vector (z10,1−α/2, …, zi0,1−α/2, …, zk0,1−α/2) in z1−α/2. The upper bound, {ZU,α}, of the 1 − α QCR is then defined as {(z1,1−α/2, …, zk,1−α/2)T }, where zi,1−α/2 ≤ zi0,1−α/2 for all i.

    That is, the lower bound, {ZL,α}, and upper bound, {ZU,α}, of the 1 − α QCR are truncated bounds determined by the end point vectors (z10,α/2, …, zi0,α/2, …, zk0,α/2) and (z10,1−α/2, …, zi0,1−α/2, …, zk0,1−α/2), respectively. For a bivariate distribution function, the two truncated bounds of the bivariate QCR look like the left panel of Figure 2.

  • The multivariate 1 − α QCR is defined as the space between the lower and upper bounds {ZL,α} = {(z1,α/2, …, zk,α/2)T } and {ZU,α} = {(z1,1−α/2, …, zk,1−α/2)T }. For a bivariate distribution function, the bivariate QCR looks like the right panel of Figure 2.
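The end point search in the second step can be sketched numerically. This is one possible reading, under assumptions that are not in the text: a bivariate standard normal with ρ = 0 (so that F(z1, z2) = Φ(z1)Φ(z2)), a grid search with spacing `step`, and a small tolerance `tol` in place of exact equality, since a continuous cdf never becomes exactly flat. The function name `end_point` is hypothetical.

```python
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution


def end_point(level, delta=0.1, tol=1e-4, step=0.01):
    """Smallest grid value z2 on the curve F = level at which shifting z2
    by delta changes the cdf by at most tol (assumed stand-in for equality)."""
    # The curve only exists where Phi(z2) > level, so start just above that.
    z2 = PHI.inv_cdf(level) + step
    while z2 < 10.0:
        # On the curve Phi(z1) * Phi(z2) = level, hence Phi(z1) = level / Phi(z2).
        p1 = level / PHI.cdf(z2)
        increment = p1 * (PHI.cdf(z2 + delta) - PHI.cdf(z2))
        if increment <= tol:
            return z2
        z2 += step
    raise ValueError("no end point found below 10")


# End points for the alpha/2 and 1 - alpha/2 curves with alpha = 0.10.
lower_end = end_point(0.05)
upper_end = end_point(0.95)
```

Because this example is symmetric in z1 and z2, the same value serves as the end point for both coordinates. The result depends on the chosen Δ and tolerance, which is why both are exposed as parameters.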

The bivariate 0.90 QCR and MCR are illustrated in Figure 3 for the bivariate standard normal distribution function with correlation coefficients ρ = −0.9, 0.0, and 0.9.

In Figure 3, the dotted and solid lines represent the MCR and QCR, respectively. The shapes of both the MCR and QCR change with the value of ρ. When ρ is 0, the MCR has a circular shape. When ρ is positive (or negative), the MCR has an elliptical shape oriented at 45° (or −45°), and the ellipse becomes narrower as the absolute value of ρ increases to 0.9. In contrast, the distance between the upper and lower bounds of the QCR gradually increases, and the curves become shorter, as ρ increases from −0.9 to 0.9. When ρ is negative, the distance between the bounds is shorter and the slope of the curves is gentler than for ρ = 0. When ρ is positive, the distance between the bounds is longer and the slope of the curves is steeper than for ρ = 0.

3. Coverage rate comparison in the normal case

In order to compare the QCR proposed in this work with the MCR, the corresponding coverage rates are obtained for various distribution functions and several significance levels. First, the coverage rates are calculated by simulation under the bivariate and trivariate normal distribution assumptions.

3.1. Bivariate normal

Consider the bivariate standard normal distributions with zero mean vector, unit variances, and correlation coefficient ρ. Both the MCR and QCR are derived for these distribution functions with ρ = −0.9 to 0.9 in increments of 0.45 and α = 0.10, 0.05. Sample means for samples of size n = 1,000 are generated from the bivariate normal distribution whose variance-covariance matrix is that of the standard distribution multiplied by 1/n. For 1,000 such sample means, the coverage is calculated as the number falling within each confidence region, and this is repeated for 10,000 iterations.

The coverage rates and standard errors obtained for ρ = −0.9 to 0.9 in increments of 0.45 with α = 0.10 and 0.05 are summarized in Table 1. Table 1 shows that the coverage rates of the 90% and 95% MCR and QCR are close to 0.90 and 0.95, respectively (about 900 and 950 of the 1,000 sample means). The standard errors of the MCR and QCR are also very close. Therefore, the QCR performs very similarly to the MCR.
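The simulation design for the MCR can be sketched as follows. This is an illustrative sketch rather than the authors' code: it assumes a known variance-covariance matrix and true mean (0, 0), draws each sample mean directly from N(0, Σ/n) instead of averaging n observations, and uses the closed-form chi-squared quantile −2 ln α that holds for two degrees of freedom. All function names are hypothetical.

```python
import math
import random


def mcr_covers(xbar, rho, n, alpha):
    """True if the sample mean xbar lies in the (1 - alpha) ellipse around (0, 0)."""
    x1, x2 = xbar
    # Quadratic form n * xbar^T Sigma^{-1} xbar for unit variances and correlation rho.
    q = n * (x1 * x1 - 2.0 * rho * x1 * x2 + x2 * x2) / (1.0 - rho * rho)
    # Chi-squared quantile with 2 degrees of freedom: -2 ln(alpha).
    return q <= -2.0 * math.log(alpha)


def draw_sample_mean(rho, n, rng):
    """Draw one sample mean from N(0, Sigma / n) using a 2x2 Cholesky factor."""
    e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    scale = 1.0 / math.sqrt(n)
    return (scale * e1, scale * (rho * e1 + math.sqrt(1.0 - rho * rho) * e2))


def coverage_count(rho, n=1000, n_means=1000, alpha=0.10, seed=0):
    """Number of the n_means sample means that fall inside the MCR (one iteration)."""
    rng = random.Random(seed)
    return sum(
        mcr_covers(draw_sample_mean(rho, n, rng), rho, n, alpha)
        for _ in range(n_means)
    )


count = coverage_count(0.45)
```

Repeating `coverage_count` with different seeds and averaging would mimic the iterations described above. Under exact 90% coverage the count is binomial with standard deviation sqrt(1000 × 0.9 × 0.1) ≈ 9.49, which is close to the standard errors reported in Table 1.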

3.2. Trivariate normal

Consider the standard trivariate normal distribution function with zero mean vector and a simple variance-covariance matrix, such as

$$\Sigma = \begin{pmatrix} 1 & \rho & \rho^2 \\ \rho & 1 & \rho \\ \rho^2 & \rho & 1 \end{pmatrix}.$$

Both the MCR and QCR are derived for these trivariate standard normal distribution functions with various ρ and α. Sample means for samples of size n = 1,000 are generated from the trivariate normal distribution with variance-covariance matrix Σ/n. For 1,000 such sample means, the coverage is calculated as the number falling within each confidence region, and this is repeated for 10,000 iterations. Coverage rates and standard errors are obtained for ρ = −0.9, 0, 0.9 with α = 0.1 and are summarized in Table 2.

Table 2 indicates that the coverage rates of the 90% MCR and QCR are both almost equal to 0.90. The standard errors of the MCR and QCR are also very close. These behaviors are similar to those for the bivariate normal distribution functions discussed in Section 3.1. Therefore, we can conclude that the QCR performs very similarly to the MCR for multivariate normal distribution functions.

4. Coverage rate comparison in non-normal case

The two confidence regions have similar coverage rates for normal distribution functions across various variance-covariance matrices and significance levels. The QCR would therefore be preferable to the MCR if it also attains better coverage rates than the MCR for non-normal distribution functions. To examine this, we consider symmetric and asymmetric non-normal distribution functions. A large set of sample means is generated and the coverage rates are calculated in the same manner as in Section 3.

4.1. Symmetric mixture normal case

The following mixed normal distribution function is set as one multivariate distribution function that is not normal but symmetric with respect to the origin:

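$$f(x) = 0.5 f_1(x) + 0.5 f_2(x),$$

where

$$f_1(x) = N\left(\begin{pmatrix} -1 \\ -1 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right), \quad f_2(x) = N\left(\begin{pmatrix} 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right).$$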
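Drawing observations from this mixture is straightforward: pick a component with the stated weight, then draw from that component. The sketch below is illustrative only; the function name `draw_mixture` and the choice ρ = 0.45 are assumptions made for the example.

```python
import math
import random


def draw_mixture(weight, rho, rng):
    """Draw one point from weight * N((-1,-1), Sigma) + (1 - weight) * N((1,1), Sigma)."""
    # Choose the component: centre (-1, -1) with probability `weight`, else (1, 1).
    center = -1.0 if rng.random() < weight else 1.0
    # Correlated standard normal pair via a 2x2 Cholesky factor of Sigma.
    e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    x1 = center + e1
    x2 = center + rho * e1 + math.sqrt(1.0 - rho * rho) * e2
    return (x1, x2)


rng = random.Random(1)
sample = [draw_mixture(0.5, 0.45, rng) for _ in range(20000)]
mean_x1 = sum(p[0] for p in sample) / len(sample)
mean_x2 = sum(p[1] for p in sample) / len(sample)
```

With equal weights the two components balance, so both sample means should be close to 0, in line with the symmetry about the origin stated above.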

Both MCR and QCR are derived for these normal mixtures with ρ = −0.9 to 0.9 in increments of 0.45 and α = 0.10, 0.05. With these samples, the corresponding coverage rates are calculated as the number belonging to each confidence region with 10,000 iterations.

Figure 4 illustrates the bivariate 0.90 QCR and MCR for this symmetric normal mixture with correlation coefficients ρ = −0.9, 0.0, and 0.9. For this normal mixture, the shapes of the MCR and QCR in Figure 4 differ according to the value of ρ. The MCR becomes slimmer as ρ increases to 0.9, and the pattern of the QCR is similar to that of the QCR for normal distribution functions. However, the distance between the upper and lower bounds for this normal mixture is slightly longer than that for normal distribution functions.

Table 3 and Figures 5 and 6 show that the coverage rates of the 90% and 95% QCR are close to 0.90 and 0.95, respectively, whereas the coverage rates of the 90% and 95% MCR exceed 0.90 and 0.95. The standard errors of the MCR are slightly smaller than those of the QCR. From these results, we can say that the QCR performs better than the MCR for symmetric distribution functions that are non-normal.

4.2. Asymmetric mixture normal case

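Another mixture distribution function is considered as an asymmetric bivariate distribution function, which is non-normal as well as asymmetric about the mean:

$$f(x) = 0.7 f_1(x) + 0.3 f_2(x),$$

where

$$f_1(x) = N\left(\begin{pmatrix} -1 \\ -1 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right), \quad f_2(x) = N\left(\begin{pmatrix} 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right).$$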
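The first two moments make the asymmetry concrete. The worked equation below applies the standard moment formulas for a two-component mixture with a common component covariance; the symbols $\mu_{\text{mix}}$, $\Sigma_{\text{mix}}$, $\mu_1$ and $\mu_2$ are notation introduced here for illustration.

```latex
\[
\mu_{\text{mix}} = 0.7\begin{pmatrix} -1 \\ -1 \end{pmatrix}
                 + 0.3\begin{pmatrix} 1 \\ 1 \end{pmatrix}
                 = \begin{pmatrix} -0.4 \\ -0.4 \end{pmatrix},
\]
\[
\Sigma_{\text{mix}}
  = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}
    + 0.7 \times 0.3\,(\mu_1-\mu_2)(\mu_1-\mu_2)^{T}
  = \begin{pmatrix} 1.84 & \rho + 0.84 \\ \rho + 0.84 & 1.84 \end{pmatrix},
\qquad
\mu_1-\mu_2 = \begin{pmatrix} -2 \\ -2 \end{pmatrix}.
\]
```

The mean (−0.4, −0.4) lies 0.6 from the heavier component centre and 1.4 from the lighter one in each coordinate, so the density is not symmetric about its mean.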

Figure 7 illustrates the bivariate 0.90 QCR and MCR for this asymmetric normal mixture with the correlation coefficient ρ = −0.9, 0.0, and 0.9.

The shapes of the MCR in Figure 7 are much slimmer than those in Figure 4, and for this asymmetric mixture the patterns of the QCR in Figure 7 differ slightly from those in Figure 4 as ρ increases to 0.9. The upper and lower bounds in this asymmetric mixture can have different curves. Table 4 reports the coverage rates and standard errors for ρ = −0.9 to 0.9 in increments of 0.45 with α = 0.10 and 0.05. The means and confidence intervals of the coverage rates are shown in Figures 8 and 9.

Table 4 and Figures 8 and 9 show that the coverage rates of the 90% and 95% QCR are close to 0.90 and 0.95, respectively, whereas the coverage rates of the MCR fall below the nominal values. The standard errors of the MCR are larger than those of the QCR. From the results in Sections 4.1 and 4.2, we conclude that the QCR performs better than the MCR for the symmetric and asymmetric non-normal distribution functions considered.

5. Discussion and further study

The QCR proposed in this study has four features. First, because bivariate and trivariate quantile vectors form curves and surfaces, the QCR has a non-ellipsoid shape regardless of the distribution. Second, under the multivariate normal distribution the QCR is wider than the MCR, which is an ellipsoid-type confidence region. Third, the QCR does not satisfy the equivariance property, as shown in the simulation studies of the normal mixture cases. Finally, if the distribution of the random sample is known, it is possible to obtain the QCR using its cdf and also easily obtain the MCR regardless of its distribution.

In this section, we discuss further studies based on the features of the QCR explained above. Other approaches need to be developed to deal with the non-ellipsoid shape of the QCR under the multivariate normal distribution; this may then resolve the problem that the QCR is a wider confidence region than the MCR. We have used the coverage rate as the criterion for comparing confidence regions, since it is generally used in simulation studies of confidence regions. Nonetheless, the QCR has a non-ellipsoid form and does not satisfy the equivariance property; therefore, its advantages should also be explored using other criteria, such as the power function of the confidence region.

The QCR needs to compensate for these weaknesses; however, we can say that this alternative method is valuable since the proposed confidence regions can easily be obtained and have good coverage rates for any kind of multivariate distribution.

6. Conclusion

Much of the multivariate data encountered in the real world does not satisfy the normality assumption, and in this situation it is very hard to derive the MCR of the population mean vector. In this work, an alternative confidence region is proposed using multivariate quantile vectors. Its advantage is that this confidence region can be defined for the normal distribution as well as for any other distribution function.

The confidence region using multivariate quantile vectors is obtained for various symmetric and asymmetric distribution functions, and its coverage rates are calculated and discussed. The QCR is found to have better coverage rates, even for asymmetric distributions.

To obtain the confidence region of the population mean vector using multivariate quantile vectors for real multivariate data, one must first decide whether a normal or a non-normal distribution function is appropriate for the data. If the data turn out to be non-normal, one can estimate the parameters of an appropriate mixture distribution function; these procedures can easily be performed using the R package ‘mixtools’. The QCR proposed in this work is found to have better properties than the MCR, and we therefore conclude that the QCR would be very useful for multivariate data analysis.

Figures
Fig. 1. Two bivariate quantile vectors.
Fig. 2. Second and third steps for bivariate quantile confidence region.
Fig. 3. The 90% bivariate quantile confidence region and multivariate confidence region.
Fig. 4. The bivariate 0.90 multivariate confidence region and quantile confidence region for symmetric normal mixture.
Fig. 5. Coverage rates for 90% MCR and QCR. MCR = multivariate confidence region; QCR = quantile confidence region.
Fig. 6. Coverage rates for 95% MCR and QCR. MCR = multivariate confidence region; QCR = quantile confidence region.
Fig. 7. The bivariate 0.90 multivariate confidence region and quantile confidence region for asymmetric normal mixture.
Fig. 8. Coverage rates for 90% MCR and QCR. MCR = multivariate confidence region; QCR = quantile confidence region.
Fig. 9. Coverage rates for 95% MCR and QCR. MCR = multivariate confidence region; QCR = quantile confidence region.
Tables

Table 1

Coverage rates for MCR and QCR

ρ       90% MCR             90% QCR             95% MCR             95% QCR
        Coverage   SE       Coverage   SE       Coverage   SE       Coverage   SE
−0.90   900.0084   9.498    900.0365   9.421    949.9194   6.812    950.0056   6.885
−0.45   900.0373   9.446    900.2978   9.401    949.9356   6.826    949.7783   6.891
 0.00   900.0440   9.637    900.1554   9.527    950.0254   6.887    950.0361   6.896
 0.45   900.1064   9.496    900.0470   9.482    949.8844   6.872    949.7975   6.949
 0.90   900.0331   9.630    899.8694   9.630    949.9895   6.903    950.0171   6.914

MCR = multivariate confidence region; QCR = quantile confidence region; SE = standard error.


Table 2

Coverage rates for 90% MCR and QCR

ρ       90% MCR             90% QCR
        Coverage   SE       Coverage   SE
−0.9    900.1214   9.348    900.0037   9.315
 0.0    899.9087   9.488    900.6709   9.417
 0.9    900.0452   9.630    900.1320   9.634

MCR = multivariate confidence region; QCR = quantile confidence region; SE = standard error.


Table 3

Coverage rates for MCR and QCR

ρ       90% MCR             90% QCR             95% MCR             95% QCR
        Coverage   SE       Coverage   SE       Coverage   SE       Coverage   SE
−0.90   938.1345   7.416    902.0384   9.613    973.7676   5.078    951.6531   6.712
−0.45   928.7346   8.289    901.1082   9.316    969.1177   5.435    951.3128   6.652
 0.00   919.4502   8.551    900.7678   9.421    964.4892   5.734    950.7263   6.422
 0.45   913.2595   8.469    900.1465   9.472    960.8215   6.129    950.2296   6.794
 0.90   908.4519   9.303    900.0415   9.197    958.2762   6.292    949.8492   6.767

MCR = multivariate confidence region; QCR = quantile confidence region; SE = standard error.


Table 4

Coverage rates for MCR and QCR

ρ       90% MCR              90% QCR             95% MCR             95% QCR
        Coverage   SE        Coverage   SE       Coverage   SE       Coverage   SE
−0.90   846.9411   10.741    902.1314   9.221    925.0911   8.047    954.5249   6.253
−0.45   828.8471   11.379    904.7747   9.378    905.0871   8.631    954.7135   6.124
 0.00   835.3633   11.153    904.6439   9.311    903.8128   9.203    954.4201   6.143
 0.45   841.2636   11.214    902.4423   9.218    905.3311   9.192    953.1651   6.754
 0.90   846.1028   11.322    900.1011   9.139    907.4139   9.205    950.1371   6.398

MCR = multivariate confidence region; QCR = quantile confidence region; SE = standard error.


References
  1. Asgharzadeh, A, and Abdi, M (2011). Confidence intervals and joint confidence regions for the two-parameter exponential distribution based on records. Communications for Statistical Applications and Methods. 18, 103-110.
  2. Chew, V (1966). Confidence, prediction, and tolerance regions for the multivariate normal distribution. Journal of the American Statistical Association. 61, 605-617.
  3. Fan, J, and Zhang, W (2000). Simultaneous confidence bands and hypothesis testing in varying-coefficient models. Scandinavian Journal of Statistics. 27, 715-731.
  4. Frank, O (1966). Simultaneous confidence intervals. Scandinavian Actuarial Journal. 1966, 78-89.
  5. Hong, CS, Han, SJ, and Lee, GP (2016). Vector at risk and alternative value at risk. The Korean Journal of Applied Statistics. 29, 689-697.
  6. Johnson, RA, and Wichern, DW (2002). Applied Multivariate Statistical Analysis. Upper Saddle River: Prentice Hall.
  7. Sun, J, and Loader, CR (1994). Simultaneous confidence bands for linear regression and smoothing. The Annals of Statistics. 22, 1328-1345.