Multivariate confidence regions have been defined using a chi-square distribution function under a normality assumption, and are represented as ellipses and ellipsoids for bivariate and trivariate normal distribution functions, respectively. In this work, an alternative confidence region based on multivariate quantile vectors is proposed; it can be defined for the normal distribution as well as for any other distribution. Its lower and upper bounds are obtained using quantile vectors, and the region between the two bounds is referred to as the quantile confidence region. Note that the upper and lower bounds of the bivariate and trivariate quantile confidence regions are curves and surfaces, respectively. The quantile confidence region is obtained for various types of distribution functions, both symmetric and asymmetric, and its coverage rate is calculated and compared. We conclude that the quantile confidence region will be useful for the analysis of multivariate data, since it is found to have better coverage rates, even for asymmetric distributions.
For multivariate data with dimensionality equal to or more than two, the multivariate confidence region (MCR) of the population mean vector
In the real world, most data do not satisfy the normality assumption. It is therefore often difficult to define and obtain the MCR of the population mean vector (see Asgharzadeh and Abdi (2011) for more detail). In this work, an alternative confidence region based on multivariate quantile vectors is proposed; it can be defined for the normal distribution as well as for any other distribution function.
When the multivariate random vector
The cumulative distribution function (cdf) of any point in the multivariate
The probability for the upper region,
where
These lower and upper bounds of an alternative confidence region could be obtained by using the quantile vectors, and then the appropriate region between two bounds is denoted as the quantile confidence region (QCR). The existing ellipse and ellipsoid type of confidence region is now referred to as MCR.
Both the MCR and QCR are obtained for various types of distribution functions that are both symmetric and asymmetric distribution functions. Then, we will compare the coverage rates of the MCR and QCR.
In Section 2, the procedures to obtain the QCR are explained in terms of the multivariate quantile vectors proposed by Hong
The 1 −
For a given
For any positive constant value Δ and point (
for each
And for any point (
For all
That is, the lower bound, {
The multivariate 1 −
Two bivariate 0.90 QCR and MCR are illustrated in Figure 3 for a bivariate standard normal distribution function with the correlation coefficient
In Figure 3, the dotted and solid lines represent the MCR and QCR, respectively. The shapes of both the MCR and QCR change according to values of
In order to compare the QCR proposed in this work with the MCR, the corresponding coverage rates are obtained for various kinds of distribution functions and several significance levels. First, the coverage rates are calculated by simulation under bivariate and trivariate normal distribution assumptions.
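The coverage-rate calculation for the chi-square-based MCR can be sketched as a small Monte Carlo experiment. The following is only an illustration under stated assumptions, not the simulation design used in this work: the sample size `n`, the number of `trials`, the `seed`, and the use of a known covariance matrix are all hypothetical choices, and the function names are introduced here for illustration. It relies on the fact that the upper-α quantile of a chi-square distribution with two degrees of freedom has the closed form −2 ln α.

```python
import math
import random


def sample_mean(rho, n, rng):
    """Mean of n draws from a bivariate standard normal with correlation rho."""
    s1 = s2 = 0.0
    c = math.sqrt(1.0 - rho * rho)
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        s1 += z1
        s2 += rho * z1 + c * z2  # induces correlation rho with the first coordinate
    return s1 / n, s2 / n


def mcr_covers(mean, rho, n, alpha):
    """True if the chi-square ellipse centred at the sample mean contains the true mean (0, 0)."""
    m1, m2 = mean
    # Quadratic form n * m' R^{-1} m, with R the (assumed known) correlation matrix.
    q = n * (m1 * m1 - 2.0 * rho * m1 * m2 + m2 * m2) / (1.0 - rho * rho)
    # Upper-alpha chi-square quantile with 2 degrees of freedom: -2 ln(alpha).
    return q <= -2.0 * math.log(alpha)


def coverage_rate(rho, n=20, trials=5000, alpha=0.10, seed=0):
    """Fraction of simulated sample means whose MCR contains the true mean."""
    rng = random.Random(seed)
    hits = sum(mcr_covers(sample_mean(rho, n, rng), rho, n, alpha) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    for rho in (-0.90, -0.45, 0.00, 0.45, 0.90):
        print(rho, coverage_rate(rho))
```

Under these assumptions the estimated rates should fall close to the nominal 0.90 for every correlation value, which is the behavior reported for the MCR under normality.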
Consider the bivariate standard normal distributions with zero mean vector, unit variance, and with correlation coefficient
The coverage rates and standard errors are obtained for
Consider the standard trivariate normal distribution function with zero mean vector and simple variance-covariance matrix, such as
Both the MCR and QCR are derived for these trivariate standard normal distribution functions with various
Table 2 indicates that the coverage rates for the 90% MCR and QCR are both almost equal to 0.90. The standard errors of the MCR and QCR are also very close. These behaviors are similar to those for the bivariate normal distribution functions discussed in Section 3.1. Therefore, we can conclude that the QCR performs very similarly to the MCR for multivariate normal distribution functions.
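As a plausibility check on the reported standard errors, note that the tabulated coverage values (about 900 and 950) and standard errors (about 9.5 and 6.9) are consistent with counts out of 1,000 trials per replication. That reading is an assumption made here for illustration; the count of 1,000 is inferred from the tabulated values and is not stated in the text. Under it, the count is binomial and its standard deviation is the square root of 1000·p·(1 − p). The helper name `binomial_sd` is introduced only for this sketch.

```python
import math


def binomial_sd(n_trials, p):
    """Standard deviation of a Binomial(n_trials, p) count."""
    return math.sqrt(n_trials * p * (1.0 - p))


# Assumed design: 1,000 trials per replication (inferred, not stated in the text).
print(round(binomial_sd(1000, 0.90), 3))  # 9.487
print(round(binomial_sd(1000, 0.95), 3))  # 6.892
```

Both values sit close to the standard errors reported for the normal cases, where the coverage matches its nominal level.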
The two confidence regions have similar coverage rates for normal distribution functions across various variance-covariance matrices and significance levels. The QCR can therefore be regarded as advantageous over the MCR if it has better coverage rates for non-normal distribution functions. To show this, we consider various symmetric and asymmetric distribution functions. A large set of sample means is generated and the coverage rates are calculated in the same way as in Section 3.
The following mixed normal distribution function is set as one multivariate distribution function that is not normal but symmetric with respect to the origin:
where
Both MCR and QCR are derived for these normal mixtures with
Figure 4 illustrates the bivariate 0.90 QCR and MCR for this symmetric normal mixture with the correlation coefficient
Table 3 and Figures 5 and 6 show that the coverage rates of the 90% and 95% QCR are close to 0.90 and 0.95, respectively, whereas the coverage rates of the 90% and 95% MCR are larger than 0.90 and 0.95. The standard errors of the MCR are slightly smaller than those of the QCR. From these results, we can say that the QCR performs better than the MCR for symmetric distribution functions that are non-normal.
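To make the notion of a mixture that is non-normal yet symmetric about the origin concrete, the following sketch draws from an equal-weight two-component bivariate normal mixture and estimates the cdf and the upper-region probability empirically. The component means (`shift`), the equal weights, the correlation value, and all function names are hypothetical choices for illustration; they are not the mixture parameters used in this work. For any distribution symmetric about the origin, the cdf at −x equals the upper-region probability at x, which the example checks numerically.

```python
import math
import random


def draw_mixture(rho, shift, rng):
    """One draw from 0.5*N(-shift, R) + 0.5*N(+shift, R), with R a correlation matrix (hypothetical)."""
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    x1 = z1
    x2 = rho * z1 + math.sqrt(1.0 - rho * rho) * z2
    sign = 1.0 if rng.random() < 0.5 else -1.0  # pick a component with equal probability
    return x1 + sign * shift[0], x2 + sign * shift[1]


def lower_prob(sample, point):
    """Empirical cdf: fraction of draws with both coordinates <= point."""
    return sum(a <= point[0] and b <= point[1] for a, b in sample) / len(sample)


def upper_prob(sample, point):
    """Empirical upper-region probability: fraction of draws with both coordinates > point."""
    return sum(a > point[0] and b > point[1] for a, b in sample) / len(sample)


if __name__ == "__main__":
    rng = random.Random(1)
    sample = [draw_mixture(0.45, (1.0, 1.0), rng) for _ in range(20000)]
    point = (0.5, -0.3)
    mirrored = (-point[0], -point[1])
    # For a distribution symmetric about the origin these two estimates should nearly agree.
    print(lower_prob(sample, mirrored), upper_prob(sample, point))
```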
Another mixture distribution function is considered as an asymmetric bivariate distribution function, which is non-normal as well as asymmetric about the mean:
where
Figure 7 illustrates the bivariate 0.90 QCR and MCR for this asymmetric normal mixture with the correlation coefficient
The shapes of the MCR in Figure 7 become much slimmer compared to those in Figure 4, and the patterns of the QCR in Figure 7 are a little different from those in Figure 4 for this asymmetric mixture as
Table 4 and Figures 8 and 9 show that the coverage rates of the 90% and 95% QCR are close to 0.90 and 0.95, respectively, whereas the coverage rates of the MCR are smaller than the nominal values. The standard errors of the MCR are larger than those of the QCR. From the results in Sections 4.1 and 4.2, we can conclude that the QCR performs better than the MCR for non-normal distribution functions, whether symmetric or asymmetric.
The QCR proposed in this study has four features. First, because the bivariate and trivariate quantile vectors form curves and surfaces, the QCR has a non-ellipsoidal shape regardless of the distribution. Second, under the multivariate normal distribution, the QCR is wider than the MCR, which is ellipsoidal. Third, the QCR does not satisfy the equivariance property, as shown in the simulation studies of the normal mixture cases. Finally, if the distribution of the random sample is known, it is possible to obtain the QCR using its cdf and also easily obtain the MCR regardless of its distribution.
In this section, we discuss further studies based on the features of the QCR explained above. Other approaches need to be developed to address the non-ellipsoidal shape of the QCR under the multivariate normal distribution; this may eliminate the problem that the QCR is wider than the MCR. We have used the coverage rate as the criterion for comparing confidence regions, since it is generally used in simulation studies of confidence regions. Nonetheless, the QCR has a non-ellipsoidal form and does not satisfy the equivariance property; therefore, its advantages should also be explored using other criteria, such as the power function of the confidence region.
These weaknesses of the QCR remain to be addressed; however, the alternative method is valuable, since the proposed confidence regions can be obtained easily and have good coverage rates for any kind of multivariate distribution.
Much multivariate data in the real world does not satisfy the normality assumption. In this situation, it is very hard to derive the MCR of the population mean vector. In this work, an alternative confidence region is proposed using multivariate quantile vectors. Its advantage is that it can be defined for the normal distribution as well as for any other distribution function.
The confidence region using multivariate quantile vectors is obtained for various types of distribution functions, both symmetric and asymmetric. The coverage rates are then calculated and discussed. The QCR is found to have better coverage rates, even for asymmetric distributions.
To obtain the confidence region of the population mean vector from real multivariate data using multivariate quantile vectors, one must first decide whether a normal or a non-normal distribution function is appropriate for the data. If the data turn out to be non-normal, one can estimate the parameters of an appropriate mixture distribution function. These procedures can easily be performed using the R package ‘mixtools’. Since the QCR proposed in this work is found to have better properties than the MCR, we conclude that the QCR would be very useful for multivariate data analysis.
Table 1. Coverage rates for MCR and QCR
Correlation coefficient | 90% MCR coverage | 90% MCR SE | 90% QCR coverage | 90% QCR SE | 95% MCR coverage | 95% MCR SE | 95% QCR coverage | 95% QCR SE |
---|---|---|---|---|---|---|---|---|
−0.90 | 900.0084 | 9.498 | 900.0365 | 9.421 | 949.9194 | 6.812 | 950.0056 | 6.885 |
−0.45 | 900.0373 | 9.446 | 900.2978 | 9.401 | 949.9356 | 6.826 | 949.7783 | 6.891 |
0.00 | 900.0440 | 9.637 | 900.1554 | 9.527 | 950.0254 | 6.887 | 950.0361 | 6.896 |
0.45 | 900.1064 | 9.496 | 900.0470 | 9.482 | 949.8844 | 6.872 | 949.7975 | 6.949 |
0.90 | 900.0331 | 9.630 | 899.8694 | 9.630 | 949.9895 | 6.903 | 950.0171 | 6.914 |
MCR = multivariate confidence region; QCR = quantile confidence region; SE = standard error.
Table 2. Coverage rates for 90% MCR and QCR
Correlation coefficient | 90% MCR coverage | 90% MCR SE | 90% QCR coverage | 90% QCR SE |
---|---|---|---|---|
−0.9 | 900.1214 | 9.348 | 900.0037 | 9.315 |
0.0 | 899.9087 | 9.488 | 900.6709 | 9.417 |
0.9 | 900.0452 | 9.630 | 900.1320 | 9.634 |
MCR = multivariate confidence region; QCR = quantile confidence region; SE = standard error.
Table 3. Coverage rates for MCR and QCR
Correlation coefficient | 90% MCR coverage | 90% MCR SE | 90% QCR coverage | 90% QCR SE | 95% MCR coverage | 95% MCR SE | 95% QCR coverage | 95% QCR SE |
---|---|---|---|---|---|---|---|---|
−0.90 | 938.1345 | 7.416 | 902.0384 | 9.613 | 973.7676 | 5.078 | 951.6531 | 6.712 |
−0.45 | 928.7346 | 8.289 | 901.1082 | 9.316 | 969.1177 | 5.435 | 951.3128 | 6.652 |
0.00 | 919.4502 | 8.551 | 900.7678 | 9.421 | 964.4892 | 5.734 | 950.7263 | 6.422 |
0.45 | 913.2595 | 8.469 | 900.1465 | 9.472 | 960.8215 | 6.129 | 950.2296 | 6.794 |
0.90 | 908.4519 | 9.303 | 900.0415 | 9.197 | 958.2762 | 6.292 | 949.8492 | 6.767 |
MCR = multivariate confidence region; QCR = quantile confidence region; SE = standard error.
Table 4. Coverage rates for MCR and QCR
Correlation coefficient | 90% MCR coverage | 90% MCR SE | 90% QCR coverage | 90% QCR SE | 95% MCR coverage | 95% MCR SE | 95% QCR coverage | 95% QCR SE |
---|---|---|---|---|---|---|---|---|
−0.90 | 846.9411 | 10.741 | 902.1314 | 9.221 | 925.0911 | 8.047 | 954.5249 | 6.253 |
−0.45 | 828.8471 | 11.379 | 904.7747 | 9.378 | 905.0871 | 8.631 | 954.7135 | 6.124 |
0.00 | 835.3633 | 11.153 | 904.6439 | 9.311 | 903.8128 | 9.203 | 954.4201 | 6.143 |
0.45 | 841.2636 | 11.214 | 902.4423 | 9.218 | 905.3311 | 9.192 | 953.1651 | 6.754 |
0.90 | 846.1028 | 11.322 | 900.1011 | 9.139 | 907.4139 | 9.205 | 950.1371 | 6.398 |
MCR = multivariate confidence region; QCR = quantile confidence region; SE = standard error.