Quantile confidence region using highest density
Communications for Statistical Applications and Methods 2019;26:35-46
Published online January 31, 2019
© 2019 Korean Statistical Society.

Chong Sun Hong¹,ᵃ and Myung Soo Yooᵃ

ᵃ Department of Statistics, Sungkyunkwan University, Korea
Correspondence to: ¹ Department of Statistics, Sungkyunkwan University, 25-2 Sungkyunkwan-ro, Jongno-gu, Seoul 03063, Korea. E-mail: cshong@skku.edu
Received September 3, 2018; Revised November 15, 2018; Accepted December 11, 2018.
Abstract

The Multivariate Confidence Region (MCR) cannot be used to obtain a confidence region for the mean vector of multivariate data when the normality assumption is not satisfied; in such cases, the Quantile Confidence Region (QCR), which is based on a multivariate quantile vector, can be used instead. The coverage rate of the QCR is better than that of the MCR; however, the QCR has the disadvantage of a wide shape when the probability density function is bimodal. In this study, we propose a Quantile Confidence Region using the Highest Density (QCRHD), which is based on the Highest Density Region (HDR). The coverage rate of the QCRHD is superior to that of the MCR and similar to that of the QCR. When the mean vectors are close, the QCRHD is constructed as one region similar to the QCR. When the mean vectors are far apart, the QCR has one wide region, but the QCRHD has two smaller regions. Based on these features, the QCRHD overcomes the disadvantage of the QCR, which may have a wide shape.

Keywords : bimodal, confidence region, coverage rate, performance, quantile vector
1. Introduction

Under the assumption of a multivariate normal distribution, the multivariate confidence region (MCR) of the mean vector takes the form of an ellipse in two dimensions and an ellipsoid in three dimensions (Chew, 1966; Fan and Zhang, 2000; Frank, 1996; Johnson and Wichern, 2002; Sun and Loader, 1994). However, constructing a confidence region for the population mean vector using the MCR is not easy, because much real data does not satisfy the normality assumption (Asgharzadeh and Abdi, 2011). For such non-normal data, Hong and Kim (2017) proposed the quantile confidence region (QCR) using the multivariate quantile vector suggested by Hong et al. (2016). The QCR can be obtained for sample data as well as for data that do not satisfy the normality assumption. The QCR was found to have a coverage rate similar to that of the MCR under the normality assumption and, in particular, a better coverage rate than the MCR when the normality assumption is not satisfied. However, when the probability density function is bimodal, the QCR has the disadvantage of a wide shape (Hong and Kim, 2017). Hyndman (1996) proposed the highest density region (HDR) as an alternative way to summarize a probability distribution by a region; it provides a relatively small region, and the HDR is well known as the highest posterior density region in Bayesian statistics (Turkkan and Pham-Gia, 1997). Tian et al. (2011) also proposed an alternative confidence region, called the highest confidence density region, based on the confidence distribution, which is useful for estimating constrained parameters.

This study proposes a QCR based on the HDR, called the quantile confidence region using highest density (QCRHD), to overcome the disadvantage of the QCR. The QCRHD constructs a QCR using information obtained from the probability density function. We discuss the characteristics and advantages of the QCRHD by comparing its shapes and coverage rates with those of the QCR and MCR under various probability density functions. In particular, the comparison covers asymmetric as well as bimodal probability density functions.

Section 2 introduces a method to obtain the QCRHD, an alternative QCR based on the HDR. In Section 3, the QCRHDs for various probability density functions that do not follow the normal distribution are obtained and compared with the QCR and MCR. In Section 4, the performance and coverage rates of the QCRHD are evaluated and compared with those of the QCR and MCR. Finally, Section 5 presents conclusions and discusses future studies.

2. Quantile confidence region using highest density

We introduce an alternative QCR method using the HDR, which is called the QCRHD. Before explaining the multivariate QCRHD generally, we discuss only the bivariate QCRHD that can be visually explained. First, in order to use the HDR method, the (1 – α)100% HDR, which is denoted as R(kα), for the bivariate probability density function is (Hyndman, 1996):

$$R(k_\alpha) = \{(z_1, z_2) : f(z_1, z_2) \ge k_\alpha\}, \qquad (2.1)$$

where kα is the largest constant satisfying P((Z1, Z2) ∈ R(kα)) ≥ 1 – α and f (z1, z2) is the bivariate probability density function of random variable (Z1, Z2). Note that R(kα) is the subset of the sample space of (Z1, Z2).
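As a rough numerical illustration of (2.1), the threshold kα can be approximated by the α-quantile of the density values f(Z) over draws Z from f, following the density-quantile idea described in Hyndman (1996). The sketch below is not taken from this paper: it assumes a standard bivariate normal for f, for which kα = α/(2π) exactly, and the names `std_normal_pdf` and `hdr_threshold` are illustrative.

```python
import numpy as np

def std_normal_pdf(z):
    """Standard bivariate normal density at the rows of z (illustrative choice of f)."""
    return np.exp(-0.5 * np.sum(z**2, axis=1)) / (2.0 * np.pi)

def hdr_threshold(density_values, alpha):
    """Approximate k_alpha as the alpha-quantile of f(Z) over draws Z from f."""
    return np.quantile(density_values, alpha)

rng = np.random.default_rng(1)
alpha = 0.05
z = rng.standard_normal((100_000, 2))      # draws from f
dens = std_normal_pdf(z)
k_alpha = hdr_threshold(dens, alpha)

# Draws with f(z) >= k_alpha make up the estimated (1 - alpha)100% HDR.
in_hdr = dens >= k_alpha
print(k_alpha, alpha / (2.0 * np.pi))      # estimate vs. exact value for this f
print(in_hdr.mean())                       # share of draws inside the HDR
```

Any other density that can be both sampled and evaluated could be substituted for `std_normal_pdf`; only the exact comparison value α/(2π) is specific to the standard normal.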

Whereas the QCR of Hong and Kim (2017) is derived using upper α/2 and 1–α/2 quantile vectors, the QCRHD is derived by obtaining the quantile vectors based on the higher part of the probability density function.

For a given α ∈ (0, 1), the (1 – α)100% QCRHD is obtained as:

  • Step 1. For an initial α* (≠ α), select a constant kα* and the corresponding set of vectors R(kα*) = {(z1α*, z2α*)} in (2.1). The constant kα* and the set R(kα*) are obtained as in the left plot in Figure 1.

  • Step 2. With the set R(kα*) obtained in Step 1, find qL = min F(z1, z2) and qU = max F(z1, z2) over all (z1, z2) ∈ R(kα*), where F(z1, z2) denotes the cumulative distribution function of (Z1, Z2). The values qL and qU can be identified as in the second plot in Figure 1.

  • Step 3. For the (qL, qU) obtained in Step 2, determine the quantile vectors zqL = {(z1qL, z2qL)} and zqU = {(z1qU, z2qU)} such that the probabilities of the upper regions of zqL and zqU are αL* and αU*, respectively, that is,

    $$\int_{(z_1, z_2) \in R^2_{q_L}} dF(z_1, z_2) = \alpha_L^* \quad \text{and} \quad \int_{(z_1, z_2) \in R^2_{q_U}} dF(z_1, z_2) = \alpha_U^*,$$

    where R²qL and R²qU are the upper regions of zqL and zqU, respectively.

  • Step 4. Repeat Steps 1 through 3, updating kα* and R(kα*) in Step 1, until αL* = αL and αU* = αU. Note that αL and αU need not equal 1 – α/2 and α/2, respectively, but αL − αU must be 1 – α. The probability between the two quantile vectors zαL and zαU for αL and αU must then be 1 – α, that is, P(X ∈ (zαL, zαU)) = αL − αU = 1 − α. The two quantile vectors for αL and αU can be obtained as in the third plot in Figure 1.

  • Step 5. Finally, the (1 – α)100% QCRHD is constructed from truncated vectors whose end-point vectors lie in zαL and zαU, as explained in Hong and Kim (2017). The shape of this QCRHD is shown in the right plot in Figure 1.

If there exist two sets of vectors R(kα*) in Step 1, then qL and qU in Step 2 and the quantile vectors zαL and zαU in Step 4 must be obtained separately for each of the two sets, so that the total probability between the two pairs of quantile vectors equals 1 – α. The steps for this case are illustrated in Figure 2.
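The first two steps of the procedure can be sketched numerically for a single trial value α*. This is only an illustration under stated assumptions, not the authors' implementation: Monte Carlo draws stand in for the set R(kα*), an empirical CDF stands in for the exact F, and the density is an equal-weight mixture of two bivariate normals with identity covariance and means ±(1, 1). The helper names are hypothetical. The search over α* and the quantile vectors of the later steps are not covered.

```python
import numpy as np

def mixture_pdf(x, lam, mu):
    """Two-component normal mixture density with identity covariances (illustrative)."""
    d1 = np.sum((x + mu) ** 2, axis=1)   # squared distance to the mean -mu
    d2 = np.sum((x - mu) ** 2, axis=1)   # squared distance to the mean +mu
    return (lam * np.exp(-0.5 * d1) + (1.0 - lam) * np.exp(-0.5 * d2)) / (2.0 * np.pi)

def mixture_sample(n, lam, mu, rng):
    """Draw n points: pick the component mean (-mu or +mu), then add N(0, I) noise."""
    sign = np.where(rng.random(n) < lam, -1.0, 1.0)
    return sign[:, None] * mu + rng.standard_normal((n, 2))

def empirical_cdf(points, sample):
    """Share of sample rows with both coordinates <= each row of points."""
    below = (sample[None, :, 0] <= points[:, None, 0]) & (
        sample[None, :, 1] <= points[:, None, 1]
    )
    return below.mean(axis=1)

rng = np.random.default_rng(3)
lam, mu, alpha_star = 0.5, np.array([1.0, 1.0]), 0.05
z = mixture_sample(3000, lam, mu, rng)
dens = mixture_pdf(z, lam, mu)

# Step 1: trial threshold and the draws that fall inside R(k_alpha*).
k_star = np.quantile(dens, alpha_star)
hdr_points = z[dens >= k_star]

# Step 2: smallest and largest CDF values attained on R(k_alpha*).
cdf_vals = empirical_cdf(hdr_points, z)
q_L, q_U = cdf_vals.min(), cdf_vals.max()
print(q_L, q_U)
```

The pairwise comparison in `empirical_cdf` uses memory proportional to the number of HDR points times the sample size, so it suits only moderate sample sizes.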

3. Shape comparison of QCRHD, QCR, and MCR

The shapes of the 95% QCRHD, QCR, and MCR are obtained and visualized, and their characteristics are discussed under the following mixture distributions:

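$$\lambda\,\phi(\underline{x}; -\underline{\mu}, \Sigma_1) + (1-\lambda)\,\phi(\underline{x}; \underline{\mu}, \Sigma_2), \qquad (3.1)$$

where λ = 0.3, 0.5, 0.7, μ is either (1 1)T or (2 2)T,

$$\Sigma_1 = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1^2 \\ \rho\sigma_1^2 & \sigma_1^2 \end{pmatrix}, \qquad \Sigma_2 = \begin{pmatrix} \sigma_2^2 & \rho\sigma_2^2 \\ \rho\sigma_2^2 & \sigma_2^2 \end{pmatrix},$$

and φ(·) is the normal density function.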
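The mixture above can be evaluated and sampled with a few lines of NumPy. The following is a minimal sketch; the function names and the chosen parameter values (λ = 0.3, ρ = −0.5, μ = (2 2)T) are illustrative and not part of the original study.

```python
import numpy as np

def cov_matrix(sigma2, rho):
    """Covariance matrix with common variance sigma2 and correlation rho, as in (3.1)."""
    return sigma2 * np.array([[1.0, rho], [rho, 1.0]])

def normal_pdf(x, mean, cov):
    """Bivariate normal density evaluated at the rows of x."""
    diff = np.atleast_2d(x) - mean
    quad = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    return np.exp(-0.5 * quad) / (2.0 * np.pi * np.sqrt(np.linalg.det(cov)))

def mixture_pdf(x, lam, mu, cov1, cov2):
    """Density (3.1): lam * phi(x; -mu, cov1) + (1 - lam) * phi(x; mu, cov2)."""
    return lam * normal_pdf(x, -mu, cov1) + (1.0 - lam) * normal_pdf(x, mu, cov2)

def mixture_sample(n, lam, mu, cov1, cov2, rng):
    """Draw n points from (3.1) by choosing a component for each point."""
    from_first = rng.random(n) < lam
    x1 = rng.multivariate_normal(-mu, cov1, size=n)
    x2 = rng.multivariate_normal(mu, cov2, size=n)
    return np.where(from_first[:, None], x1, x2)

rng = np.random.default_rng(0)
mu = np.array([2.0, 2.0])
cov1 = cov_matrix(1.0, -0.5)
cov2 = cov_matrix(1.0, -0.5)
draws = mixture_sample(20000, 0.3, mu, cov1, cov2, rng)
print(draws.mean(axis=0))  # should be near (1 - 2 * 0.3) * mu = (0.8, 0.8)
```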

3.1. Case for μ = (1 1)T, σ₁² = σ₂² = 1

In the case of μ = (1 1)T, σ₁² = σ₂² = 1 in (3.1), the HDR consists of one region, since the distance between the mean vectors of the mixture distribution is short. Therefore, the QCRHD appears as one region in Figure 3, and its shape and characteristics are similar to those of the QCR. Figure 3 shows the shapes of the 95% QCRHD, QCR, and MCR with respect to λ and ρ. The green, red, and blue lines represent λ = 0.3, 0.5, and 0.7, respectively.

The common features of the QCRHD, QCR, and MCR can be seen in Figure 3. First, the QCRHD, QCR, and MCR narrow as ρ increases from −0.5 to 0.5. In addition, the distance between the upper and lower bounds of both the QCRHD and QCR increases, and the length of the lower bound becomes shorter. As λ increases, all three regions shift toward −μ. The MCR is elliptical, and its region is slightly wider than those of the QCRHD and QCR.
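The shift toward −μ is consistent with the mean of the mixture (3.1), which follows directly from the component means −μ and μ. The short calculation below is added for illustration and uses only the λ values considered in this section:

```latex
\mathrm{E}[\underline{X}]
  = \lambda(-\underline{\mu}) + (1-\lambda)\,\underline{\mu}
  = (1 - 2\lambda)\,\underline{\mu},
\qquad
\lambda = 0.3,\ 0.5,\ 0.7
  \;\Longrightarrow\;
\mathrm{E}[\underline{X}] = 0.4\,\underline{\mu},\ \underline{0},\ -0.4\,\underline{\mu}.
```

As λ grows, more weight falls on the component centered at −μ, so the center of mass moves from the μ side to the −μ side.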

3.2. Case for μ = (2 2)T, σ₁² = σ₂² = 1

Next, a bimodal probability density function in which the distance between the mean vectors is sufficiently long, i.e., μ = (2 2)T, σ₁² = σ₂² = 1 in (3.1), is considered. In this case, the HDR has two regions, and so does the QCRHD, as shown in Figure 2. The shapes of the 95% QCRHD, QCR, and MCR with respect to λ and ρ are shown in Figure 4. The green, red, and blue lines represent λ = 0.3, 0.5, and 0.7, respectively.

The common characteristics of the shapes of the QCRHD, QCR, and MCR with respect to λ and ρ in Figure 4 are similar to those in Figure 3. Each region narrows as ρ increases from −0.5 to 0.5. The length of the lower bound and the distance between the lower and upper bounds of the QCRHD and QCR behave as in Section 3.1. Finally, as λ increases from 0.3 to 0.7, the centers of the QCRHD, QCR, and MCR all shift toward −μ.

It is notable in Figures 3 and 4 that the shapes of the QCRHD differ, whereas those of the QCR are similar. The QCRHD in Figure 4 consists of two regions, whereas the QCRHD in Figure 3 has only one. Of the two regions of the QCRHD, the one at the lower left has a shape similar to the QCR, but the one at the upper right has a more angled lower bound than the QCR. These characteristics stand out when ρ is negative (for example, −0.5). As ρ increases to 0.5, the upper right region gradually becomes similar in shape to the QCR, so that the upper right and lower left regions become similar.

The shapes of the QCRHD, QCR, and MCR can also be compared in Figures 6 and 8. In both cases, the lower bound of the lower left QCRHD almost overlaps with that of the QCR. The QCR is always composed of one region, even when the probability density function is bimodal. While the QCRHD and QCR have similar regions in Figure 6, where the mean vectors are close, the QCRHD has a smaller region than the QCR in Figure 8, where the mean vectors are distant. When ρ is 0.5, the difference between the regions of the QCRHD and QCR is small; however, it is substantial when ρ is −0.5. We may conclude that the QCRHD overcomes the drawback of the QCR, which is one of the purposes of this study.

The MCR has the form of an ellipse regardless of λ or ρ. Since the MCR is composed of a single region, it is considerably wide when the distance between the mean vectors is large.
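The widening of a single ellipse can be illustrated with a simplified, known-covariance version of the elliptical region. The sketch below assumes the ellipse is built from the overall mean (1 − 2λ)μ and covariance λΣ1 + (1 − λ)Σ2 + 4λ(1 − λ)μμT of the mixture (3.1), and uses the chi-square form with two degrees of freedom, whose upper-α point is exactly −2 ln α. The text does not state which form of the MCR was used, so this is an assumption, and all function names are illustrative.

```python
import numpy as np

def mixture_moments(lam, mu, cov1, cov2):
    """Mean and covariance of lam * N(-mu, cov1) + (1 - lam) * N(mu, cov2)."""
    mean = (1.0 - 2.0 * lam) * mu
    cov = lam * cov1 + (1.0 - lam) * cov2 + 4.0 * lam * (1.0 - lam) * np.outer(mu, mu)
    return mean, cov

def in_ellipse(points, center, cov, alpha=0.05):
    """True where (x - center)' cov^{-1} (x - center) <= chi-square(2) upper-alpha point."""
    crit = -2.0 * np.log(alpha)  # exact chi-square quantile for 2 degrees of freedom
    diff = np.atleast_2d(points) - center
    quad = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    return quad <= crit

def ellipse_area(cov, alpha=0.05):
    """Area of the ellipse {x : x' cov^{-1} x <= -2 ln(alpha)}."""
    return np.pi * (-2.0 * np.log(alpha)) * np.sqrt(np.linalg.det(cov))

eye = np.eye(2)
areas = {}
for m in (1.0, 2.0):
    mu = np.array([m, m])
    mean, cov = mixture_moments(0.5, mu, eye, eye)
    areas[m] = ellipse_area(cov)
    print(mu, mean, areas[m])
```

Under these assumptions the area grows with the square root of the determinant of the overall covariance, which increases as the component means move apart.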

As mentioned at the end of Section 2, when there exist two sets of vectors R(kα), the QCRHD has two regions. The case in which these two regions overlap is not considered in this work. Section 5 explains the reason for this.

3.3. Case for μ = (2 2)T, σ₁² = 4, σ₂² = 1

In this section, the normal mixture distribution with different variance-covariance matrices, μ = (2 2)T, σ₁² = 4, σ₂² = 1 with ρ = −0.5 in (3.1), is considered. Figure 5 shows the shapes of the QCRHD, QCR, and MCR as λ changes. The green, red, and blue lines represent λ = 0.3, 0.5, and 0.7, respectively. The QCRHD has two regions, whereas both the QCR and MCR consist of a single region. The shapes of the QCRHD, QCR, and MCR in Figure 5 are similar to those in Figure 4 in Section 3.2. Therefore, as described in Section 3.2, all three regions shift toward −μ as λ increases. The lower bound of the QCRHD in Figure 5 is longer than that in Figure 4. In addition, the distance between the two regions of the QCRHD is shorter in Figure 5 than in Figure 4. These characteristics of the QCRHD in Figure 5 arise because σ₁² in Σ1 is larger than σ₂² in Σ2. Therefore, the shape of the QCRHD depends on the distribution.

The QCR shows a similar characteristic to the QCRHD: its lower bound is longer in Figure 5 than in Figure 4. The MCR still has the form of a single ellipse, but the MCR in Figure 4 is slimmer than that in Figure 5.

Figure 5 provides only the case of ρ = −0.5. The characteristics of the QCRHD can be extended to the cases of ρ = 0 and 0.5, because the behavior of the QCRHD in Section 3.3 is similar to that in Section 3.2.

4. Comparison of coverage rates for QCRHD, QCR, and MCR

We have found that the QCRHD can overcome the disadvantage of the QCR, which has a wide region. In this section, we compare the performance of the QCRHD with that of the QCR and MCR. After obtaining the 95% QCRHD, QCR, and MCR, their coverage rates are calculated and compared. To calculate the coverage rates, 100 sample means are generated from (3.1) with the variance-covariance matrix multiplied by 1/n. The coverage rate is then calculated as the proportion of sample means included in each region, and this is repeated over 1,000 iterations.
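The simulation design described above can be sketched as follows. This is an illustration under stated assumptions rather than the study's code: the sample size `n_obs` is hypothetical because n is not reported in the text, and the region tested is a stand-in (the 95% HDR of the sampling density, with its threshold estimated from a pilot sample) instead of the QCRHD, QCR, or MCR themselves. Any membership function could be passed to `coverage_rates` in its place.

```python
import numpy as np

def normal_pdf(x, mean, cov):
    """Bivariate normal density at the rows of x."""
    diff = x - mean
    quad = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    return np.exp(-0.5 * quad) / (2.0 * np.pi * np.sqrt(np.linalg.det(cov)))

def mixture_pdf(x, lam, mu, cov1, cov2):
    """Mixture density of the form (3.1)."""
    return lam * normal_pdf(x, -mu, cov1) + (1.0 - lam) * normal_pdf(x, mu, cov2)

def mixture_sample(size, lam, mu, cov1, cov2, rng):
    """Draw points from the mixture by choosing a component for each point."""
    pick = rng.random(size) < lam
    a = rng.multivariate_normal(-mu, cov1, size=size)
    b = rng.multivariate_normal(mu, cov2, size=size)
    return np.where(pick[:, None], a, b)

def coverage_rates(in_region, sampler, n_means=100, n_iter=1000):
    """Per-iteration share of n_means simulated points that fall inside the region."""
    pts = sampler(n_means * n_iter)
    return in_region(pts).reshape(n_iter, n_means).mean(axis=1)

rng = np.random.default_rng(2)
n_obs = 1  # hypothetical sample size; the text does not report n
lam, rho, mu = 0.5, 0.0, np.array([2.0, 2.0])
cov = np.array([[1.0, rho], [rho, 1.0]]) / n_obs  # covariance multiplied by 1/n

def sampler(size):
    return mixture_sample(size, lam, mu, cov, cov, rng)

def pdf(x):
    return mixture_pdf(x, lam, mu, cov, cov)

# Stand-in region: 95% HDR of the same density, threshold estimated from a pilot sample.
k = np.quantile(pdf(sampler(200_000)), 0.05)
rates = coverage_rates(lambda x: pdf(x) >= k, sampler)
print(round(100 * rates.mean(), 2), round(100 * rates.std(ddof=1), 2))
```

The two printed numbers play the roles of the "Coverage" and "SE" columns in the tables: the mean and the standard deviation of the per-iteration coverage, in percent.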

4.1. Case for μ = (1 1)T, σ₁² = σ₂² = 1

For μ = (1 1)T, σ₁² = σ₂² = 1 in (3.1) with the variance-covariance matrix multiplied by 1/n, Figure 6 demonstrates how each region covers 100 sample means. The blue, red, and green lines indicate the QCRHD, QCR, and MCR, respectively. From Figure 6, it can be seen that the region of the MCR is slightly larger than those of the QCRHD and QCR. Table 1 and Figure 7 summarize the coverage rates of the 95% QCRHD, QCR, and MCR with respect to λ and ρ, where the red, green, and blue lines represent the QCRHD, QCR, and MCR, respectively. The coverage rates of the 95% QCRHD and QCR are close to 0.95, and their standard deviations are also similar. However, the coverage rate of the MCR exceeds 0.95, so the performance of the QCRHD is considered similar to that of the QCR and better than that of the MCR.
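The size of the reported standard deviations can be checked with a simple binomial calculation. It assumes, for illustration only, that each of the 100 sample means falls inside a region independently with the same probability p; the text does not state this assumption.

```latex
\mathrm{SD}(\hat{p}) = \sqrt{\frac{p(1-p)}{100}},
\qquad
p = 0.95:\ \sqrt{0.000475} \approx 0.0218,
\qquad
p = 0.969:\ \sqrt{0.00030039} \approx 0.0173.
```

These values correspond to about 2.18 and 1.73 percentage points, which is in line with the SE column of Table 1 (roughly 2.1 to 2.2 for the QCRHD and QCR, and 1.6 to 1.8 for the MCR).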

4.2. Case for μ = (2 2)T, σ₁² = σ₂² = 1

In the case of μ = (2 2)T, σ₁² = σ₂² = 1 in (3.1), the QCRHD is again constructed from two small regions, whereas the QCR has one big region, as discussed in Section 3.2. Figure 8 illustrates how each region covers 100 sample means. The blue, red, and green lines indicate the QCRHD, QCR, and MCR, respectively. The sample means are also unlikely to fall between the two regions of the QCRHD. The coverage rates of the 95% QCRHD, QCR, and MCR are summarized in Table 2 and shown in Figure 9. As in Section 4.1, both the 95% QCRHD and QCR show coverage rates close to 0.95, but the MCR has a value larger than 0.95.

4.3. Case for μ = (2 2)T, σ₁² = 4, σ₂² = 1

Whereas Sections 4.1 and 4.2 consider the coverage rates of the QCRHD, QCR, and MCR under the normal mixture distribution with equal variance-covariance matrices, this section compares their coverage rates under (3.1) with different variance-covariance matrices, i.e., σ₁² = 4, σ₂² = 1 with μ = (2 2)T and ρ = −0.5. We compare the coverage rates of the QCRHD, QCR, and MCR as λ changes.

The blue, red, and green lines in Figure 10 represent the shapes of the QCRHD, QCR, and MCR, respectively, as λ changes. While the QCRHD has a smaller region than the QCR in Figure 8, the QCRHD in Figure 10 has a region similar to that of the QCR when λ = 0.3 and 0.5, but a smaller region when λ = 0.7. Figure 10 also shows that the MCR has a smaller region than the QCRHD and QCR.

The coverage rates of the 95% QCRHD, QCR, and MCR are provided in Table 3. The coverage rates of the QCRHD and QCR are close to 0.95, whereas those of the MCR are smaller than 0.95. Therefore, one might conclude that the QCRHD performs better than the MCR in this case.

Table 3 provides only the case of ρ = −0.5. However, based on the findings for the QCRHD in Section 3.3, we can expect similar results when ρ = 0 and 0.5.

5. Conclusion and further study

The well-known MCR cannot be used when the multivariate data does not satisfy the normality assumption. Hong and Kim (2017) proposed a QCR using the multivariate quantile vector and showed that the coverage rate of the QCR is better than MCR when the assumption of normality is not satisfied.

However, the QCR has a disadvantage of a wide region when the probability density function has the bimodal form. In order to overcome the disadvantage of the QCR, an alternative QCR, called QCRHD, is proposed using the HDR of Hyndman (1996) that can utilize information obtained from the probability density function.

The QCRHD has either one big region or two small regions, depending on the type of distribution. It has a single region for mixture distributions with a short distance between the mean vectors or heavy tails of the variance-covariance matrix. However, it consists of two small regions when the distance between the mean vectors of the mixture distribution is large or the variance-covariance matrix has a light tail.

The results of this study show that the QCRHD has better coverage rates than the MCR under various probability density functions and performs similarly to the QCR. In addition, the QCRHD has the advantage of a smaller region than the QCR. Therefore, one could conclude that the QCRHD matches the QCR in coverage rate while improving on it in shape.

We consider only the cases of one region of the QCRHD and two non-overlapping regions of the QCRHD. When the QCRHD has two regions that overlap, the regions become too narrow, so that the coverage rate becomes worse than in the other cases. Consequently, we consider two cases in this work. One is that there exists only one set of vectors R(kα) and hence one region of the QCRHD. The other is that there exist two sets of vectors R(kα) whose corresponding regions of the QCRHD do not overlap. Research on overlapping regions of the QCRHD is left to a future study.

The discussion of the QCRHD is restricted to bivariate probability density functions in this study. However, the QCRHD might be extended to multivariate probability density functions. Even though the multivariate QCRHD cannot be visualized, a discussion of its characteristics is left to a future study. In addition, research on trimodal and higher-modal cases will be an interesting problem.

Figures
Fig. 1. Quantile confidence region using highest density with one set of the vector R(kα* ).
Fig. 2. Quantile confidence region using highest density with two sets of the vector R(kα* ).
Fig. 3. Shapes of QCRHD, QCR, and MCR when μ = (1 1)T. QCRHD = QCR using highest density; QCR = quantile confidence region; MCR = multivariate confidence region.
Fig. 4. Shapes of QCRHD, QCR, and MCR when μ = (2 2)T. QCRHD = QCR using highest density; QCR = quantile confidence region; MCR = multivariate confidence region.
Fig. 5. Shapes of QCRHD, QCR, and MCR when μ = (2 2)T. QCRHD = QCR using highest density; QCR = quantile confidence region; MCR = multivariate confidence region.
Fig. 6. Shapes of QCRHD, QCR, and MCR when μ = (1 1)T. QCRHD = QCR using highest density; QCR = quantile confidence region; MCR = multivariate confidence region.
Fig. 7. Comparison of coverage rates between QCRHD, QCR, and MCR when μ = (1 1)T. QCRHD = QCR using highest density; QCR = quantile confidence region; MCR = multivariate confidence region.
Fig. 8. Shapes of QCRHD, QCR, and MCR when μ = (2 2)T. QCRHD = QCR using highest density; QCR = quantile confidence region; MCR = multivariate confidence region.
Fig. 9. Comparison of coverage rates between QCRHD, QCR, and MCR when μ = (2 2)T. QCRHD = QCR using highest density; QCR = quantile confidence region; MCR = multivariate confidence region.
Fig. 10. Comparison of coverage rates between QCRHD, QCR, and MCR when μ = (2 2)T. QCRHD = QCR using highest density; QCR = quantile confidence region; MCR = multivariate confidence region.
Fig. 11. Comparison of coverage rates between QCRHD, QCR, and MCR when μ = (2 2)T. QCRHD = QCR using highest density; QCR = quantile confidence region; MCR = multivariate confidence region.
TABLES

Table 1

Comparison of coverage rates between QCRHD, QCR, and MCR when μ = (1 1)T, σ₁² = σ₂² = 1

                95% QCRHD            95% QCR              95% MCR
λ      ρ        Coverage (%)  SE     Coverage (%)  SE     Coverage (%)  SE
0.3    −0.5     94.57         2.22   95.25         2.06   96.87         1.71
0.3     0.0     94.78         2.18   95.05         2.18   96.61         1.79
0.3     0.5     94.78         2.14   95.02         2.24   96.38         1.84
0.5    −0.5     95.10         2.09   94.92         2.12   97.35         1.60
0.5     0.0     94.50         2.25   95.10         2.18   97.18         1.69
0.5     0.5     95.07         2.10   95.10         2.12   97.02         1.74
0.7    −0.5     94.95         2.24   95.13         2.13   96.92         1.70
0.7     0.0     94.91         2.21   94.96         2.14   96.56         1.83
0.7     0.5     94.86         2.06   95.05         2.19   96.33         1.78

QCRHD = QCR using highest density; QCR = quantile confidence region; MCR = multivariate confidence region.


Table 2

Comparison of coverage rates between QCRHD, QCR, and MCR when μ = (2 2)T, σ₁² = σ₂² = 1

                95% QCRHD            95% QCR              95% MCR
λ      ρ        Coverage (%)  SE     Coverage (%)  SE     Coverage (%)  SE
0.3    −0.5     95.09         2.27   95.15         2.13   96.91         1.70
0.3     0.0     95.19         2.14   94.98         2.21   96.55         1.79
0.3     0.5     95.07         2.24   95.11         2.10   96.89         1.74
0.5    −0.5     94.93         2.14   95.21         2.11   97.29         1.58
0.5     0.0     95.18         2.13   95.00         2.13   97.24         1.61
0.5     0.5     94.98         2.20   94.97         2.19   97.36         1.57
0.7    −0.5     95.07         2.11   95.10         2.14   96.79         1.73
0.7     0.0     94.92         2.15   95.11         2.07   96.76         1.75
0.7     0.5     94.91         2.26   95.01         2.19   96.79         1.75

QCRHD = QCR using highest density; QCR = quantile confidence region; MCR = multivariate confidence region.


Table 3

Comparison of coverage rates between QCRHD, QCR, and MCR when μ = (2 2)T, σ₁² = 4, σ₂² = 1

                95% QCRHD            95% QCR              95% MCR
ρ      λ        Coverage (%)  SE     Coverage (%)  SE     Coverage (%)  SE
−0.5   0.3      94.90         2.05   95.05         2.11   90.49         2.52
−0.5   0.5      95.02         2.05   95.12         2.31   92.46         2.61
−0.5   0.7      95.11         2.15   94.95         2.17   90.95         2.80

QCRHD = QCR using highest density; QCR = quantile confidence region; MCR = multivariate confidence region.


References
  1. Asgharzadeh, A, and Abdi, M (2011). Confidence intervals and joint confidence regions for the two-parameter exponential distribution based on records. Communications for Statistical Applications and Methods. 18, 103-110.
  2. Chew, V (1966). Confidence, prediction, and tolerance regions for the multivariate normal distribution. Journal of the American Statistical Association. 61, 605-617.
  3. Fan, J, and Zhang, W (2000). Simultaneous confidence bands and hypothesis testing in varying-coefficient models. Scandinavian Journal of Statistics. 27, 715-731.
  4. Frank, O (1996). Simultaneous confidence intervals. Scandinavian Actuarial Journal. 1996, 78-89.
  5. Hong, CS, and Kim, HI (2017). Multivariate confidence region using quantile vectors. Communications for Statistical Applications and Methods. 24, 641-650.
  6. Hong, CS, Han, SJ, and Lee, GP (2016). Vector at risk and alternative value at risk. The Korean Journal of Applied Statistics. 29, 689-697.
  7. Hyndman, RJ (1996). Computing and graphing highest density regions. The American Statistician. 50, 120-126.
  8. Johnson, RA, and Wichern, DW (2002). Applied Multivariate Statistical Analysis. Upper Saddle River: Prentice Hall.
  9. Sun, J, and Loader, CR (1994). Simultaneous confidence bands for linear regression and smoothing. The Annals of Statistics. 22, 1328-1345.
  10. Tian, L, Wang, R, Cai, T, and Wei, L (2011). The highest confidence density region and its usage for joint inferences about constrained parameters. Biometrics. 67, 604-610.
  11. Turkkan, N, and Pham-Gia, T (1997). Algorithm AS 308: Highest posterior density credible region and minimum area. Journal of the Royal Statistical Society. Series C (Applied Statistics). 46, 131-140.