Quantile estimation using near optimal unbalanced ranked set sampling
Communications for Statistical Applications and Methods 2021;28:643-653
Published online November 30, 2021
© 2021 Korean Statistical Society.

Raman Nautiyal^a, Neeraj Tiwari^a, Girish Chandra^{1,b}

^a Department of Statistics, Kumaun University, India
^b Division of Forestry Statistics, Indian Council of Forestry Research and Education, India
Correspondence to: 1 Division of Forestry Statistics, Indian Council of Forestry Research and Education, PO New Forest, Dehradun 248006, India. E-mail: gchandra23@yahoo.com
Received June 4, 2021; Revised September 24, 2021; Accepted September 28, 2021.
Few studies in the literature address the estimation of population quantiles using ranked set sampling (RSS). The optimal RSS strategy selects observations from at most two fixed rank order statistics across the ranked sets. In this paper, a near-optimal unbalanced RSS model for estimating the pth (0 < p < 1) population quantile is proposed. The main advantages of this model are that it uses every rank order statistic and that it is distribution-free. The asymptotic relative efficiencies (AREs) of the balanced RSS, optimal unbalanced RSS, and proposed near-optimal methods are computed for different values of p, and are also compared with respect to simple random sampling. The results show that the proposed unbalanced RSS performs uniformly better than balanced RSS for all set sizes and is very close to the optimal RSS for large set sizes. For practical use, the near-optimal unbalanced RSS is recommended for estimating quantiles.
Keywords : asymptotic relative efficiency, Neyman’s allocation, order statistics, quantiles, ranked set sampling
1. Introduction

Ranked set sampling (RSS), introduced by McIntyre (1952), is a sampling scheme that can increase precision and reduce costs when actual measurement of the observations is costly and/or time-consuming but the items in a set can easily be ranked without actual measurement. Such situations commonly arise in environmental monitoring and assessment, which rely on observational data. Since the introduction of RSS by McIntyre (1952) and the development of its mathematical foundation by Takahasi and Wakimoto (1968), various researchers have investigated the utility of RSS and the conditions under which it is useful and cost-effective. RSS has been used satisfactorily to estimate pasture yield by McIntyre (1952), forage yields by Halls and Dell (1966), mass herbage in a paddock by Cobby et al. (1985), shrub phytomass by Martin et al. (1980), tree volume in a forest by Stokes and Sager (1988), root weight of Arabidopsis thaliana by Barnett and Moore (1997) and bone mineral density in a human population by Nahhas et al. (2002). RSS is a nonparametric procedure (Stokes and Sager, 1988; Bohn and Wolfe, 1992; Hettmansperger, 1995) but has also been used in the parametric setting (Bhoj, 1997; Lam et al., 1996; Stokes, 1995). Most of the distributions considered by these investigators belong to the family of random variables with cumulative distribution function (CDF) of the form F((x − μ)/σ), where μ and σ are the location and scale parameters, respectively.

The selection of a ranked set sample of size k involves drawing k random samples, each of k units, from a population for which an estimate of the mean is required. The units in each sample are ranked by judgment or other inexpensive methods. The unit with the lowest rank is measured from the first sample, the unit with the second lowest rank is measured from the second sample, and this procedure is continued until the unit with the highest rank is measured from the last sample. The $k^2$ ordered observations in the k samples can be displayed as

$$\begin{matrix}
y_{(11)} & y_{(12)} & \cdots & y_{(1k)} \\
y_{(21)} & y_{(22)} & \cdots & y_{(2k)} \\
\vdots & \vdots & \ddots & \vdots \\
y_{(k1)} & y_{(k2)} & \cdots & y_{(kk)}
\end{matrix}$$


We measure only the k diagonal observations y(ii), i = 1, 2, . . . , k, and they constitute the ranked set sample. These k observations are independently distributed. The above procedure is repeated m ≥ 2 times to obtain a sample of size n = mk. Because each order statistic is measured an equal number of times, this procedure is called balanced RSS. For convenience, we take m = 1 when computing the asymptotic relative efficiency (ARE).
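The sampling procedure described above can be sketched in a few lines of Python. This is only an illustration: the function name `balanced_rss` and the callable `population_draw` are hypothetical, and ranking is done on the true values, which corresponds to the perfect-ranking assumption rather than to judgment ranking in the field.

```python
import random

def balanced_rss(population_draw, k, m=1, rng=None):
    """Draw a balanced ranked set sample of size n = m * k.

    population_draw: hypothetical callable taking an rng and returning
    the value of one randomly selected unit.
    Ranking uses the true values (perfect ranking, an assumption here).
    Returns a list of (rank, measured_value) pairs.
    """
    rng = rng or random.Random()
    sample = []
    for _ in range(m):                       # m cycles
        for i in range(k):                   # one fresh set of k units per rank
            ranked = sorted(population_draw(rng) for _ in range(k))
            sample.append((i + 1, ranked[i]))  # measure only the (i+1)th ranked unit
    return sample

if __name__ == "__main__":
    rss = balanced_rss(lambda r: r.gauss(0.0, 1.0), k=4, m=2, rng=random.Random(1))
    print(len(rss))  # m * k measured units out of m * k * k ranked units
```

Each measured unit comes from its own set of k units, so m k² units are ranked but only mk are measured.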

For estimating the population mean, balanced RSS is known to be more precise than simple random sampling (SRS) (Tiwari and Chandra, 2011). For balanced RSS designs, Takahasi and Wakimoto (1968) showed that the relative precision (RP) of RSS with respect to SRS lies between 1 and (k + 1)/2, where k is the set size. However, the performance of RSS can be improved further when an appropriate unequal allocation of the order statistics is made. Several unbalanced RSS procedures have been suggested for skewed distributions (see the ‘t’ and ‘s, t’ allocation models (Kaur et al., 1997), the systematic allocation model (Tiwari and Chandra, 2011) and the simple allocation model (Chandra et al., 2018; Bhoj and Chandra, 2019)). For skewed distributions, Neyman’s allocation provides the optimal allocation model, with Var(Ney) < Var(bal) < Var(SRS). For symmetric distributions, however, the gain from Neyman’s optimal allocation is marginal. Allocation models for symmetric distributions are given by Kaur et al. (2000) and Chandra et al. (2015). With unequal allocation, Takahasi and Wakimoto (1968) showed that the RP of RSS relative to SRS lies between 0 and k. This shows that an appropriate unequal allocation can increase the performance of RSS beyond that achievable with balanced RSS.

In this paper, a practical unbalanced RSS design for estimating population quantiles is developed. Section 2 discusses quantile estimation using RSS. Section 3 provides the quantile estimator based on the proposed near-optimal unbalanced RSS model and its asymptotic variance. The AREs under the balanced RSS, optimal and proposed near-optimal models are given in Section 4. Section 5 discusses the case of imperfect ranking. The conclusions of the study are given in Section 6.

2. Quantile estimation using ranked set sampling

Suppose the unknown CDF and the corresponding probability density function (pdf) are denoted by F(x) and f(x), respectively, and a random sample of size n is drawn from F using unbalanced RSS such that $m_i$ observations are measured for the ith rank order statistic, i = 1, 2, . . . , k, with $\sum_{i=1}^{k} m_i = n$. Let $X_{(i:k)j}$, i = 1, 2, . . . , k, j = 1, 2, . . . , $m_i$, denote the jth observation of the ith order statistic. Note that the observations corresponding to the ith rank order statistic are i.i.d. with mean $\mu_{(i:k)}$ and variance $\sigma^2_{(i:k)}$. Let $F_{(i)}$ and $f_{(i)}$ be the CDF and pdf of the ith rank order statistic. We consider perfect ranking in this paper; imperfect ranking is discussed briefly in Section 5. Our aim is to estimate the pth quantile, $\xi_p$,

$$\xi_p = \inf\{x : F(x) \ge p\}.$$


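It is known that

$$f_{(i)}(x) = \frac{k!}{(i-1)!\,(k-i)!}\,F^{i-1}(x)\,[1-F(x)]^{k-i}\,f(x).$$

This can be rewritten in the simplified form

$$f_{(i)}(x) = b(F(x);\, i,\, k+1-i)\,f(x),$$

where $b(y; a, b) = y^{a-1}(1-y)^{b-1}/\beta(a, b)$ denotes the pdf of a random variable which follows the beta distribution with parameters a and b, $\beta(a, b) = \int_0^1 y^{a-1}(1-y)^{b-1}\,dy = \Gamma(a)\Gamma(b)/\Gamma(a+b)$, and $\Gamma(a) = (a-1)!$ for a positive integer a. The corresponding CDF is denoted by B(y; a, b), that is,

$$B(y; a, b) = \int_0^y b(t; a, b)\,dt.$$

Using this notation, we can write the CDF of the ith rank order statistic, $F_{(i)}(x)$, as

$$F_{(i)}(x) = B(F(x);\, i,\, k+1-i).$$

Since $F(\xi_p) = P(X \le \xi_p) = p$,

$$F_{(i)}(\xi_p) = B(p;\, i,\, k+1-i).$$

The quantile $\xi_p$ can therefore be estimated using the quantile of $F_{(i)}$ corresponding to the probability B(p; i, k + 1 − i), which is known for any p. Let $\hat{F}_{(i)}(x) = (1/m_i)\sum_{j=1}^{m_i} I(X_{(i)j} \le x)$ denote the empirical distribution function of the ith order statistic.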
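Because i and k are integers, the beta quantities B(p; i, k + 1 − i) and b(p; i, k + 1 − i) can be evaluated without special functions. The sketch below uses the fact that the ith order statistic of k observations is at most x exactly when at least i of the observations are at most x, so that B(p; i, k + 1 − i) is an upper binomial tail. The function names are illustrative only.

```python
from math import comb

def beta_cdf_int(p, i, k):
    """B(p; i, k+1-i) for integer 1 <= i <= k, as P(Binomial(k, p) >= i)."""
    return sum(comb(k, j) * p**j * (1 - p)**(k - j) for j in range(i, k + 1))

def beta_pdf_int(p, i, k):
    """b(p; i, k+1-i) = k!/((i-1)!(k-i)!) * p^(i-1) * (1-p)^(k-i)."""
    # i * comb(k, i) equals k! / ((i-1)! (k-i)!)
    return i * comb(k, i) * p**(i - 1) * (1 - p)**(k - i)

if __name__ == "__main__":
    k, p = 4, 0.25
    for i in range(1, k + 1):
        print(i, beta_cdf_int(p, i, k), beta_pdf_int(p, i, k))
```

A useful check is that the densities b(p; i, k + 1 − i) sum to k over i = 1, . . . , k for any p.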


The problem of quantile estimation using balanced RSS was first considered by Chen (2000), who found that RSS substantially improves the efficiency of quantile estimation. This can be regarded as a generalization of the sign test given by Hettmansperger (1995). Quantile estimation for any distribution function using unbalanced RSS has been discussed by Chen (2001) and Zhu and Wang (2005). Chen (2001) proved that the optimal choice of unbalanced RSS outperforms balanced RSS and SRS in terms of ARE. He used a probability vector q = (q1, q2, . . . , qk), with $\sum_{i=1}^{k} q_i = 1$ and $0 \le q_i \le 1$, to construct a new CDF $F_q(x) = \sum_{i=1}^{k} q_i F_{(i)}(x)$ with corresponding pdf $f_q(x) = \sum_{i=1}^{k} q_i f_{(i)}(x)$, which is useful for finding an asymptotically unbiased estimator of $\xi_p$. The quantile estimators in balanced RSS (Chen, 2000) and unbalanced RSS (Chen, 2001) were based on the empirical distribution of the pooled data, which was considered for the balanced case by Stokes and Sager (1988).

Chen (2001) used at most two rank orders for the optimal unbalanced RSS design. Since the estimator of $\xi_p$ under the SRS scheme has asymptotic variance $p(1-p)/(n f^2(\xi_p))$, the ARE of the unbalanced RSS scheme with respect to the SRS scheme, derived from Chen (2001), is given by

$$\mathrm{ARE}(\hat{\xi}_p)_{\mathrm{URSS:SRS}} = \frac{p(1-p)\left\{\sum_{i=1}^{k} q_i b_i\right\}^2}{\sum_{i=1}^{k} q_i B_i(1-B_i)}, \qquad (2.1)$$

where $B_i = B(p; i, k+1-i)$ and $b_i = b(p; i, k+1-i)$.

Setting $q_i = 1/k$ in equation (2.1), and noting that $\sum_{i=1}^{k} b_i = k$, gives the ARE of balanced RSS with respect to SRS,

$$\mathrm{ARE}(\hat{\xi}_p)_{\mathrm{BRSS:SRS}} = \frac{p(1-p)\,k}{\sum_{i=1}^{k} B_i(1-B_i)}.$$
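Equation (2.1) is straightforward to evaluate numerically for any probability vector q. The following sketch, with illustrative function names, computes it using binomial sums for the beta quantities; with q_i = 1/k it reproduces entries of the Balanced rows of Table 4, for example 1.3333 at p = 0.50, k = 2 and 1.2308 at p = 0.25, k = 2.

```python
from math import comb

def B(p, i, k):
    """B(p; i, k+1-i), written as an upper binomial tail."""
    return sum(comb(k, j) * p**j * (1 - p)**(k - j) for j in range(i, k + 1))

def b(p, i, k):
    """b(p; i, k+1-i), the beta density at p."""
    return i * comb(k, i) * p**(i - 1) * (1 - p)**(k - i)

def are_vs_srs(p, q):
    """ARE of (un)balanced RSS relative to SRS, following equation (2.1).

    q is the probability vector (q_1, ..., q_k); its length gives k.
    """
    k = len(q)
    num = p * (1 - p) * sum(q[i - 1] * b(p, i, k) for i in range(1, k + 1)) ** 2
    den = sum(q[i - 1] * B(p, i, k) * (1 - B(p, i, k)) for i in range(1, k + 1))
    return num / den

if __name__ == "__main__":
    for k in (2, 3):
        print(k, round(are_vs_srs(0.50, [1 / k] * k), 4))
```

Passing a vector with a single component equal to 1 gives the ARE of a design that measures only one rank order.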

For the optimal RSS design, Chen (2001) gave the values of the ARE for k = 2(1)10; these values are also computed in Table 4.

Zhu and Wang (2005) suggested a new weighted estimator of $\xi_p$. They observed that, since each $\hat{\xi}_i$, i = 1, 2, . . . , k, is a consistent estimator of $\xi_p$, a weighted estimator can be constructed by combining them with suitable weights, where $\hat{\xi}_i = \inf\{x : \hat{F}_{(i)}(x) \ge p\}$ and $\hat{F}_{(i)}(x) = (1/m_i)\sum_{j=1}^{m_i} I(X_{(i)j} \le x)$ is the empirical CDF of $F_{(i)}(x)$.

The optimal strategy in the estimator of Zhu and Wang (2005) is to select observations with one fixed rank from different ranked sets. Chen (2001) and Zhu and Wang (2005) also proved that the optimal rank and the gain in relative efficiency are distribution free and depend on the set size and given probability only. This is also seen in the near optimal RSS model discussed in the next Section.

3. Near optimum probability vector for quantile estimation

In this section, we deal with quantile estimation based on an unbalanced RSS that uses every rank order. Motivated by the various allocation models for the population mean (Tiwari and Chandra, 2011; Kaur et al., 1997; Kaur et al., 2000; Bhoj and Chandra, 2019), the following allocation models are proposed for estimating the pth quantile $\xi_p$ in the range 0 < p ≤ 0.50. The model can be generalized to the range 0.50 < p < 1. The range 0 < p ≤ 0.50 is divided into three non-overlapping classes of approximately equal width, namely 0 < p ≤ 0.17, 0.17 < p ≤ 0.33 and 0.33 < p ≤ 0.50. For each class there is a different allocation $m_i$, i = 1, 2, . . . , k, depending on k, from which the probability $q_i = m_i/n$ is obtained. The proposed allocation models are presented in Table 1. For the purpose of calculation, we use the sample size n = k(k + 1)/2, as in Tiwari and Chandra (2011). Using Table 1, the number of allocations corresponding to each rank order for p = 0.01, 0.05, 0.1, 0.25 and 0.5 and set sizes k = 2(1)8 is shown in Table 2.

The basic drawback of the optimal unbalanced RSS strategy of Chen (2001) and Zhu and Wang (2005) is that, to minimize the asymptotic variance, probability 1 is mostly given to a single rank order and, in a few cases, probability 0.5 is given to each of two rank orders. As a result, at most two rank orders are used in estimating the population quantiles and the others are ignored, so the respective estimators lose the property of sufficiency. In our model, we propose to use each rank order at least once, so that the proposed estimator contains information about all order statistics. With this in mind, the estimator of $\xi_p$ is proposed to be a linear combination of $\hat{\xi}_i$, i = 1, 2, . . . , k, as


where $q_i = m_i/n$, i = 1, 2, . . . , k, for each k. For k = 2(1)8, the values of $m_i$ are given in Table 2. Since this model performs close to the optimal model of Chen (2001), we call it the near-optimal model and use ‘N’ in the subscript of the estimator. In RSS, the cost of ranking the units is assumed to be negligible. For a fixed sample size, the actual measurement cost under the optimal and near-optimal allocation models will be almost the same, since every order statistic selected for measurement (even a repeated one) has to be measured.
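As an illustration, the first row of Table 1 (0 < p ≤ 0.17, all k) can be turned into an allocation vector and the corresponding probability vector q. The sketch assumes n = k(k + 1)/2, as stated above, and additionally assumes, as an inference that the source does not spell out, that the remaining n − (k − 1) measurements are all assigned to the first rank order. The function name is hypothetical.

```python
def allocation_small_p(k):
    """Allocation (m_1, ..., m_k) for the class 0 < p <= 0.17 of Table 1.

    Table 1 states m_i = 1 for i = 2, ..., k. Assigning the remaining
    n - (k - 1) units to rank 1 is an assumption made for this sketch.
    """
    n = k * (k + 1) // 2          # sample size used in the paper's calculations
    m = [n - (k - 1)] + [1] * (k - 1)
    return m

def probability_vector(m):
    """q_i = m_i / n, where n is the total number of measured units."""
    n = sum(m)
    return [m_i / n for m_i in m]

if __name__ == "__main__":
    m = allocation_small_p(4)
    print(m, probability_vector(m))
```

The resulting q has no zero components, in line with the requirement that every rank order is used at least once.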

In our case, all components of the probability vector q = (q1, q2, . . . , qk) are non-zero. The following theorem (Rohatgi and Saleh, 2000) is useful for finding the asymptotic variance of $\hat{\xi}_N$, denoted AVAR($\hat{\xi}_N$).

Theorem 1

Let X1, X2, . . . , Xn be n independent random variables with a common mean μ and variances $\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2$. The linear combination A1X1 + A2X2 + · · · + AnXn, with A1 + A2 + · · · + An = 1, that has the smallest variance is obtained by taking $A_i$ inversely proportional to $\sigma_i^2$. The resulting minimum variance is

$$\left(\sum_{i=1}^{n} \frac{1}{\sigma_i^2}\right)^{-1}.$$
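A small numerical sketch of Theorem 1 may help: weights proportional to the reciprocal variances are normalized to sum to one, and the variance of the resulting combination equals the reciprocal of the sum of reciprocal variances. The function name is illustrative.

```python
def inverse_variance_weights(variances):
    """Return (weights, minimum_variance) for independent estimators
    with a common mean and the given variances (Theorem 1)."""
    inv = [1.0 / v for v in variances]
    total = sum(inv)
    weights = [x / total for x in inv]   # A_i proportional to 1 / sigma_i^2
    return weights, 1.0 / total          # minimum variance = 1 / sum(1 / sigma_i^2)

if __name__ == "__main__":
    w, v = inverse_variance_weights([1.0, 4.0])
    # Variance of the combination computed directly from the weights
    direct = sum(a * a * s for a, s in zip(w, [1.0, 4.0]))
    print(w, v, direct)
```

The minimum variance is never larger than the smallest individual variance, which is why combining all rank orders cannot hurt when the optimal weights are used.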


Let $B_i = B(p; i, k+1-i)$ and $b_i = b(p; i, k+1-i)$. The asymptotic variance of $\hat{\xi}_i$ (Chen, 2001) is given by


with $\sum_{i=1}^{k} q_i = 1$. Therefore, using (3.2), AVAR($\hat{\xi}_N$) is given by


where $n = \sum_{i=1}^{k} m_i$ is the sample size. We now find AVAR($\hat{\xi}_N$) as


AVAR( ξ̂N) is different for different allocation procedures. In the next Section, we compute AREs for three RSS procedures.
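The following sketch shows one way to combine the pieces numerically. It assumes that each estimator based on the ith rank order has asymptotic variance B_i(1 − B_i)/(n q_i b_i² f²(ξp)), which is the usual sample-quantile variance applied to F(i), and then applies Theorem 1. Under this assumption, 1/(n f²(ξp) AVAR) equals the sum of q_i b_i²/(B_i(1 − B_i)), a distribution-free quantity. With q_i = 1/k this reproduces entries in the Balanced rows of Table 3 (for example 102.50, 6.70 and 5.33 for k = 2), which suggests, but does not confirm, that it is the quantity tabulated there. The function name is hypothetical.

```python
from math import comb

def B(p, i, k):
    """B(p; i, k+1-i) as an upper binomial tail."""
    return sum(comb(k, j) * p**j * (1 - p)**(k - j) for j in range(i, k + 1))

def b(p, i, k):
    """b(p; i, k+1-i), the beta density at p."""
    return i * comb(k, i) * p**(i - 1) * (1 - p)**(k - i)

def combined_precision(p, q):
    """Sum over i of q_i * b_i^2 / (B_i * (1 - B_i)).

    Interpreted here (an assumption) as 1 / (n * f(xi_p)^2 * AVAR) for the
    minimum-variance linear combination of the rank-wise estimators.
    """
    k = len(q)
    total = 0.0
    for i in range(1, k + 1):
        Bi, bi = B(p, i, k), b(p, i, k)
        total += q[i - 1] * bi**2 / (Bi * (1 - Bi))
    return total

if __name__ == "__main__":
    for p in (0.01, 0.25, 0.50):
        print(p, round(combined_precision(p, [0.5, 0.5]), 2))
```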

4. Computations of AREs

In this section, we compare the performance of the estimators of population quantiles based on three procedures: balanced RSS, the optimal unbalanced RSS of Chen (2001) (estimator $\hat{\xi}_C$), and the near-optimal unbalanced RSS (estimator $\hat{\xi}_N$). We first compare these models using equation (3.3) for the probability values p = 0.01, 0.05, 0.10, 0.25 and 0.50; the results are given in Table 3. The values of $B_i$ and $b_i$ were determined using the online calculator at https://keisan.casio.com/exec/system/1180573225, after due verification.

We then calculated the AREs of the three methods with respect to SRS using equation (2.1) for the same values of p and k as in Table 3. These are presented in Table 4. These AREs aid interpretation, since the values for the estimator $\hat{\xi}_C$ are the same as those calculated by Chen (2001). As is known, balanced RSS performs better than SRS for each p and k; however, the gain is marginal (Table 4).

From Tables 3 and 4, it is seen that the ARE of the proposed method is substantially higher than that of balanced RSS for each p and k. All three AREs are the same for k = 2 and p = 0.05. The differences among them increase as p decreases. Each ARE increases with k. Interestingly, the rate of increase of the ARE over the set size k is higher for the proposed method than for the optimal method for all p except p = 0.5 (Table 5). For p = 0.5, this rate is close to that of the optimal method (higher for a few k and lower for a few others). This implies that, as the set size increases, the ARE of the proposed method approaches that of the optimal method. This is one of the important advantages of the proposed method over the optimal method.
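The rates in Table 5 appear to be ratios of the ARE at consecutive set sizes; this reading is an inference, but it reproduces the Balanced rows of Table 5 from those of Table 4 to within rounding. A minimal sketch, using the Balanced row of Table 4 at p = 0.50 as input:

```python
def growth_rates(are_by_k):
    """Ratios ARE(k+1) / ARE(k) for consecutive set sizes."""
    return [nxt / cur for cur, nxt in zip(are_by_k, are_by_k[1:])]

# Balanced RSS, p = 0.50, k = 2, ..., 8 (values copied from Table 4)
balanced_p050 = [1.3333, 1.6000, 1.8286, 2.0317, 2.2165, 2.3869, 2.5461]

if __name__ == "__main__":
    print([round(r, 4) for r in growth_rates(balanced_p050)])
```

The ratios decrease toward 1 as k grows, so each additional unit of set size yields a smaller proportional gain.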

We now examine the performance of the proposed method as the sample size increases. For this purpose, we take k = 4 and three values of p, one from each class of Table 1, namely p = 0.01, 0.25 and 0.50. The ARE of the proposed method with respect to SRS for increasing sample size is shown in Figure 1.

From Figure 1, we see that the ARE of the proposed estimator increases with the sample size for each p and, after attaining a peak, remains almost constant. At this peak, the value of the ARE is very close to the optimal ARE of Chen (2001). This helps in choosing a suitable sample size for each value of p so that the performance of the proposed method is very near to that of the optimal method while the other advantages of the proposed method are retained.

5. Case of imperfect ranking

RSS procedures perform best in the absence of ranking errors; however, error-free ranking is rarely achieved in practice, particularly for large values of k. Minimal uncertainty in the rankings may not cause an excessive increase in the variance, but if the ranking process is not very reliable, the precision of RSS estimators (particularly unbalanced ones) may be reduced. The Mann-Whitney-Wilcoxon procedures under imperfect ranking were discussed by Bohn and Wolfe (1994). They proposed a model for the probabilities of imperfect judgment rankings based on the concept of expected spacings and used this model to study the properties of tests based on the ranked set analogue of the Mann-Whitney-Wilcoxon statistic. Greater precision in the RSS estimator requires more accurate rankings in each set (David and Levine, 1972; Barnett and Moore, 1997; Stark and Wolfe, 2002). Al-Omari and Bouza (2014) discussed the impact of perfect and imperfect ranking in detail. Chen (2001) showed that his optimal model is no longer optimal in the presence of ranking errors. It is difficult to find the exact loss of efficiency due to imperfect ranking, but the probability vector q = (q1, q2, . . . , qk) is clearly affected by the presence of ranking error. Since the estimators depend entirely on this vector, there will be a loss in the AREs, and this loss is larger for large set sizes. Therefore, it might be expected that the estimator based on optimal RSS is the most affected, followed by the near-optimal method, with balanced RSS the least affected. The model due to Dell and Clutter (1972) is given by,


$$y_{[ii]} = y_{(ii)} + \varepsilon_{ii},$$

where $y_{[ii]}$ and $y_{(ii)}$ denote the “judgment order statistic” and the “true order statistic”, respectively, and $\varepsilon_{ii} \sim N(0, \sigma_e^2)$. Dell and Clutter (1972) have shown that errors in judgment are likely to be influenced by the set size k.
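One common way to simulate ranking error of this additive kind is to rank the units in a set by a perceived value, equal to the true value plus a normal error, and then measure the true value of the unit judged to hold the required rank. The sketch below follows that approach; it is an illustrative assumption about how the error enters the ranking, not a procedure taken from the paper, and the function name is hypothetical.

```python
import random

def judgment_ranked_unit(values, rank, sigma_e, rng):
    """Return the true value of the unit judged to have the given rank (1-based).

    Each unit is perceived as its true value plus N(0, sigma_e^2) error,
    and the set is ranked on the perceived values (simulation assumption).
    """
    perceived = [(v + rng.gauss(0.0, sigma_e), v) for v in values]
    perceived.sort(key=lambda pair: pair[0])
    return perceived[rank - 1][1]

if __name__ == "__main__":
    rng = random.Random(7)
    one_set = [rng.gauss(0.0, 1.0) for _ in range(4)]
    # With sigma_e = 0 the judgment ranking coincides with the true ranking
    print(judgment_ranked_unit(one_set, 2, 0.0, rng) == sorted(one_set)[1])
    print(judgment_ranked_unit(one_set, 2, 1.0, rng))
```

Larger values of sigma_e make it more likely that the measured unit is not the true order statistic of the intended rank.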

6. Conclusion

In the first study of quantile estimation by unbalanced RSS (Chen, 2001), the optimal q for estimating population quantiles has at most two non-zero components, while in the weighted estimator of Zhu and Wang (2005) only one rank order is used in the optimal unbalanced design. In this paper, we have focused on quantile estimation and proposed a new near-optimal allocation model for estimating the pth (0 < p < 1) population quantile. The probability vector q involves every rank order: each component is the ratio of the number of allocations for that rank order to the total number of observations. For balanced RSS, each component of q is 1/k for set size k, whereas for unbalanced RSS it is mi/n (mi being the number of allocations corresponding to the ith order statistic).

The proposed method is shown to be asymptotically more efficient than balanced RSS and close to the optimal allocation design for large sample sizes. As seen in Figure 1, for each set size and desired p, the sample size may be chosen so that the performance of the proposed method is close to that of the optimal allocation model of Chen (2001). A major disadvantage of RSS schemes for quantile estimation is that performance decreases as ranking errors increase. This can arise for any of the estimators and depends on the expertise of the surveyor and the tools used for ranking. It is also seen that the ARE of the proposed method tends to that of the optimal method as the set size k increases. Our near-optimal allocation model is of practical importance, as it provides an applicable allocation sequence for quantile estimation.

Fig. 1. Performance of ARE of proposed unbalanced RSS with SRS over increasing sample size.

Table 1

Proposed near optimal allocation model for the quantile estimation

Class of p | Condition for k | Allocation model
0 < p ≤ 0.17 | all k | mi = 1, i = 2, 3, . . . , k
0.17 < p ≤ 0.33 | k = 2, 3 | mi, i = 2, 3, . . . , k;
 | k = 3(1)7 | mi = 1, for all i ≠ 2;
 | k > 7 | mi = 1, for all i ≠ 3
0.33 < p ≤ 0.50 | odd k | mi = 1, for all i ≠ (k + 1)/2;
 | even k and k = 2 | m1 = 1, m2 = n − m1
 | even k and k > 2 | mi = 1, for all i ≠ k/2 and (k + 2)/2;

Table 2

Proposed allocation model for the quantile estimation at k = 2(1)8 and different p

p | k | Rank order (i) | n




Table 3

Comparison of three AREs (using Equation (3.3)) at different probability values for k = 2(1)8

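p | Method | k = 2 | k = 3 | k = 4 | k = 5 | k = 6 | k = 7 | k = 8
p = 0.01 | Balanced | 102.50 | 103.95 | 105.43 | 106.86 | 108.27 | 109.67 | 111.05
p = 0.05 | Balanced | 22.52 | 23.89 | 25.20 | 26.44 | 27.63 | 28.77 | 29.87
p = 0.10 | Balanced | 12.55 | 13.83 | 15.02 | 16.12 | 17.16 | 18.14 | 19.08
p = 0.25 | Balanced | 6.70 | 7.85 | 8.87 | 9.79 | 10.63 | 11.41 | 12.15
p = 0.50 | Balanced | 5.33 | 6.43 | 7.37 | 8.21 | 8.96 | 9.66 | 10.32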


Table 4

Comparison of three AREs with respect to SRS (using equation 2.1) at different probability values for k = 2(1)8

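p | Method | k = 2 | k = 3 | k = 4 | k = 5 | k = 6 | k = 7 | k = 8
p = 0.01 | Balanced | 1.0100 | 1.0200 | 1.0300 | 1.0400 | 1.0500 | 1.0599 | 1.0699
p = 0.05 | Balanced | 1.0499 | 1.0995 | 1.1488 | 1.1976 | 1.2459 | 1.2937 | 1.3408
p = 0.10 | Balanced | 1.0989 | 1.1959 | 1.2904 | 1.3821 | 1.4708 | 1.5565 | 1.6392
p = 0.25 | Balanced | 1.2308 | 1.4382 | 1.6248 | 1.7942 | 1.9500 | 2.0947 | 2.2303
p = 0.50 | Balanced | 1.3333 | 1.6000 | 1.8286 | 2.0317 | 2.2165 | 2.3869 | 2.5461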


Table 5

Increasing rate of ARE over k for different methods

p | Method | Set size k
p = 0.01 | Balanced | 1.0099 | 1.0098 | 1.0097 | 1.0096 | 1.0094 | 1.0094
p = 0.05 | Balanced | 1.0472 | 1.0448 | 1.0425 | 1.0403 | 1.0384 | 1.0364
p = 0.10 | Balanced | 1.0883 | 1.0790 | 1.0711 | 1.0642 | 1.0583 | 1.0531
p = 0.25 | Balanced | 1.1685 | 1.1297 | 1.1043 | 1.0868 | 1.0742 | 1.0647
p = 0.50 | Balanced | 1.2000 | 1.1429 | 1.1111 | 1.0910 | 1.0769 | 1.0667


  1. Al-Omari AI and Bouza CN (2014). Review of ranked set sampling: modifications and applications. Revista Investigación Operacional, 35, 215-240.
  2. Barnett V and Moore K (1997). Best linear unbiased estimates in ranked-set sampling with particular reference to imperfect ordering. Journal of Applied Statistics, 24, 697-710.
  3. Bhoj DS (1997). New parametric ranked set sampling. Journal of Applied Statistical Sciences, 6, 275-289.
  4. Bhoj DS and Chandra G (2019). Simple unequal allocation procedure for ranked set sampling with skew distributions. Journal of Modern Applied Statistical Methods, 18, eP2811.
  5. Bohn LL and Wolfe DA (1992). Nonparametric two-sample procedures for ranked set samples data. Journal of the American Statistical Association, 87, 552-561.
  6. Bohn LL and Wolfe DA (1994). The effect of imperfect judgment rankings on properties of procedures based on the ranked-set samples analog of the Mann Whitney-Wilcoxon statistic. Journal of the American Statistical Association, 89, 168-176.
  7. Chandra G, Bhoj DS, and Pandey R (2018). Simple unbalanced ranked set sampling for mean estimation of response variable of developmental programs. Journal of Modern Applied Statistical Methods, 17.
  8. Chandra G, Tiwari N, and Nautiyal R (2015). Near optimal allocation models for symmetric distributions in ranked set sampling. Statistics in Forestry: Methods and Applications, 85-90.
  9. Chen Z (2000). On ranked set sample quantiles and their applications. Journal of Statistical Planning and Inference, 83, 125-135.
  10. Chen Z (2001). The optimal ranked set sampling scheme for inference on population quantiles. Statistica Sinica, 11, 23-37.
  11. Cobby JM, Ridout MS, Bassett PJ, and Large RV (1985). An investigation into the use of ranked set sampling on grass and grass-clover swards. Grass and Forage Science, 40, 257-263.
  12. David HA and Levine DN (1972). Ranked set sampling in the presence of judgment error. Biometrics, 28, 553-555.
  13. Dell TR and Clutter JL (1972). Ranked set sampling theory with order statistics background. Biometrics, 28, 545-555.
  14. Halls LK and Dell TR (1966). Trials of ranked set sampling for forage yields. Forest Science, 12, 22-26.
  15. Hettmansperger TP (1995). The ranked-set sampling sign test. Nonparametric Statistics, 4, 263-270.
  16. Kaur A, Patil GP, and Taillie C (1997). Unequal allocation models for ranked set sampling with skew distributions. Biometrics, 53, 123-130.
  17. Kaur A, Patil GP, and Taillie C (2000). Optimal allocation for symmetric distributions in ranked sampling. Annals of the Institute of Statistical Mathematics, 52, 239-254.
  18. Lam K, Sinha BK, and Wu Z (1996). Estimation of location and scale parameters of a logistic distribution using a ranked set sample. Statistical Theory and Applications, New York, Springer.
  19. Latpate R, Kshirsagar J, Gupta VK, and Chandra G (2021). Advanced Sampling Methods, Springer.
  20. Martin WL, Sharik TL, Oderwald RG, and Smith DW (1980). Evaluation of ranked set sampling for estimating shrub phytomass in Appalachian oak forests, (pp. 4-80), Blacksburg, Virginia, School of Forestry and Wildlife Resources, Virginia Polytechnic Institute and State University FWS.
  21. McIntyre GA (1952). A method for unbiased selective sampling using ranked sets. Australian Journal of Agricultural Research, 3, 385-390.
  22. Nahhas RW, Wolfe DA, and Chen H (2002). Ranked set sampling: cost and optimal set size. Biometrics, 58, 964-971.
  23. Rohatgi VK and Saleh AK (2000). An introduction to probability and statistics (2nd Ed), John Wiley and Sons.
  24. Stark GV and Wolfe DA (2002). Evaluating ranked-set sampling estimators with imperfect rankings. Journal of Statistical Studies, 77-103.
  25. Stokes SL (1995). Parametric ranked set sampling. Annals of the Institute of Statistical Mathematics, 47, 465-482.
  26. Stokes SL and Sager TW (1988). Characterization of a ranked set sample with application to estimating distribution functions. Journal of the American Statistical Association, 83, 374-381.
  27. Takahasi K and Wakimoto K (1968). On unbiased estimates of the population mean based on the sample stratified by means of ordering. Annals of the Institute of Statistical Mathematics, 20, 1-31.
  28. Tiwari N and Chandra G (2011). A systematic approach for unequal allocation for skewed distributions in ranked set sampling. Journal of the Indian Society of Agricultural Statistics, 65, 331-338.
  29. Zhu M and Wang Y (2005). Quantile estimation from ranked set sampling data. Sankhya, 67, 295-304.