On inference of multivariate means under ranked set sampling
Communications for Statistical Applications and Methods 2018;25:1-13
Published online January 31, 2018
© 2018 Korean Statistical Society.

Haresh Rochani (1, a), Daniel F. Linder (b), Hani Samawi (a), and Viral Panchal (b)

(a) Department of Biostatistics, Georgia Southern University, USA; (b) Department of Biostatistics, Augusta University, USA
(1) Corresponding author: Jiann-Ping Hsu College of Public Health, Department of Biostatistics, Georgia Southern University, Statesboro, GA 30460, USA. Email: hrochani@georgiasouthern.edu
Received May 31, 2017; Revised September 27, 2017; Accepted December 8, 2017.
 Abstract

In many studies, a researcher attempts to describe a population in which units are measured on multiple outcomes, or responses. In this paper, we present an efficient procedure based on ranked set sampling to estimate, and test hypotheses about, a multivariate mean. The method ranks units on an auxiliary covariate, which is assumed to be correlated with the multivariate response, in order to improve the efficiency of estimation. We show that the proposed estimators developed under this sampling scheme are unbiased, have smaller variance in the multivariate sense, and are asymptotically Gaussian. We also demonstrate that the efficiency of the multivariate regression estimator can be improved by using ranked set sampling. A bootstrap routine is developed in the statistical software R to perform inference when the sample size is small. We use a simulation study to investigate the performance of the method under known conditions and apply the method to the biomarker data collected in the China Health and Nutrition Survey (CHNS 2009).

Keywords : multivariate mean, ranked set sampling, hypothesis testing, regression estimator
1. Introduction

As the complexity and cost of biological experiments have grown considerably in recent years, partly due to technological advances (high-throughput technologies and more), there is an increasing need to design experiments that maximize the information content of the collected sample. For most standard statistical analyses, where the aim is to estimate some population parameter, maximizing information translates into minimizing the variance of the parameter’s estimate. In many situations, researchers observe multiple outcomes for each unit in the sample and wish to make inferences on a parameter of the underlying population’s joint distribution; routinely this is done by estimating the population mean vector. Often some or all of the individual components of this response vector are costly, risky (for example, complications due to biopsy), or even destructive (requiring animal sacrifice) to measure. In such cases it may be desirable, for monetary or ethical reasons, to extract information from each sampled unit without taking the exact measurement of the response of interest on every unit.

The most common data collection method for making inference about a population parameter is the simple random sample (SRS). Although every subject has an equal chance of being selected under SRS, there is no guarantee that a particular sample will truly represent the population. The only guarantee is that, if the sampling process is repeated over and over, the average of the attribute of interest across the repeated SRSs provides a good estimator of the population value of that attribute. Ranked set sampling (RSS) (McIntyre, 1952) is a sampling scheme that allows researchers to use information from each unit in the sample without taking every unit’s exact measurement. The overall goal of RSS is to obtain a sample that is more likely to span the full range of values in the population, and is therefore more representative, than an SRS of similar size. Traditionally, RSS can be used provided a reliable ranking mechanism is available for the response of interest, one that is cheaper or safer than exact measurement. The ranked but unmeasured units provide additional information relative to an SRS of the same size, which improves parameter inference. This additional information arises because aspects of the population structure are encoded in the order statistics. Knowing both an observation’s rank and its exact measurement improves inference because units with different ranks target different population attributes, unlike the identically distributed units of an SRS. Many works have shown that this translates into improved parameter inference compared with simple random samples of the same size.

In many situations, the outcome of interest is correlated with an auxiliary variable that is easier to measure than the outcome itself. For instance, weight may be correlated with fasting blood glucose and is easily obtained, whereas a laboratory measurement is necessary for blood glucose. Applications of RSS have appeared in a series of papers; see, for example, Chen (1999), Demir and Çıngı (2000), Huang et al. (2016), Jabrah et al. (2017), Kaur et al. (1996), and Samawi and Al-Sagheer (2001).

An outline of the paper is as follows. In Section 2 we introduce the necessary notation and prove that the mean estimator under RSS is unbiased and has smaller variance than under SRS. In Section 2 we also derive the limiting distribution of Hotelling’s statistic (Q), as well as the multivariate regression estimator, under RSS. In Section 3, we perform a simulation study to compare the performance of RSS with SRS in terms of estimation as well as hypothesis testing. In Section 4, we apply the method to a real data set in the context of public health. We give concluding remarks and future directions for the method in Section 5.

2. Multivariate mean estimation using ranked set sampling

2.1. Ranked set sampling procedure

In this section, we briefly describe how a ranked set sample may be collected by ranking on a univariate random variable. To select an RSS of size n based on the auxiliary variable (X), the following steps should be performed.

  • Select an SRS of size r from the population and observe the auxiliary variable (X) on each unit. Here r is referred to as the set size, which is typically between 2 and 5, although any size is possible; sizes larger than 5 may become impractical (Takahasi and Wakimoto, 1968).

  • Order the units by the auxiliary variable and choose the unit with the minimum value, X(1). Measure the multivariate outcome of interest, Y[1], on that unit.

  • Select another SRS of size r and again order it by the auxiliary variable. Choose the unit with the second smallest value, X(2), and measure the multivariate outcome of interest, Y[2].

  • Repeat this process until X(r) and Y[r] have been obtained from the rth independent SRS.

  • The entire process of obtaining (X(1), X(2), …, X(r)) and (Y[1], Y[2], …, Y[r]) is called a cycle.

  • Repeat m independent cycles to obtain an RSS of size n = rm.

Table 1 shows the structure of an RSS. For more details about RSS, see Jozani and Johnson (2011), Kowalczyk (2004), Patil et al. (1995), and Takahasi and Futatsuya (1998).
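The cycle structure of Section 2.1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' R routine: the helper names `draw_rss` and `toy_population`, the two-outcome toy model, and the seed are all introduced here for illustration only.

```python
import random

def draw_rss(sample_unit, r, m, rng):
    """Draw a ranked set sample of size n = r * m, ranking on the auxiliary X.

    sample_unit(rng) is a hypothetical callback returning one unit (x, y),
    where x is a float and y is a tuple holding the multivariate outcome.
    """
    rss = []
    for k in range(1, m + 1):              # m independent cycles
        for i in range(1, r + 1):          # one fresh SRS of size r per rank
            units = [sample_unit(rng) for _ in range(r)]
            units.sort(key=lambda u: u[0])  # rank on X only
            x, y = units[i - 1]            # keep the unit with the i-th smallest X
            rss.append({"cycle": k, "rank": i, "x": x, "y": y})
    return rss

def toy_population(rng, rho=0.8):
    """Toy unit (illustrative only): X ~ N(0, 1) and two outcomes with Corr(X, Y_j) = rho."""
    x = rng.gauss(0.0, 1.0)
    noise_sd = (1.0 - rho ** 2) ** 0.5
    y = tuple(rho * x + noise_sd * rng.gauss(0.0, 1.0) for _ in range(2))
    return x, y

rng = random.Random(2018)
sample = draw_rss(toy_population, r=3, m=4, rng=rng)
print(len(sample))  # n = r * m measured units
```

In a real study only X would be observed on all r units of a set, and Y would be measured on the single selected unit; the simulation generates Y for every unit purely for convenience.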

2.2. Multivariate naive estimator

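Our population of interest consists of a univariate auxiliary variable $X$ and a $d$-dimensional multivariate outcome $Y$, with a covariance structure on the joint distribution of $(Y, X)$ given by

$$\Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}.$$

Assuming that we have collected $m$ RSS cycles of set size $r$, where the ranking has been done on $X$, we denote the data by $(X_{(i)k}, Y_{[i]k})$, $i = 1, 2, \ldots, r$, $k = 1, 2, \ldots, m$. Note that the round-bracket subscript on $X$ indicates that ranking has been done on $X$, and the square-bracket subscript on $Y$ indicates that ranking on $X$ may result in imperfect ranking of the elements of $Y$. The naive estimator is defined as $\hat{\mu}_{yRSS} = (1/rm) \sum_{k=1}^{m} \sum_{i=1}^{r} Y_{[i]k}$. It is straightforward to show that this is an unbiased estimator of the mean of $Y$. Since $\sum_{i=1}^{r} f_{X_{(i)}}(x) = r f_{X}(x)$ (Dell and Clutter, 1972),

$$
\begin{aligned}
E\hat{\mu}_{yRSS} &= \frac{1}{rm} \sum_{k=1}^{m} \sum_{i=1}^{r} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} y \, f_{Y \mid X_{(i)} = x}(y \mid x) \, f_{X_{(i)}}(x) \, dy \, dx \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} y \, f_{Y \mid X = x}(y \mid x) \, \frac{1}{r} \sum_{i=1}^{r} f_{X_{(i)}}(x) \, dy \, dx \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} y \, f_{Y \mid X = x}(y \mid x) \, f_{X}(x) \, dy \, dx = \mu_y .
\end{aligned}
$$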

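Similarly for the variance (Dell and Clutter, 1972), by defining $\mu_{[i]} = E Y_{[i]}$ we have

$$
\begin{aligned}
\mathrm{Var}(\hat{\mu}_{yRSS}) &= \frac{1}{(rm)^2} \sum_{k=1}^{m} \sum_{i=1}^{r} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (y - \mu_{[i]})(y - \mu_{[i]})^{T} f_{Y \mid X = x}(y \mid x) \, f_{X_{(i)}}(x) \, dy \, dx \\
&= \frac{1}{r m^2} \sum_{k=1}^{m} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (y - \mu)(y - \mu)^{T} f_{Y \mid X = x}(y \mid x) \, \frac{1}{r} \sum_{i=1}^{r} f_{X_{(i)}}(x) \, dy \, dx \\
&\quad - 2 \, \frac{1}{(rm)^2} \sum_{k=1}^{m} \sum_{i=1}^{r} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (y - \mu) \, f_{Y \mid X = x}(y \mid x) \, f_{X_{(i)}}(x) \, dy \, dx \, (\mu_{[i]} - \mu)^{T} \\
&\quad + \frac{1}{(rm)^2} \sum_{k=1}^{m} \sum_{i=1}^{r} (\mu_{[i]} - \mu) \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{Y \mid X = x}(y \mid x) \, f_{X_{(i)}}(x) \, dy \, dx \, (\mu_{[i]} - \mu)^{T} \\
&= \frac{1}{rm} \Sigma_{11} - 2 \, \frac{1}{r^2 m} \sum_{i=1}^{r} (\mu_{[i]} - \mu)(\mu_{[i]} - \mu)^{T} + \frac{1}{r^2 m} \sum_{i=1}^{r} (\mu_{[i]} - \mu)(\mu_{[i]} - \mu)^{T} \\
&= \frac{1}{rm} \left( \Sigma_{11} - \frac{1}{r} \sum_{i=1}^{r} (\mu_{[i]} - \mu)(\mu_{[i]} - \mu)^{T} \right).
\end{aligned}
$$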

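It is clear that $\sum_{i=1}^{r} (\mu_{[i]} - \mu)(\mu_{[i]} - \mu)^{T}$ is positive semi-definite, since for all $u \in \mathbb{R}^d$ we have $u^{T} (\mu_{[i]} - \mu)(\mu_{[i]} - \mu)^{T} u \ge 0$. Because the variance of the SRS mean of the same size is $\Sigma_{11}/(rm)$, it follows that for all $u \in \mathbb{R}^d$, $u^{T} (\mathrm{Var}(\hat{\mu}_{ySRS}) - \mathrm{Var}(\hat{\mu}_{yRSS})) u \ge 0$, or equivalently $\mathrm{Var}(\hat{\mu}_{ySRS}) \succeq \mathrm{Var}(\hat{\mu}_{yRSS})$. Under the additional assumption that $r \ge d$ and $X$ is correlated with each component of $Y$, we have strict inequality.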
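The variance ordering can be checked numerically. The sketch below is an illustrative Monte Carlo comparison, not the simulation design of Section 3: the toy bivariate outcome, the helper names, the seed, and the settings r = 4, m = 5, ρ = 0.8 are assumptions made here, and only the diagonal entries of the two covariance matrices are compared.

```python
import random
import statistics

def draw_unit(rng, rho):
    """One toy unit: X ~ N(0, 1) and a 2-d outcome with Corr(X, Y_1) = rho, Corr(X, Y_2) = -rho."""
    x = rng.gauss(0.0, 1.0)
    sd = (1.0 - rho ** 2) ** 0.5
    y = (rho * x + sd * rng.gauss(0.0, 1.0), -rho * x + sd * rng.gauss(0.0, 1.0))
    return x, y

def rss_outcomes(rng, r, m, rho):
    """Outcomes Y_[i]k of an RSS with set size r and m cycles, ranked on X."""
    ys = []
    for _ in range(m):
        for i in range(r):
            ranked = sorted((draw_unit(rng, rho) for _ in range(r)), key=lambda u: u[0])
            ys.append(ranked[i][1])
    return ys

def srs_outcomes(rng, n, rho):
    """Outcomes of a simple random sample of size n."""
    return [draw_unit(rng, rho)[1] for _ in range(n)]

def naive_mean(ys):
    """Component-wise average: the naive estimator of the mean vector."""
    n = len(ys)
    return [sum(y[j] for y in ys) / n for j in range(len(ys[0]))]

rng = random.Random(1)
r, m, rho, reps = 4, 5, 0.8, 2000
rss_means = [naive_mean(rss_outcomes(rng, r, m, rho)) for _ in range(reps)]
srs_means = [naive_mean(srs_outcomes(rng, r * m, rho)) for _ in range(reps)]

# Empirical variance of each component of the estimated mean vector.
var_rss = [statistics.pvariance([mu[j] for mu in rss_means]) for j in range(2)]
var_srs = [statistics.pvariance([mu[j] for mu in srs_means]) for j in range(2)]
for j in range(2):
    print(f"component {j + 1}: Var SRS / Var RSS = {var_srs[j] / var_rss[j]:.2f}")
```

A ratio above 1 for each component is what the inequality predicts; the size of the gain depends on ρ and r.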

2.3. Multivariate regression estimator

Regression estimators are used to increase precision in mean estimation by incorporating the information in an auxiliary variable. In this case, we assume a linear regression of $Y$ on $X$,

$$Y = \mu_y + \beta (X - \mu_x) + \varepsilon,$$

where $X$ and $\varepsilon$ are independent and $\varepsilon$ is a mean-zero residual vector with covariance $\Sigma_{\varepsilon}$. Then the regression equation with the corresponding data from RSS is

$$Y_{[i]k} = \mu_y + \beta (X_{(i)k} - \mu_x) + \varepsilon_{(i)k}, \qquad i = 1, 2, \ldots, r, \quad k = 1, 2, \ldots, m.$$

It is worth noting that the mean of $X$, $\mu_x$, is typically unknown. However, since the auxiliary variable $X$ may be much cheaper to measure, one may use the $r^2 m$ units collected in the first stage of sampling to estimate this quantity as $\bar{\mu}_x = (1/r^2 m) \sum_{k=1}^{m} \sum_{i=1}^{r} \sum_{j=1}^{r} X_{ijk}$.

Then the regression estimator for the mean of the response is given by

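$$\bar{Y}_{reg} = \hat{\mu}_{yRSS} + \hat{\beta} (\bar{\mu}_x - \hat{\mu}_x),$$

where

$$\hat{\mu}_x = \frac{1}{rm} \sum_{k=1}^{m} \sum_{i=1}^{r} X_{(i)k}, \qquad \hat{\beta} = \frac{\sum_{k=1}^{m} \sum_{i=1}^{r} (X_{(i)k} - \hat{\mu}_x)(Y_{[i]k} - \hat{\mu}_{yRSS})}{\sum_{k=1}^{m} \sum_{i=1}^{r} (X_{(i)k} - \hat{\mu}_x)^2}.$$

It is straightforward to show that $\bar{\mu}_x$ and $\hat{\mu}_x$ are unbiased estimates of $\mu_x$ using arguments similar to those in the previous section. When (2.3) holds, taking conditional expectations gives $E\hat{\beta} = \beta$ and $E\bar{Y}_{reg} = \mu_y$, so that the regression estimator based on RSS is unbiased. Also,

$$\mathrm{Var}(\bar{Y}_{reg}) = E_X \mathrm{Var}_Y(\bar{Y}_{reg} \mid X) + \mathrm{Var}_X E_Y(\bar{Y}_{reg} \mid X).$$

Since $E_Y(\bar{Y}_{reg} \mid X) = \mu_y + \beta (\bar{\mu}_x - \hat{\mu}_x)$, the second term above is $(1/r^2 m) \sum_{i=1}^{r} \sigma^2_{X_{(i)}} \beta \beta^{T}$. For the first term, $\mathrm{Cov}(\hat{\mu}_{yRSS}, \beta (\bar{\mu}_x - \hat{\mu}_x) \mid X) = 0$, so that

$$
\begin{aligned}
E_X \mathrm{Var}_Y(\bar{Y}_{reg} \mid X) &= E_X \mathrm{Var}_Y(\hat{\mu}_{yRSS} \mid X) + E_X \mathrm{Var}_Y(\hat{\beta} (\bar{\mu}_x - \hat{\mu}_x) \mid X) \\
&= \frac{1}{(rm)^2} \sum_{k=1}^{m} \sum_{i=1}^{r} \Sigma_{\varepsilon} + E_X \left( (\bar{\mu}_x - \hat{\mu}_x)^2 \, \mathrm{Var}_Y(\hat{\beta} \mid X) \right) \\
&= \frac{1}{n} \Sigma_{\varepsilon} + \Sigma_{\varepsilon} \, E_X \left[ \frac{(\bar{\mu}_x - \hat{\mu}_x)^2}{\sum_{k=1}^{m} \sum_{i=1}^{r} (X_{(i)k} - \hat{\mu}_x)^2} \right].
\end{aligned}
$$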
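The regression estimator of Section 2.3 is a few sums once the data are in hand. The following sketch is illustrative: the function name `regression_estimator` and its argument layout are assumptions made here, and the toy data are exactly linear so that the recovered slopes are easy to check by hand.

```python
def regression_estimator(xs, ys, mu_bar_x):
    """Regression estimator of the outcome mean vector from ranked set sample data.

    xs        -- measured auxiliary values X_(i)k (length n = r * m)
    ys        -- matching outcome tuples Y_[i]k, each of dimension d
    mu_bar_x  -- mean of X over all r^2 * m first-stage units
    """
    n = len(xs)
    d = len(ys[0])
    mu_hat_x = sum(xs) / n
    mu_hat_y = [sum(y[j] for y in ys) / n for j in range(d)]
    sxx = sum((x - mu_hat_x) ** 2 for x in xs)
    # One slope per outcome component, all sharing the same denominator.
    beta_hat = [
        sum((x - mu_hat_x) * (y[j] - mu_hat_y[j]) for x, y in zip(xs, ys)) / sxx
        for j in range(d)
    ]
    y_reg = [mu_hat_y[j] + beta_hat[j] * (mu_bar_x - mu_hat_x) for j in range(d)]
    return y_reg, beta_hat

# Toy check with exactly linear outcomes Y = (1 + 2X, 3 - X).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [(1.0 + 2.0 * x, 3.0 - x) for x in xs]
y_reg, beta_hat = regression_estimator(xs, ys, mu_bar_x=2.0)
print(beta_hat)  # slopes of the two outcome components
print(y_reg)     # naive mean shifted towards the first-stage mean of X
```

Here the measured units have mean X of 1.5 while the first-stage mean is 2.0, so each component of the naive mean is adjusted by its slope times 0.5.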

2.4. Testing for H0 : μ = μ0

Theorem 1

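Let $\{Y_{[i]k}\}$, $i = 1, 2, \ldots, r$ and $k = 1, 2, \ldots, m$, be an RSS sample from a normal distribution with mean vector $\mu$ and variance-covariance matrix $\Sigma_{11}$. Let

$$\bar{Y}_{rss} = \frac{1}{rm} \sum_{k=1}^{m} \sum_{i=1}^{r} Y_{[i]k}, \qquad S_{rss} = \left[ \frac{1}{rm - 1} \right] \sum_{k=1}^{m} \sum_{i=1}^{r} (Y_{[i]k} - \bar{Y}_{rss})(Y_{[i]k} - \bar{Y}_{rss})^{T}, \qquad Q = mr \, (\bar{Y}_{rss} - \mu_0)^{T} S_{rss}^{-1} (\bar{Y}_{rss} - \mu_0).$$

Then, for large samples, the limiting distribution of $Q$ is the $\chi^2$-distribution with $d$ degrees of freedom under the null hypothesis $\mu = \mu_0$.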

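Proof

$$\bar{Y}_{rss} = \frac{1}{rm} \sum_{k=1}^{m} \sum_{i=1}^{r} Y_{[i]k}, \qquad \bar{Y}_{rss} = \frac{1}{r} \sum_{i=1}^{r} \bar{Y}_{[i]}.$$

From the multivariate central limit theorem, $\sqrt{m} (\bar{Y}_{[i]} - \mu_{[i]}) \xrightarrow{d} N_d(0, \Sigma_{11[i]}/m)$ as $m \to \infty$, where $\Sigma_{11[i]}$ is the variance-covariance matrix of $Y_{[i]}$.

Since the $\bar{Y}_{[i]}$ are independent,

$$\sqrt{mr} \, (\bar{Y}_{rss} - \mu) \xrightarrow{d} N_d\left( 0, \frac{\sum_{i=1}^{r} \Sigma_{11[i]}}{mr} \right).$$

That is, $\sqrt{mr} \, (\bar{Y}_{rss} - \mu) \xrightarrow{d} N_d(0, \Sigma_{11R}/mr)$, where $\Sigma_{11R}$ is the variance-covariance matrix of $Y_{rss}$. Therefore,

$$\sqrt{mr} \, (\bar{Y}_{rss} - \mu) \, \Sigma_{11R}^{-1/2} \xrightarrow{d} N_d(0, I_d)$$

and hence

$$mr \, (\bar{Y}_{rss} - \mu_0)^{T} \Sigma_{11R}^{-1} (\bar{Y}_{rss} - \mu_0) \sim \chi^2_{(d)}.$$

Since $S_{rss}^{-1} / \Sigma_{11R}^{-1} \xrightarrow{d} 1$ (see Appendix for more detail),

$$Q = mr \, (\bar{Y}_{rss} - \mu_0)^{T} \, \Sigma_{11R}^{-1} \, \frac{S_{rss}^{-1}}{\Sigma_{11R}^{-1}} \, (\bar{Y}_{rss} - \mu_0) \sim \chi^2_{(d)}.$$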
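For readers who want to compute the statistic of Theorem 1, here is a small standard-library sketch. The helper names (`solve`, `hotelling_q`, `chi2_sf`) are assumptions introduced here; the chi-square tail uses the standard recurrence for the upper incomplete gamma function and is meant for integer degrees of freedom only. It illustrates the large-sample test and is not the authors' implementation.

```python
import math

def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[row][c] -= factor * aug[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][c] * z[c] for c in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z

def hotelling_q(ys, mu0):
    """Q = n (ybar - mu0)' S^{-1} (ybar - mu0), with S the sample covariance (divisor n - 1)."""
    n, d = len(ys), len(mu0)
    ybar = [sum(y[j] for y in ys) / n for j in range(d)]
    s = [[sum((y[a] - ybar[a]) * (y[b] - ybar[b]) for y in ys) / (n - 1)
          for b in range(d)] for a in range(d)]
    diff = [ybar[j] - mu0[j] for j in range(d)]
    z = solve(s, diff)  # z = S^{-1} (ybar - mu0)
    return n * sum(diff[j] * z[j] for j in range(d))

def chi2_sf(x, d):
    """Upper-tail probability P(chi^2_d > x) for integer d."""
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    if d % 2 == 0:
        tail, a = math.exp(-half), 1.0                # closed form for d = 2
    else:
        tail, a = math.erfc(math.sqrt(half)), 0.5     # closed form for d = 1
    while a < d / 2.0:                                # add two degrees of freedom per step
        tail += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return tail

# Toy data: four bivariate observations centred at the origin, tested against mu0 = (1, 0).
ys = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
q = hotelling_q(ys, (1.0, 0.0))
p = chi2_sf(q, 2)
print(q, p)
```

With only four observations the chi-square approximation is of course not reliable; the toy data are chosen so the arithmetic can be followed by hand.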
2.5. Testing for H0 : μ(1) = μ(2)

Theorem 2

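Let $\{Y^{(t)}_{[i]k}\}$, $i = 1, 2, \ldots, r_t$, $k = 1, 2, \ldots, m_t$ and $t = 1, 2$, be two RSS samples drawn from two populations with mean vectors $\mu^{(1)}$ and $\mu^{(2)}$, respectively. Let

$$
\begin{aligned}
\bar{Y}^{(1)}_{rss} &= \frac{1}{r_1 m_1} \sum_{k=1}^{m_1} \sum_{i=1}^{r_1} Y^{(1)}_{[i]k}, \qquad \bar{Y}^{(2)}_{rss} = \frac{1}{r_2 m_2} \sum_{k=1}^{m_2} \sum_{i=1}^{r_2} Y^{(2)}_{[i]k}, \\
S_{rss} &= \left[ \frac{1}{r_1 m_1 + r_2 m_2 - 2} \right] \left[ \sum_{k=1}^{m_1} \sum_{i=1}^{r_1} (Y^{(1)}_{[i]k} - \bar{Y}^{(1)}_{rss})(Y^{(1)}_{[i]k} - \bar{Y}^{(1)}_{rss})^{T} + \sum_{k=1}^{m_2} \sum_{i=1}^{r_2} (Y^{(2)}_{[i]k} - \bar{Y}^{(2)}_{rss})(Y^{(2)}_{[i]k} - \bar{Y}^{(2)}_{rss})^{T} \right].
\end{aligned}
$$

Then

$$Q = \frac{r_1 m_1 \cdot r_2 m_2}{r_1 m_1 + r_2 m_2} \, (\bar{Y}^{(1)}_{rss} - \bar{Y}^{(2)}_{rss})^{T} S_{rss}^{-1} (\bar{Y}^{(1)}_{rss} - \bar{Y}^{(2)}_{rss})$$

has, for large samples, the $\chi^2$ limiting distribution with $d$ degrees of freedom under $H_0 : \mu^{(1)} = \mu^{(2)}$.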

Proof

The proof is similar to that of Theorem 1.
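The two-sample statistic of Theorem 2 differs from the one-sample case only in the pooled covariance and the scaling factor. The sketch below is a standard-library illustration with hypothetical helper names (`solve`, `two_sample_q`); the toy samples are chosen so the arithmetic can be followed by hand.

```python
def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[row][c] -= factor * aug[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][c] * z[c] for c in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z

def mean_vector(ys):
    return [sum(y[j] for y in ys) / len(ys) for j in range(len(ys[0]))]

def scatter(ys, ybar):
    """Sum of outer products of deviations from the sample mean."""
    d = len(ybar)
    return [[sum((y[a] - ybar[a]) * (y[b] - ybar[b]) for y in ys)
             for b in range(d)] for a in range(d)]

def two_sample_q(ys1, ys2):
    """Q = n1 n2 / (n1 + n2) * (ybar1 - ybar2)' S^{-1} (ybar1 - ybar2), S pooled."""
    n1, n2, d = len(ys1), len(ys2), len(ys1[0])
    ybar1, ybar2 = mean_vector(ys1), mean_vector(ys2)
    w1, w2 = scatter(ys1, ybar1), scatter(ys2, ybar2)
    s = [[(w1[a][b] + w2[a][b]) / (n1 + n2 - 2) for b in range(d)] for a in range(d)]
    diff = [ybar1[j] - ybar2[j] for j in range(d)]
    z = solve(s, diff)
    return n1 * n2 / (n1 + n2) * sum(diff[j] * z[j] for j in range(d))

# Two toy samples with identical spread whose means differ by (1, 0).
sample1 = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
sample2 = [(2.0, 0.0), (1.0, 1.0), (0.0, 0.0), (1.0, -1.0)]
q2 = two_sample_q(sample1, sample2)
print(q2)
```

Under the theorem, the resulting value would be referred to a chi-square distribution with d degrees of freedom when both samples are large.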

2.6. Small samples

For small to moderate samples under SRS, the Q statistic under H0 is distributed as {(N − 1)d/(N − d)} F(d, N − d) (Seber, 2009). Because the explicit distribution of the Q statistic is not known for small or moderate RSS samples, we recommend performing hypothesis testing by the bootstrap method. Resampling methods for RSS were proposed by Chen et al. (2004) and Modarres et al. (2006), who suggest a natural method of obtaining bootstrap samples from each row (within cycle) of an RSS.
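A row-wise bootstrap can be sketched as follows. The sketch makes an explicit assumption about what a "row" is: it resamples, with replacement, among the m units that share the same rank i, so that every bootstrap sample keeps m units per rank. The function names, the dictionary layout, and the toy data are hypothetical, and this is not the authors' R routine.

```python
import random

def bootstrap_rss(strata, statistic, n_boot, rng):
    """Row-wise bootstrap: resample with replacement inside each rank stratum.

    strata    -- dict mapping rank i to the list of m outcome tuples Y_[i]k
    statistic -- callable applied to the pooled resampled outcomes
    """
    replicates = []
    for _ in range(n_boot):
        pooled = []
        for rank in sorted(strata):
            row = strata[rank]
            # Draw len(row) units from this rank only, preserving the RSS structure.
            pooled.extend(rng.choice(row) for _ in range(len(row)))
        replicates.append(statistic(pooled))
    return replicates

def first_component_mean(ys):
    return sum(y[0] for y in ys) / len(ys)

# Toy RSS with set size r = 2 and m = 2 cycles, one outcome component.
strata = {1: [(0.0,), (0.2,)], 2: [(1.0,), (1.2,)]}
boot = bootstrap_rss(strata, first_component_mean, n_boot=200, rng=random.Random(7))
print(min(boot), max(boot))
```

To use this for testing, the `statistic` argument could be the Q statistic; one common device, assumed here rather than taken from the source, is to recentre the data so that the null hypothesis holds before resampling and then compare the observed Q with the bootstrap replicates.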

3. Simulation

In this section, we conduct a simulation study to evaluate estimation of the multivariate outcome mean and the performance of hypothesis testing under the RSS scheme. We also study the performance of testing the equality of multivariate outcome means for two groups. To estimate the α of testing H0 : μ = μ0 vs. Ha : μ ≠ μ0, we considered four outcomes $Y_i$ ($i = 1, 2, 3, 4$) with μ = [0.3, 0.3, 0.3, 0.2], variances $\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \sigma_4^2 = 4$, and covariances $\sigma_{12} = 2.39$, $\sigma_{13} = 1.59$, $\sigma_{14} = 2.83$, $\sigma_{23} = 3.19$, $\sigma_{24} = 1.18$, and $\sigma_{34} = 2.24$. The auxiliary covariate (X) was simulated with mean 0 and variance $\sigma_x^2 = 1$. We used an unstructured covariance among the outcomes $Y_i$ and an autoregressive covariance structure between the auxiliary variable X and $Y_i$ with correlation parameter ρ, as shown below.

$$
\mathrm{Cov}(X, Y_i) =
\begin{bmatrix}
1 & 2\rho & 2\rho^2 & 2\rho^3 & 2\rho^4 \\
2\rho & 4 & 2.39 & 1.59 & 2.83 \\
2\rho^2 & 2.39 & 4 & 3.19 & 1.18 \\
2\rho^3 & 1.59 & 3.19 & 4 & 2.24 \\
2\rho^4 & 2.83 & 1.18 & 2.24 & 4
\end{bmatrix}.
$$

The RSS of X and $Y_i$ was simulated from a multivariate normal distribution with mean μ and the above variance-covariance matrix, following the steps described in Section 2.1. To compare the estimated α for SRS and RSS, different sample sizes (n = rm) were evaluated by varying ρ, the set size, and the number of cycles. This entire process was repeated 2,000 times. For details of the parameter values, refer to Table 2. The results in Table 2 show that RSS achieves the nominal value of α with moderate to large samples, whereas for smaller samples bootstrap RSS sampling is needed to achieve the nominal value of α.
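The joint covariance matrix used in these settings is easy to build programmatically for any ρ. The sketch below simply encodes the entries given above; the function name `simulation_covariance` is an assumption introduced here for illustration.

```python
def simulation_covariance(rho):
    """5 x 5 covariance matrix of (X, Y1, Y2, Y3, Y4) for a given correlation parameter rho."""
    y_cov = [
        [4.00, 2.39, 1.59, 2.83],
        [2.39, 4.00, 3.19, 1.18],
        [1.59, 3.19, 4.00, 2.24],
        [2.83, 1.18, 2.24, 4.00],
    ]
    # Autoregressive cross-covariances: Cov(X, Y_j) = 2 * rho^j for j = 1, ..., 4.
    cross = [2.0 * rho ** (j + 1) for j in range(4)]
    cov = [[1.0] + cross]                  # first row: Var(X) followed by Cov(X, Y_j)
    for j in range(4):
        cov.append([cross[j]] + y_cov[j])  # remaining rows: Cov(Y_j, X) then Cov(Y_j, Y)
    return cov

cov = simulation_covariance(0.8)
for row in cov:
    print(["%.3f" % v for v in row])
```

The factor 2 in the cross-covariances is the product of the standard deviations of X (1) and each outcome (2), so the implied correlation between X and the j-th outcome is ρ raised to the power j.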

To estimate the power of testing H0 : μ = 0 vs. Ha : μ ≠ 0, we used simulation settings similar to those described above, except with μ = [0.6, 0.6, 0.6, 0.4]. In addition, bootstrap power was calculated by taking 1,000 bootstrap samples for each simulated RSS. Furthermore, the MSE of SRS, the MSE of RSS, and the efficiency of the multivariate naive estimator were calculated. Table 3 reports the simulation results for estimating the power of the test under various simulation settings. The power of the test increases as the set size increases under RSS, and RSS generally gives more power than SRS. As expected, Table 3 also shows that RSS provides more efficient estimates from the multivariate naive estimator, in terms of smaller MSEs.

Furthermore, to assess the performance of testing the equality of multivariate outcome means for two groups, we simulated two groups with multivariate outcome $Y_i$ ($i = 1, 2, 3, 4$), with mean μ1 = [0.3, 0.3, 0.3, 0.2] for the first group and mean μ2 = [0.6, 0.6, 0.6, 0.4] for the second group, and with a covariance matrix similar to that described above (Cov(X, Yi)). Table 4 presents the estimated power of testing H0 : μ1 = μ2 vs. Ha : μ1 ≠ μ2 for various values of ρ, set size, and number of cycles. Overall, from Table 4, we can conclude that RSS is more powerful than SRS for testing the equality of multivariate outcome means for two groups.

We also conducted a simulation study to show that the multivariate regression estimator is more efficient under RSS than under SRS. We considered multivariate outcomes Y with mean μ = (0.3, 0.3, 0.3, 0.2) and the variance-covariance matrix (Cov(X, Yi)) described above in this section. We also simulated a correlated auxiliary covariate (X) with mean 0 and variance 1. Table 5 shows that, for various parameter settings, the multivariate regression estimator has smaller MSE under RSS than under SRS.

4. Application to China Health and Nutrition Survey data

In this section, we illustrate the ranked set sampling method, with ranking on a baseline covariate, using the China Health and Nutrition Survey (CHNS) for the year 2009. We estimate the multivariate outcome mean, investigate the performance of hypothesis testing for two groups, and compute the multivariate regression estimator. The CHNS is the only large-scale household-based survey in China (Yan et al., 2012). As part of the survey, anthropometric data were collected on 10,242 children and adults aged ≥ 7 in 2009, along with other demographic information. Only 9,986 individuals agreed to provide fasting blood samples, which were evaluated for many biomarkers of diabetes and cardio-metabolic risk factors. For illustration purposes, we used the age of the individual as the ranking auxiliary variable and three cardio-metabolic biomarkers as outcomes: Apolipoprotein A, Total cholesterol, and Hemoglobin A1c. We treated the survey data as a population and selected RSSs of a range of sizes (N = set × cycle), as shown in Table 6, by ranking on the baseline covariate age. SRSs of the same size N were also selected from the CHNS data to compare the performance of hypothesis testing and the efficiency of the sampling procedure with RSS in estimating the multivariate outcome mean. The correlations (ρ) between age and the biomarkers Apolipoprotein A, Total cholesterol, and Hemoglobin A1c are 0.12, 0.32, and 0.22, respectively. The means of Apolipoprotein A, Total cholesterol, and Hemoglobin A1c are 1.14 g/L, 4.78 mmol/L, and 5.67 mmol/L, respectively; for comparison purposes, they can be treated as the true parameters. Table 7 presents the power comparison of RSS with SRS for the multivariate means of males and females; it shows that RSS achieves more power than SRS at similar sample sizes. Table 8 shows the results of multivariate regression estimation for the biomarker data.

We also took 1,000 samples each of SRS and RSS of sample size 80 (set = 4 and cycle = 20) and plotted the confidence regions, as shown in Figure 1. From Figure 1, we can see that the confidence region for SRS (blue nets) lies completely outside the confidence region for RSS (red).

5. Conclusion

In statistics, it is important to have a sampling method that is cost-effective. RSS is one important method that can yield a more efficient multivariate mean estimator than the most commonly used method, SRS. Samples taken by RSS are more representative because of the structure imposed by ranking on readily available covariates. In this paper, we demonstrated that RSS is more efficient in estimating the multivariate mean as well as in hypothesis testing for one and two independent samples. Simulation studies of hypothesis testing showed that RSS is more powerful than SRS. In general, in estimating the population mean, RSS improves precision relative to SRS with the same sample size, n. This holds even when the correlation between the auxiliary variable X and the multivariate outcome Y is moderate to high (±0.4 to ±0.8). However, when the correlation between X and Y is very low (such as ±0.001), RSS is equivalent to SRS and the ranking is no better than random. In practice, the key issue is whether the increase in precision is sufficient to justify the increased costs associated with the ranking process. In contrast, when the correlation between X and Y is very high (±0.9 or higher), the precision in estimating the population mean will be very high, because ranking on X then ranks Y more accurately (Ridout, 2003).

Missing data are a very common problem in almost every study and can have a significant impact on the inferences drawn from the collected data, such as biased estimation of population parameters and loss of statistical power (Little and Rubin, 2014). A valid statistical analysis, with appropriate assumptions about the missing data mechanism (missing completely at random, missing at random, or missing not at random), should be performed in SRS and in RSS. There is an extensive literature on how to deal with missing data for RSS in the auxiliary variable X and a univariate response Y (Bouza-Herrera, 2013). However, handling missing data in a multivariate Y with a monotone or arbitrary missing pattern is still an active area of research.

Figures
Fig. 1. Confidence region for SRS (blue nets) and RSS (solid red) for China Health and Nutrition Survey data.
TABLES

Table 1

Structure of ranked set sampling

Cycle 1 | (X(1)1, Y(1)1) | (X(2)1, Y(2)1) | · · · | (X(r)1, Y(r)1)
Cycle 2 | (X(1)2, Y(1)2) | (X(2)2, Y(2)2) | · · · | (X(r)2, Y(r)2)
Cycle m | (X(1)m, Y(1)m) | (X(2)m, Y(2)m) | · · · | (X(r)m, Y(r)m)

Table 2

Estimation of the α of testing Ho : μ = 0 vs. Ha : μ ≠ 0

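ρ | Cycle | Set = 3: SRS | Set = 3: RSS | Set = 3: BSa | Set = 4: SRS | Set = 4: RSS | Set = 4: BSa | Set = 5: SRS | Set = 5: RSS | Set = 5: BSa
−0.8 | 5 | 0.0460 | 0.1375 | 0.0215 | 0.0375 | 0.1065 | 0.0475 | 0.0455 | 0.0795 | 0.0605
−0.8 | 10 | 0.0455 | 0.0755 | 0.0450 | 0.0460 | 0.0530 | 0.0495 | 0.0410 | 0.0535 | 0.0620
−0.8 | 20 | 0.0520 | 0.0545 | 0.0535 | 0.0485 | 0.0445 | 0.0545 | 0.0560 | 0.0410 | 0.0605
−0.8 | 30 | 0.0555 | 0.0455 | 0.0560 | 0.0495 | 0.0430 | 0.0600 | 0.0480 | 0.0360 | 0.0530
−0.6 | 5 | 0.0475 | 0.1580 | 0.0210 | 0.0480 | 0.1070 | 0.0470 | 0.0550 | 0.0710 | 0.0600
−0.6 | 10 | 0.0460 | 0.0720 | 0.0455 | 0.0465 | 0.0590 | 0.0555 | 0.0490 | 0.0405 | 0.0500
−0.6 | 20 | 0.0520 | 0.0475 | 0.0460 | 0.0450 | 0.0460 | 0.0520 | 0.0605 | 0.0395 | 0.0550
−0.6 | 30 | 0.0450 | 0.0415 | 0.0500 | 0.0470 | 0.0385 | 0.0555 | 0.0505 | 0.0415 | 0.0595
−0.4 | 5 | 0.0460 | 0.1400 | 0.0225 | 0.0540 | 0.1055 | 0.0440 | 0.0515 | 0.0840 | 0.0585
−0.4 | 10 | 0.0580 | 0.0690 | 0.0395 | 0.0515 | 0.0555 | 0.0515 | 0.0450 | 0.0530 | 0.0655
−0.4 | 20 | 0.0495 | 0.0500 | 0.0525 | 0.0495 | 0.0415 | 0.0510 | 0.0520 | 0.0360 | 0.0560
−0.4 | 30 | 0.0595 | 0.0460 | 0.0530 | 0.0520 | 0.0360 | 0.0530 | 0.0560 | 0.0290 | 0.0515
0.4 | 5 | 0.0515 | 0.1530 | 0.0290 | 0.0615 | 0.0975 | 0.0425 | 0.0570 | 0.0800 | 0.0640
0.4 | 10 | 0.0445 | 0.0730 | 0.0405 | 0.0560 | 0.0495 | 0.0485 | 0.0495 | 0.0445 | 0.0540
0.4 | 20 | 0.0520 | 0.0420 | 0.0430 | 0.0505 | 0.0430 | 0.0575 | 0.0495 | 0.0310 | 0.0505
0.4 | 30 | 0.0525 | 0.0495 | 0.0570 | 0.0405 | 0.0385 | 0.0580 | 0.0495 | 0.0310 | 0.0505
0.6 | 5 | 0.0520 | 0.1545 | 0.0255 | 0.0545 | 0.0900 | 0.0380 | 0.0465 | 0.0785 | 0.0610
0.6 | 10 | 0.0540 | 0.0840 | 0.0525 | 0.0595 | 0.0660 | 0.0620 | 0.0475 | 0.0535 | 0.0595
0.6 | 20 | 0.0440 | 0.0495 | 0.0490 | 0.0470 | 0.0430 | 0.0555 | 0.0525 | 0.0400 | 0.0555
0.6 | 30 | 0.0555 | 0.0455 | 0.0535 | 0.0475 | 0.0340 | 0.0510 | 0.0555 | 0.0295 | 0.0465
0.8 | 5 | 0.0575 | 0.1365 | 0.0195 | 0.0465 | 0.0970 | 0.0450 | 0.0470 | 0.0840 | 0.0580
0.8 | 10 | 0.0520 | 0.0750 | 0.0455 | 0.0520 | 0.0730 | 0.0680 | 0.0495 | 0.0470 | 0.0580
0.8 | 20 | 0.0495 | 0.0590 | 0.0620 | 0.0535 | 0.0360 | 0.0470 | 0.0580 | 0.0350 | 0.0505
0.8 | 30 | 0.0560 | 0.0450 | 0.0520 | 0.0495 | 0.0405 | 0.0580 | 0.0500 | 0.0400 | 0.0570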

SRS = simple random sample; RSS = ranked set sampling; BSa = Bootstrap α.


Table 3

Estimation of power of testing Ho : μ = 0 vs. Ha : μ ≠ 0

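Set | ρ | Cycle | SRS Power | RSS Power | Bootstrap Power | SRS MSE | RSS MSE
3 | 0.4 | 5 | 0.0900 | 0.2255 | 0.0445 | 3.88E-05 | 2.07E-05
3 | 0.4 | 10 | 0.1645 | 0.2120 | 0.1425 | 2.53E-06 | 1.37E-06
3 | 0.4 | 20 | 0.3445 | 0.3490 | 0.3510 | 1.79E-07 | 8.11E-08
3 | 0.4 | 30 | 0.4880 | 0.5280 | 0.5630 | 3.07E-08 | 1.69E-08
3 | 0.6 | 5 | 0.0850 | 0.2260 | 0.0440 | 4.93E-05 | 2.58E-05
3 | 0.6 | 10 | 0.1400 | 0.1990 | 0.1345 | 2.97E-06 | 1.61E-06
3 | 0.6 | 20 | 0.2945 | 0.2980 | 0.3030 | 1.94E-07 | 9.52E-08
3 | 0.6 | 30 | 0.4300 | 0.4540 | 0.4860 | 4.31E-08 | 2.30E-08
3 | 0.8 | 5 | 0.0995 | 0.2560 | 0.0505 | 2.28E-05 | 1.18E-05
3 | 0.8 | 10 | 0.2055 | 0.2640 | 0.1735 | 1.48E-06 | 8.17E-07
3 | 0.8 | 20 | 0.4140 | 0.4415 | 0.4465 | 1.17E-07 | 4.64E-08
3 | 0.8 | 30 | 0.5950 | 0.6805 | 0.7080 | 1.96E-08 | 9.07E-09
4 | 0.4 | 5 | 0.1100 | 0.1895 | 0.0950 | 1.16E-05 | 5.45E-06
4 | 0.4 | 10 | 0.2000 | 0.2365 | 0.2255 | 7.51E-07 | 3.29E-07
4 | 0.4 | 20 | 0.4490 | 0.4580 | 0.5125 | 5.32E-08 | 2.22E-08
4 | 0.4 | 30 | 0.6165 | 0.6600 | 0.7265 | 1.00E-08 | 4.16E-09
4 | 0.6 | 5 | 0.1015 | 0.1770 | 0.0865 | 1.53E-05 | 6.32E-06
4 | 0.6 | 10 | 0.1875 | 0.2020 | 0.1850 | 1.08E-06 | 4.01E-07
4 | 0.6 | 20 | 0.3565 | 0.3560 | 0.4000 | 6.00E-08 | 2.54E-08
4 | 0.6 | 30 | 0.5515 | 0.6285 | 0.6960 | 1.10E-08 | 4.65E-09
4 | 0.8 | 5 | 0.1125 | 0.2255 | 0.1060 | 8.08E-06 | 3.59E-06
4 | 0.8 | 10 | 0.2500 | 0.2940 | 0.2820 | 5.32E-07 | 2.15E-07
4 | 0.8 | 20 | 0.4950 | 0.5620 | 0.6150 | 2.61E-08 | 1.27E-08
4 | 0.8 | 30 | 0.7435 | 0.8290 | 0.8630 | 6.97E-09 | 2.63E-09
5 | 0.4 | 5 | 0.1440 | 0.1920 | 0.1475 | 5.22E-06 | 1.86E-06
5 | 0.4 | 10 | 0.2575 | 0.2830 | 0.3080 | 3.01E-07 | 1.13E-07
5 | 0.4 | 20 | 0.5325 | 0.5810 | 0.6565 | 2.10E-08 | 7.11E-09
5 | 0.4 | 30 | 0.7355 | 0.7910 | 0.8455 | 4.41E-09 | 1.54E-09
5 | 0.6 | 5 | 0.1200 | 0.1690 | 0.1285 | 6.41E-06 | 2.36E-06
5 | 0.6 | 10 | 0.2110 | 0.2140 | 0.2430 | 3.87E-07 | 1.45E-07
5 | 0.6 | 20 | 0.4800 | 0.4930 | 0.5820 | 2.70E-08 | 9.50E-09
5 | 0.6 | 30 | 0.6445 | 0.6960 | 0.7795 | 5.27E-09 | 1.63E-09
5 | 0.8 | 5 | 0.1505 | 0.2160 | 0.1625 | 2.96E-06 | 1.20E-06
5 | 0.8 | 10 | 0.3080 | 0.3310 | 0.3625 | 2.05E-07 | 6.88E-08
5 | 0.8 | 20 | 0.6425 | 0.7230 | 0.7935 | 1.28E-08 | 4.78E-09
5 | 0.8 | 30 | 0.8360 | 0.9060 | 0.9490 | 2.44E-09 | 8.07E-10
3 | −0.4 | 5 | 0.1360 | 0.3185 | 0.0790 | 2.34E-04 | 1.35E-04
3 | −0.4 | 10 | 0.2635 | 0.3865 | 0.2970 | 1.35E-05 | 8.05E-06
3 | −0.4 | 20 | 0.6155 | 0.6740 | 0.6760 | 9.20E-07 | 5.14E-07
3 | −0.4 | 30 | 0.8125 | 0.8450 | 0.8625 | 1.94E-07 | 1.01E-07
3 | −0.6 | 5 | 0.1210 | 0.3260 | 0.0815 | 6.66E-04 | 3.72E-04
3 | −0.6 | 10 | 0.2670 | 0.3685 | 0.2845 | 4.59E-05 | 2.30E-05
3 | −0.6 | 20 | 0.5460 | 0.5945 | 0.5950 | 2.55E-06 | 1.39E-06
3 | −0.6 | 30 | 0.7690 | 0.7840 | 0.8100 | 5.50E-07 | 2.61E-07
3 | −0.8 | 5 | 0.1235 | 0.3075 | 0.0815 | 1.20E-03 | 6.60E-04
3 | −0.8 | 10 | 0.2550 | 0.3735 | 0.2915 | 7.74E-05 | 4.49E-05
3 | −0.8 | 20 | 0.5505 | 0.5840 | 0.5835 | 4.62E-06 | 2.18E-06
3 | −0.8 | 30 | 0.7840 | 0.8160 | 0.8375 | 9.25E-07 | 4.42E-07
4 | −0.4 | 5 | 0.1830 | 0.3020 | 0.1775 | 8.33E-05 | 3.32E-05
4 | −0.4 | 10 | 0.4050 | 0.4740 | 0.4600 | 5.39E-06 | 2.19E-06
4 | −0.4 | 20 | 0.7870 | 0.8130 | 0.8365 | 3.10E-07 | 1.33E-07
4 | −0.4 | 30 | 0.9150 | 0.9505 | 0.9610 | 5.80E-08 | 2.81E-08
4 | −0.6 | 5 | 0.1675 | 0.3075 | 0.1735 | 2.34E-04 | 8.93E-05
4 | −0.6 | 10 | 0.3410 | 0.4435 | 0.4255 | 1.41E-05 | 6.17E-06
4 | −0.6 | 20 | 0.7095 | 0.7420 | 0.7725 | 8.19E-07 | 3.46E-07
4 | −0.6 | 30 | 0.9015 | 0.9105 | 0.9335 | 1.66E-07 | 7.02E-05
4 | −0.8 | 5 | 0.1565 | 0.3065 | 0.1860 | 3.50E-04 | 1.65E-04
4 | −0.8 | 10 | 0.3635 | 0.4240 | 0.4095 | 1.95E-05 | 1.05E-05
4 | −0.8 | 20 | 0.7605 | 0.7605 | 0.7935 | 1.41E-06 | 6.53E-07
4 | −0.8 | 30 | 0.8985 | 0.8970 | 0.9210 | 2.95E-07 | 1.25E-07
5 | −0.4 | 5 | 0.2555 | 0.3640 | 0.3100 | 3.33E-05 | 1.20E-05
5 | −0.4 | 10 | 0.5135 | 0.5645 | 0.5920 | 2.18E-06 | 8.53E-07
5 | −0.4 | 20 | 0.8590 | 0.8865 | 0.9185 | 1.08E-07 | 4.18E-08
5 | −0.4 | 30 | 0.9775 | 0.9785 | 0.9865 | 2.34E-08 | 8.02E-09
5 | −0.6 | 5 | 0.2120 | 0.3210 | 0.2605 | 9.69E-05 | 3.36E-05
5 | −0.6 | 10 | 0.4630 | 0.5105 | 0.5400 | 5.91E-06 | 2.21E-06
5 | −0.6 | 20 | 0.8155 | 0.8235 | 0.8565 | 3.88E-07 | 1.31E-07
5 | −0.6 | 30 | 0.9530 | 0.9500 | 0.9650 | 7.24E-08 | 2.69E-08
5 | −0.8 | 5 | 0.2115 | 0.3435 | 0.2920 | 1.49E-04 | 5.84E-05
5 | −0.8 | 10 | 0.4595 | 0.5270 | 0.5595 | 9.85E-06 | 3.53E-06
5 | −0.8 | 20 | 0.8080 | 0.8135 | 0.8575 | 6.00E-07 | 2.14E-07
5 | −0.8 | 30 | 0.9485 | 0.9525 | 0.9680 | 1.23E-07 | 4.38E-08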

SRS = simple random sample; RSS = ranked set sampling; MSE = mean square error.


Table 4

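Estimation of power of testing Ho : μ1 = μ2 vs. Ha : μ1 ≠ μ2

ρ | Cycle | Set = 3: SRS | Set = 3: RSS | Set = 3: BSa | Set = 4: SRS | Set = 4: RSS | Set = 4: BSa | Set = 5: SRS | Set = 5: RSS | Set = 5: BSa
−0.4 | 10 | 0.3055 | 0.3740 | 0.4058 | 0.3190 | 0.3545 | 0.4227 | 0.3905 | 0.4015 | 0.4354
−0.4 | 20 | 0.3975 | 0.3990 | 0.4021 | 0.4665 | 0.4852 | 0.4973 | 0.4675 | 0.4840 | 0.5175
−0.4 | 30 | 0.4785 | 0.4885 | 0.4800 | 0.5000 | 0.5245 | 0.5127 | 0.5865 | 0.6035 | 0.6131
−0.4 | 40 | 0.5710 | 0.5605 | 0.5824 | 0.6150 | 0.6505 | 0.6421 | 0.6480 | 0.6570 | 0.6491
−0.6 | 10 | 0.2710 | 0.3610 | 0.3812 | 0.3470 | 0.3630 | 0.408 | 0.3639 | 0.3900 | 0.4128
−0.6 | 20 | 0.3950 | 0.4160 | 0.4210 | 0.4240 | 0.4125 | 0.4357 | 0.4515 | 0.4970 | 0.5087
−0.6 | 30 | 0.4570 | 0.4470 | 0.4424 | 0.4825 | 0.4915 | 0.4879 | 0.5285 | 0.5675 | 0.5564
−0.6 | 40 | 0.5555 | 0.5535 | 0.5542 | 0.5745 | 0.6210 | 0.6321 | 0.6180 | 0.6475 | 0.6427
−0.8 | 10 | 0.2820 | 0.3550 | 0.3829 | 0.3160 | 0.3565 | 0.4186 | 0.3670 | 0.3855 | 0.4210
−0.8 | 20 | 0.3785 | 0.3990 | 0.4021 | 0.4135 | 0.4500 | 0.4610 | 0.4475 | 0.5035 | 0.5142
−0.8 | 30 | 0.4675 | 0.4625 | 0.4610 | 0.4810 | 0.5270 | 0.5287 | 0.5165 | 0.5525 | 0.5641
−0.8 | 40 | 0.5275 | 0.5280 | 0.5195 | 0.5635 | 0.6035 | 0.5987 | 0.6040 | 0.6335 | 0.6289
0.4 | 10 | 0.3485 | 0.4480 | 0.4845 | 0.4025 | 0.4445 | 0.4975 | 0.4715 | 0.4775 | 0.5012
0.4 | 20 | 0.5075 | 0.5580 | 0.5641 | 0.5700 | 0.6305 | 0.6441 | 0.6475 | 0.6625 | 0.6951
0.4 | 30 | 0.6520 | 0.6520 | 0.6641 | 0.7325 | 0.7825 | 0.7888 | 0.7430 | 0.8065 | 0.8125
0.4 | 40 | 0.6895 | 0.7265 | 0.7248 | 0.8105 | 0.8605 | 0.8589 | 0.8600 | 0.9210 | 0.9287
0.6 | 10 | 0.3305 | 0.4130 | 0.4965 | 0.4055 | 0.4270 | 0.5102 | 0.4380 | 0.4680 | 0.5354
0.6 | 20 | 0.4745 | 0.5020 | 0.5214 | 0.5360 | 0.6040 | 0.6214 | 0.5985 | 0.6610 | 0.6698
0.6 | 30 | 0.5970 | 0.6265 | 0.6369 | 0.6620 | 0.7135 | 0.7125 | 0.7420 | 0.8105 | 0.8214
0.6 | 40 | 0.6585 | 0.7145 | 0.7235 | 0.7495 | 0.8020 | 0.7985 | 0.8300 | 0.8920 | 0.8879
0.8 | 10 | 0.3700 | 0.4490 | 0.5035 | 0.4665 | 0.5155 | 0.5210 | 0.5140 | 0.5450 | 0.5621
0.8 | 20 | 0.5495 | 0.5865 | 0.6089 | 0.6490 | 0.7020 | 0.7124 | 0.7275 | 0.7975 | 0.8213
0.8 | 30 | 0.7040 | 0.7415 | 0.7358 | 0.7745 | 0.8555 | 0.8614 | 0.8395 | 0.9255 | 0.9159
0.8 | 40 | 0.8055 | 0.8530 | 0.8521 | 0.8755 | 0.9295 | 0.9124 | 0.9320 | 0.9725 | 0.9800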

SRS = simple random sample; RSS = ranked set sampling; BSa = Bootstrap α.


Table 5

Estimation of multivariate regression estimator

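ρ | Cycle | Set = 3: MSE SRS | Set = 3: MSE RSS | Set = 4: MSE SRS | Set = 4: MSE RSS | Set = 5: MSE SRS | Set = 5: MSE RSS
0.4 | 5 | 0.0078 | 0.0010 | 0.0024 | 0.0002 | 0.0010 | 6.49E-05
0.4 | 10 | 0.0004 | 4.40E-05 | 0.0001 | 1.22E-05 | 4.51E-05 | 3.29E-06
0.4 | 20 | 2.35E-05 | 2.73E-06 | 5.23E-06 | 6.57E-07 | 2.46E-06 | 2.30E-07
0.4 | 30 | 4.02E-06 | 5.61E-07 | 1.23E-06 | 1.26E-07 | 5.43E-07 | 4.17E-08
0.6 | 5 | 0.0098 | 0.0011 | 0.0032 | 0.0003 | 0.0011 | 7.69E-05
0.6 | 10 | 0.0004 | 5.77E-05 | 0.0001 | 1.16E-05 | 6.31E-05 | 4.72E-06
0.6 | 20 | 2.61E-05 | 3.60E-06 | 7.21E-06 | 7.73E-07 | 3.39E-06 | 2.90E-07
0.6 | 30 | 4.92E-06 | 6.95E-07 | 1.52E-06 | 1.72E-07 | 5.99E-07 | 7.68E-08
0.8 | 5 | 0.0109 | 0.0006 | 0.0036 | 0.0001 | 0.0012 | 5.70E-05
0.8 | 10 | 0.0005 | 2.96E-05 | 0.0002 | 7.87E-06 | 7.10E-05 | 2.62E-06
0.8 | 20 | 2.98E-05 | 1.74E-06 | 1.08E-05 | 4.59E-07 | 3.68E-06 | 1.57E-07
0.8 | 30 | 5.27E-06 | 3.37E-07 | 1.64E-06 | 8.36E-08 | 6.44E-07 | 2.75E-08
−0.4 | 5 | 0.0133 | 0.0057 | 0.0039 | 0.0013 | 0.0014 | 0.0005
−0.4 | 10 | 0.0006 | 0.0003 | 0.0002 | 5.74E-05 | 8.44E-05 | 2.53E-05
−0.4 | 20 | 3.64E-05 | 1.52E-05 | 1.25E-05 | 4.58E-06 | 4.25E-06 | 1.58E-06
−0.4 | 30 | 7.16E-06 | 3.44E-06 | 2.69E-06 | 9.54E-07 | 7.72E-07 | 3.07E-07
−0.6 | 5 | 0.0270 | 0.0145 | 0.0070 | 0.0035 | 0.0028 | 0.001197
−0.6 | 10 | 0.0013 | 0.0007 | 0.0003 | 0.0002 | 0.0001 | 7.06E-05
−0.6 | 20 | 6.85E-05 | 4.62E-05 | 1.82E-05 | 1.51E-05 | 8.30E-06 | 4.00E-06
−0.6 | 30 | 1.15E-05 | 9.00E-06 | 4.13E-06 | 2.45E-06 | 1.63E-06 | 8.03E-07
−0.8 | 5 | 0.0485 | 0.0273 | 0.0114 | 0.0074 | 0.0038 | 0.0024
−0.8 | 10 | 0.0017 | 0.0015 | 0.0005 | 0.0004 | 0.0002 | 0.0001
−0.8 | 20 | 7.85E-05 | 7.46E-05 | 2.69E-05 | 2.11E-05 | 1.17E-05 | 8.28E-06
−0.8 | 30 | 1.85E-05 | 1.73E-05 | 5.90E-06 | 4.77E-06 | 2.42E-06 | 1.32E-06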

MSE = mean square error; SRS = simple random sample; RSS = ranked set sampling.


Table 6

Multivariate mean estimation and MSEs for China Health and Nutrition Survey data

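Set | Cycle | SRS MSE | RSS MSE | Efficiency
3 | 5 | 3.07E-05 | 3.04E-05 | 1.01
3 | 10 | 3.88E-06 | 3.41E-06 | 1.14
3 | 20 | 4.66E-07 | 4.43E-07 | 1.05
3 | 30 | 1.38E-07 | 1.26E-07 | 1.09
4 | 5 | 1.31E-05 | 1.17E-05 | 1.12
4 | 10 | 1.65E-06 | 1.47E-06 | 1.12
4 | 20 | 2.01E-07 | 1.81E-07 | 1.11
4 | 30 | 5.80E-08 | 5.43E-08 | 1.07
5 | 5 | 6.67E-06 | 5.85E-06 | 1.14
5 | 10 | 8.00E-07 | 7.08E-07 | 1.13
5 | 20 | 1.01E-07 | 9.06E-08 | 1.11
5 | 30 | 2.94E-08 | 2.57E-08 | 1.14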

SRS = simple random sample; RSS = ranked set sampling; MSE = mean square error.


Table 7

Estimation of power of testing for Biomarker data for gender

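Cycle | Set = 3: SRS | Set = 3: RSS | Set = 4: SRS | Set = 4: RSS | Set = 5: SRS | Set = 5: RSS
10 | 0.2531 | 0.3414 | 0.2875 | 0.3397 | 0.3155 | 0.3625
20 | 0.3353 | 0.3663 | 0.3722 | 0.3968 | 0.4017 | 0.4265
30 | 0.3893 | 0.4066 | 0.4243 | 0.4526 | 0.4691 | 0.4890
40 | 0.4324 | 0.4425 | 0.4823 | 0.5017 | 0.5374 | 0.5415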

SRS = simple random sample; RSS = ranked set sampling.


Table 8

Multivariate regression estimation for China Health and Nutrition Survey data

Cycle | Set = 3: MSE SRS | Set = 3: MSE RSS | Set = 4: MSE SRS | Set = 4: MSE RSS | Set = 5: MSE SRS | Set = 5: MSE RSS
5 | 4.87E-05 | 3.03E-05 | 2.08E-05 | 1.89E-05 | 1.00E-05 | 5.82E-06
10 | 5.40E-06 | 4.84E-06 | 2.26E-06 | 1.49E-06 | 1.12E-06 | 9.91E-07
20 | 7.11E-07 | 5.13E-07 | 2.56E-07 | 2.25E-07 | 1.64E-07 | 1.09E-07
30 | 2.07E-07 | 1.59E-07 | 7.41E-08 | 5.58E-08 | 3.92E-08 | 3.31E-08

MSE = mean square error; SRS = simple random sample; RSS = ranked set sampling.


References
  1. Bouza-Herrera, CN (2013). Handling Missing Data in Ranked Set Sampling. Heidelberg: Springer.
  2. Chen, Z (1999). Density estimation using ranked-set sampling data. Environmental and Ecological Statistics. 6, 135-146.
  3. Chen, Z, Bai, Z, and Sinha, B (2004). Ranked Set Sampling: Theory and Applications. New York: Springer Science & Business Media.
  4. Dell, TR, and Clutter, JL (1972). Ranked set sampling theory with order statistics background. Biometrics. 28, 545-555.
  5. Demir, S, and Çıngı, H (2000). An application of the regression estimator in ranked set sampling. Hacettepe Bulletin of Natural Sciences and Engineering, Series B. 29, 93-101.
  6. Huang, Y, Samawi, HM, Vogel, R, Yin, J, Gato, WE, and Linder, DF (2016). Evaluating the efficiency of treatment comparison in crossover design by allocating subjects based on ranked auxiliary variable. Communications for Statistical Applications and Methods. 23, 543-553.
  7. Jabrah, R, Samawi, HM, Vogel, R, Rochani, HD, Linder, DF, and Klibert, J (2017). Using ranked auxiliary covariate as a more efficient sampling design for ANCOVA model: analysis of a psychological intervention to buttress resilience. Communications for Statistical Applications and Methods. 24, 241-254.
  8. Jozani, MJ, and Johnson, BC (2011). Design based estimation for ranked set sampling in finite populations. Environmental and Ecological Statistics. 18, 663-685.
  9. Kaur, A, Patil, GP, Shirk, SJ, and Taillie, C (1996). Environmental sampling with a concomitant variable: a comparison between ranked set sampling and stratified simple random sampling. Journal of Applied Statistics. 23, 231-256.
  10. Kowalczyk, B (2004). Ranked set sampling and its applications in finite population studies. Statistics in Transition. 6, 1031-1046.
  11. Little, RJA, and Rubin, DB (2014). Statistical Analysis with Missing Data. New York: John Wiley & Sons.
  12. McIntyre, GA (1952). A method for unbiased selective sampling, using ranked sets. Australian Journal of Agricultural Research. 3, 385-390.
  13. Modarres, R, Hui, TP, and Zheng, G (2006). Resampling methods for ranked set samples. Computational Statistics & Data Analysis. 51, 1039-1050.
  14. Patil, GP, Sinha, AK, and Taillie, C (1995). Finite population corrections for ranked set sampling. Annals of the Institute of Statistical Mathematics. 47, 621-636.
  15. Ridout, MS (2003). On ranked set sampling for multiple characteristics. Environmental and Ecological Statistics. 10, 255-262.
  16. Samawi, HM, and Al-Sagheer, OAM (2001). On the estimation of the distribution function using extreme and median ranked set sampling. Biometrical Journal. 43, 357-373.
  17. Seber, GAF (2009). Multivariate Observations. New York: John Wiley & Sons.
  18. Takahasi, K, and Futatsuya, M (1998). Dependence between order statistics in samples from finite population and its application to ranked set sampling. Annals of the Institute of Statistical Mathematics. 50, 49-70.
  19. Takahasi, K, and Wakimoto, K (1968). On unbiased estimates of the population mean based on the sample stratified by means of ordering. Annals of the Institute of Statistical Mathematics. 20, 1-31.
  20. Yan, S, Li, J, Li, S, Zhang, B, Du, S, Gordon-Larsen, P, Adair, L, and Popkin, B (2012). The expanding burden of cardiometabolic risk in China: the China Health and Nutrition Survey. Obesity Reviews. 13, 810-821.