Item sum techniques for quantitative sensitive estimation on successive occasions
Communications for Statistical Applications and Methods 2019;26:175-189
Published online March 31, 2019
© 2019 Korean Statistical Society.

Kumari Priyanka (1, a), Pidugu Trisandhya (a)

(a) Department of Mathematics, Shivaji College, University of Delhi, India
(1) Correspondence to: Department of Mathematics, Shivaji College, University of Delhi, New Delhi 110 027, India. E-mail: priyanka.ism@gmail.com
Received October 4, 2018; Revised January 24, 2019; Accepted January 31, 2019.
 Abstract

This paper addresses the estimation of a quantitative sensitive variable using the item sum technique (IST) on successive occasions. IST difference, IST regression, and IST general class of estimators are proposed to estimate the quantitative sensitive variable at the current occasion in two-occasion successive sampling. The proposed estimators are developed under the Trappmann et al. (Journal of Survey Statistics and Methodology, 2, 58–77, 2014) and Perri et al. (Biometrical Journal, 60, 155–173, 2018) allocation designs for allocating the long list and short list samples of the IST. The properties of all proposed estimators are derived, including the optimum replacement policy. The proposed estimators are compared with one another under both allocation designs and also with a direct method. Numerical applications through empirical as well as simplistic simulation studies show how the IST on successive occasions may perform in practical situations.

Keywords : sensitive variable, successive occasions, class of estimators, population mean, variance, bias, mean squared error, optimum matching fraction
1. Introduction

Gathering data on sensitive, incriminating, stigmatizing or highly personal issues is an avowedly baffling task. Three problems are prevalent: under reporting, over reporting, and refusal to answer the sensitive questions. Generally, socially undesirable characteristics such as drug addiction, tax evasion, plagiarism, criminal conviction, and unauthorized natural resource use are likely to be under reported, whereas socially desirable characteristics such as energy conservation, reducing pollution, ecological and biological conservation, and participation in elections are likely to be over reported. Some respondents may also refuse to report because of social stigma or fear of privacy violations. All these phenomena provoke non-sampling errors that are difficult to deal with and can seriously damage the validity of the analysis. In order to overcome mis-reporting on sensitive issues, many data collection strategies have been developed to elicit more honest responses from respondents by increasing the obscurity of the survey process and ensuring privacy protection.

One important survey method for addressing sensitive issues is indirect questioning. Indirect questioning techniques may be classified into three categories: the randomized response technique (RRT), the item count technique (ICT), and the non-randomized response technique (non RRT). The RRT was initiated by Warner (1965), and the ICT was originally proposed by Miller (1984) for binary variables to estimate the prevalence of a stigmatizing behavior within the population. The non RRT was initiated more recently by Tian and Tang (2014).

In this paper we focus on a generalization of the second technique, the ICT. The ICT is used in surveys that require the study of a qualitative variable. However, Chaudhuri and Christofides (2013) proposed a generalization of the ICT for estimating a quantitative sensitive variable. Trappmann et al. (2014) named this technique the item sum technique (IST). Perri et al. (2018) discussed the optimal sample size allocation in the IST. Further enhancements in the IST literature can be seen in Hussian et al. (2015) and Rueda et al. (2017).

In many fields of applied research, data need to be gathered on sensitive variables that can change over time. The statistical tool recommended for such a situation is successive sampling. Jessen (1942) initiated the idea of sampling the same population over time with partial replacement of units, called successive sampling. The analysis of a sensitive variable in successive sampling was initiated by Arnab and Singh (2013), who applied the RRT on successive occasions. Additional literature addressing sensitive issues over successive occasions can be seen in Yu et al. (2015), Naeem and Shabbir (2016), Singh et al. (2017), Priyanka et al. (2018), and Priyanka and Trisandhya (2018). These researchers focused on a scrambled response technique or the RRT to handle sensitive issues on successive occasions. However, the IST is now emerging as an alternative technique to deal with sensitive issues. Hence, an attempt has been made in the present work to use the IST in successive sampling to estimate a sensitive population mean. To the best of our knowledge this is an initial attempt and will contribute another useful method to the successive sampling literature.

Therefore, the IST difference, the IST regression, and the IST general class of estimators have been proposed in the present work to estimate a sensitive population mean in two-occasion successive sampling. All these proposed estimators have been discussed under the Trappmann et al. (2014) allocation and the Perri et al. (2018) optimal allocation designs. Detailed properties, including optimum replacement strategies, are elaborated. The proposed methods have been compared with one another as well as with the corresponding direct method. Numerical applications through empirical simulation and simplistic simulation show how the IST on successive occasions may perform in practical situations. Finally, some concluding remarks are given.

2. The item sum technique

A well-known technique for estimating sensitive characteristics is the ICT; however, the ICT is generally applicable to dichotomous (qualitative) variables only. Hence, Chaudhuri and Christofides (2013) generalized the ICT so that it can be used to estimate a quantitative sensitive variable. Later, Trappmann et al. (2014) named this generalized version of the ICT the IST and used it to estimate quantitative sensitive variables. The algorithm for the IST is as follows.

From a random sample ($s$), two random sub-samples ($s_{ll}$ and $s_{sl}$) are generated. The sub-sample $s_{ll}$ is presented with a long list (LL) of items containing the sensitive question and a number of innocuous (non-sensitive) questions, whereas the respondents in sub-sample $s_{sl}$ are given a short list (SL) containing only the innocuous questions from the LL. Respondents in each sub-sample are asked to report the total score of all items given to them, without disclosing the scores for individual items. The difference between the mean answers of $s_{ll}$ and $s_{sl}$ is used as an unbiased estimator of the population mean of the sensitive variable. In the IST, all sensitive and innocuous variables should be quantitative and measured on the same scale as the sensitive variable.
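The algorithm above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy population, the equal split of the sample, and the function name `ist_estimate` are hypothetical and do not come from the paper.

```python
import random
from statistics import mean

def ist_estimate(long_list_totals, short_list_totals):
    """IST estimate of the sensitive mean: mean(LL totals) - mean(SL totals)."""
    return mean(long_list_totals) - mean(short_list_totals)

# Hypothetical toy population: (sensitive value x, innocuous value t) per unit.
rng = random.Random(0)
population = [(rng.uniform(0, 10), rng.uniform(20, 40)) for _ in range(1000)]

sample = rng.sample(population, 200)       # random sample s (without replacement)
s_ll, s_sl = sample[:100], sample[100:]    # equal split into LL and SL sub-samples

ll_totals = [x + t for x, t in s_ll]       # LL respondents report only x + t
sl_totals = [t for _, t in s_sl]           # SL respondents report only t

estimate = ist_estimate(ll_totals, sl_totals)
true_mean = mean(x for x, _ in population)
print(f"IST estimate: {estimate:.3f}, true sensitive mean: {true_mean:.3f}")
```

Because each respondent reports only a total, the interviewer never observes an individual sensitive value; the price is the extra sampling variability contributed by the innocuous item.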

The decisive point in the IST is how to split the total sample into the LL sample and the SL sample. Trappmann et al. (2014) allocated the same number of units to each sample irrespective of the variation of the items in the two lists. However, Perri et al. (2018) advocated an optimum allocation of the LL and SL samples. The IST may be modified to deal with sensitive issues on successive occasions if the sensitive variable also changes over time, which is often the case. For example, if the sensitive variable is the amount spent per month on drugs such as cigarettes or pan masala by college students, then the non-sensitive variable may be taken as the total monthly pocket money received by them or the amount spent on purchasing books. Similarly, if the sensitive variable is the number of abortions, then the non-sensitive variable may be the number of children or the total number of members in the family. The sensitive question together with the non-sensitive questions makes up the list for the LL sample, whereas only the non-sensitive questions make up the list for the SL sample. Any number of non-sensitive questions may accompany the sensitive question in the LL sample, with the same non-sensitive questions used for the SL sample. In this paper, however, we consider the case of one sensitive and one non-sensitive question on successive occasions.

3. Proposed IST framework in successive sampling design

Consider a finite population $P$ consisting of $N$ identifiable units for sampling over two successive occasions. Let $x$ denote the quantitative sensitive variable at the first occasion, which changes to $y$ at the second occasion. Similarly, let $t_1$ be the non-sensitive (innocuous) variable at the first occasion, which changes to $t_2$ at the second occasion. Let $x_i$, $y_i$, $t_{1i}$, and $t_{2i}$ denote the values of $x$, $y$, $t_1$, and $t_2$ respectively on unit $i \in P$. To estimate the population mean $\bar{Y}$ of the quantitative sensitive variable at the current occasion using the IST, the following sampling design is proposed.

At the first occasion, a sample of size $n$ is drawn using simple random sampling without replacement (SRSWOR) and split into the samples $s_{nll}$ and $s_{nsl}$, called the LL-sample and SL-sample respectively. At the second occasion, considering the partial overlap case, two independent samples are considered: a matched sample of size $m = n(1-\mu)$ drawn as an SRSWOR sub-sample from the sample of size $n$ at the first occasion, and a fresh sample of size $u = (n - m) = n\mu$ drawn afresh at the current occasion, where $\mu$ is the fraction of the sample drawn afresh. Further, the samples of sizes $m$ and $u$ are split into the corresponding LL-samples and SL-samples $s_{mll}$, $s_{msl}$, $s_{ull}$, and $s_{usl}$ respectively. The responses obtained from the respondents on the two occasions and the corresponding IST estimates based on the different samples are presented in Table 1.
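The sampling design described above might be sketched as follows. The function name and the returned keys are hypothetical, and three assumptions are made that the text does not spell out: the LL/SL split is equal, matched units keep the list type they had at the first occasion, and the fresh sample is drawn from units not selected at the first occasion.

```python
import random

def two_occasion_design(N, n, m, rng):
    """Draw unit indices for a two-occasion IST design with partial overlap."""
    units = list(range(N))
    s_n = rng.sample(units, n)                    # occasion 1: SRSWOR of size n
    s_nll, s_nsl = s_n[: n // 2], s_n[n // 2:]    # equal LL/SL split (assumption)

    # Assumption: matched units keep the list type they had at occasion 1.
    s_mll = rng.sample(s_nll, m // 2)
    s_msl = rng.sample(s_nsl, m - m // 2)

    # Assumption: the fresh sample comes from units not selected at occasion 1.
    first = set(s_n)
    remaining = [i for i in units if i not in first]
    u = n - m
    s_u = rng.sample(remaining, u)
    s_ull, s_usl = s_u[: u // 2], s_u[u // 2:]

    return {"nll": s_nll, "nsl": s_nsl, "mll": s_mll, "msl": s_msl,
            "ull": s_ull, "usl": s_usl}

design = two_occasion_design(N=51, n=24, m=16, rng=random.Random(1))
for name, idx in design.items():
    print(name, sorted(idx))
```

Other conventions (for example, drawing the matched sample first and splitting it afterwards) are equally plausible readings; the sketch only fixes one of them so that the sample sizes $n$, $m$, and $u$ can be traced.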

4. IST successive difference estimator

In order to utilize the information available from the previous occasion, an IST difference-type estimator $T_{1m}$ is considered based on the sample of size $m$ retained from the previous occasion, while the estimator based on the fresh sample of size $u$ is the IST estimator $T_u = \hat{\bar{y}}_u$. Combining the two estimators as a convex linear combination, the final estimator of the sensitive population mean at the current occasion is given by

$$T_1 = \phi_1 T_u + (1 - \phi_1) T_{1m},$$

where $T_u = \hat{\bar{y}}_u$ and $T_{1m} = \hat{\bar{y}}_m + k(\hat{\bar{x}}_n - \hat{\bar{x}}_m)$; $\phi_1 \in [0, 1]$ and $k$ are scalar quantities to be chosen suitably.
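As a minimal sketch, the difference estimator can be computed directly from the four IST estimates listed in Table 1. The function and argument names are hypothetical, and `phi` stands for the convex-combination weight in $[0, 1]$.

```python
def ist_difference_estimator(y_u, y_m, x_m, x_n, k, phi):
    """T1 = phi * T_u + (1 - phi) * T_1m, with T_1m = y_m + k * (x_n - x_m).

    y_u, y_m, x_m, x_n are the IST estimates from the fresh sample, the
    matched sample (both occasions), and the full first-occasion sample.
    """
    if not 0.0 <= phi <= 1.0:
        raise ValueError("phi must lie in [0, 1]")
    t_1m = y_m + k * (x_n - x_m)    # difference-type estimator on matched part
    return phi * y_u + (1.0 - phi) * t_1m

# Hypothetical IST estimates, for illustration only.
t1 = ist_difference_estimator(y_u=10.0, y_m=12.0, x_m=5.0, x_n=6.0, k=0.5, phi=0.25)
print(t1)
```

Setting `phi=1` discards the matched sample entirely, and `k=0` ignores the first-occasion information; both are useful sanity checks when experimenting.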

5. IST successive regression estimator

The regression estimator is another widely used estimator in survey sampling theory. Hence, the estimator for the matched portion of the sample has been chosen as a regression-type estimator $T_{2m}$. The final IST successive regression estimator of the sensitive population mean at the current occasion is given as

$$T_2 = \phi_2 T_u + (1 - \phi_2) T_{2m},$$

where $T_{2m} = \hat{\bar{y}}_m + \hat{b}(mll)(\hat{\bar{x}}_n - \hat{\bar{x}}_m)$ with $\hat{b}(mll) = s_{z_1 z_2}(mll) / s^2_{z_2}(mll)$, and $\phi_2 \in [0, 1]$ is a scalar quantity to be chosen suitably.
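The matched-sample regression estimator might be computed as in the sketch below. The helper names are hypothetical; the slope follows the expression as written in the text, that is, the sample covariance of the LL responses on the two occasions divided by the sample variance of $z_2$ in the matched LL sample.

```python
from statistics import mean

def sample_cov(a, b):
    """Unbiased sample covariance of two equally long sequences."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

def regression_slope(z1_mll, z2_mll):
    """b(mll) = s_{z1 z2}(mll) / s^2_{z2}(mll), as written in the text."""
    return sample_cov(z1_mll, z2_mll) / sample_cov(z2_mll, z2_mll)

def ist_regression_matched(y_m, x_m, x_n, z1_mll, z2_mll):
    """T_2m = y_m + b(mll) * (x_n - x_m)."""
    return y_m + regression_slope(z1_mll, z2_mll) * (x_n - x_m)

# Hypothetical LL responses of four matched units on the two occasions.
z1 = [1.0, 2.0, 3.0, 4.0]
z2 = [2.0, 4.0, 6.0, 8.0]
print(ist_regression_matched(y_m=12.0, x_m=5.0, x_n=6.0, z1_mll=z1, z2_mll=z2))
```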

6. IST successive general class of estimator

Many other estimators, such as ratio, product, and exponential ratio estimators, may be constructed along similar lines for the matched sample of size $m$. Therefore, in order to generalize the framework, an IST general class of estimators has been proposed, so that the IST difference, IST regression, and other estimators may be viewed as members of the proposed class. The final estimator in this case is given as

$$T_3 = \phi_3 T_u + (1 - \phi_3) T_{3m},$$

where $\phi_3 \in [0, 1]$ and $T_{3m} = g(\hat{\bar{y}}_m, \hat{\bar{x}}_m, \hat{\bar{x}}_n)$ is a function of $\hat{\bar{y}}_m$, $\hat{\bar{x}}_m$, and $\hat{\bar{x}}_n$. Following Srivastava and Jhajj (1980), Tracy et al. (1996), and Priyanka and Trisandhya (2018), the function $g$ is assumed to satisfy the following conditions:

  • The point $(\hat{\bar{y}}_m, \hat{\bar{x}}_m, \hat{\bar{x}}_n)$ assumes values in a closed convex subset $\Re_3$ of three dimensional real space containing the point $(\bar{Y}, \bar{X}, \bar{X})$.

  • The function $g(\hat{\bar{y}}_m, \hat{\bar{x}}_m, \hat{\bar{x}}_n)$ is continuous and bounded in $\Re_3$.

  • $g(\bar{Y}, \bar{X}, \bar{X}) = \bar{Y}$ and $g_1(K) = \left.\partial g(\cdot)/\partial \hat{\bar{y}}_m\right|_K = 1$, i.e., the first order partial derivative of $g$ with respect to $\hat{\bar{y}}_m$ at $K$ equals one, where $K = (\bar{Y}, \bar{X}, \bar{X})$.

  • The first and second order partial derivatives of $g(\hat{\bar{y}}_m, \hat{\bar{x}}_m, \hat{\bar{x}}_n)$ exist and are continuous and bounded in $\Re_3$.
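As a quick check of the third condition, consider two candidate members of the class; the ratio-type form is written here only as an illustration and assumes $\bar{X} \neq 0$. Writing $(a, b, c)$ for $(\hat{\bar{y}}_m, \hat{\bar{x}}_m, \hat{\bar{x}}_n)$ and $K = (\bar{Y}, \bar{X}, \bar{X})$:

$$
\begin{aligned}
\text{difference type: } & g(a, b, c) = a + k(c - b), & g(K) &= \bar{Y} + k(\bar{X} - \bar{X}) = \bar{Y}, & \left.\frac{\partial g}{\partial a}\right|_K &= 1,\\
\text{ratio type: } & g(a, b, c) = a\,\frac{c}{b}, & g(K) &= \bar{Y}\,\frac{\bar{X}}{\bar{X}} = \bar{Y}, & \left.\frac{\partial g}{\partial a}\right|_K &= \frac{\bar{X}}{\bar{X}} = 1.
\end{aligned}
$$

For these two forms the remaining first order derivatives at $K$ are $\partial g/\partial b = -k$, $\partial g/\partial c = k$ for the difference type and $\partial g/\partial b = -\bar{Y}/\bar{X}$, $\partial g/\partial c = \bar{Y}/\bar{X}$ for the ratio type.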

7. Analysis of IST estimators on successive occasions

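To assess the performance of the proposed IST estimators $T_i = \phi_i T_u + (1 - \phi_i) T_{im}$, where $\phi_i \in [0, 1]$ is the weight attached to $T_u$, their bias and variance/mean squared error have been derived as

$$B(T_i) = E(T_i - \bar{Y}) = E[\phi_i (T_u - \bar{Y}) + (1 - \phi_i)(T_{im} - \bar{Y})] = \phi_i B[T_u] + (1 - \phi_i) B[T_{im}], \quad i = 1, 2, 3.$$

Since $T_u$ is unbiased for $\bar{Y}$, $B(T_u) = 0$. Therefore, the bias of the estimator $T_i$ is given as

$$B(T_i) = (1 - \phi_i) B(T_{im}), \quad i = 1, 2, 3. \tag{7.1}$$

The variance of the estimator is computed as

$$V(T_i) = E(T_i - \bar{Y})^2 = E[\phi_i (T_u - \bar{Y}) + (1 - \phi_i)(T_{im} - \bar{Y})]^2 = \phi_i^2 V(T_u) + (1 - \phi_i)^2 V(T_{im}) + 2\phi_i (1 - \phi_i)\,\mathrm{cov}(T_u, T_{im}), \quad i = 1, 2, 3. \tag{7.2}$$

As $T_u$ and $T_{im}$ are based on two independent samples of sizes $u$ and $m$ respectively, $\mathrm{cov}(T_u, T_{im}) = 0$. Therefore, the variance of the estimator becomes

$$V(T_i) = \phi_i^2 V(T_u) + (1 - \phi_i)^2 V(T_{im}). \tag{7.3}$$

It can be seen that $V(T_i)$ in equation (7.3) is a function of $\phi_i$. It has therefore been minimized with respect to $\phi_i$, and the optimum value of $\phi_i$ is obtained as

$$\phi_{i\,opt.} = \frac{V(T_{im})}{V(T_u) + V(T_{im})}, \quad i = 1, 2, 3. \tag{7.4}$$

Substituting the optimum value of $\phi_i$ from equation (7.4) in equation (7.3), the optimum variance of the proposed estimator is computed as

$$V(T_i)_{opt.} = \frac{V(T_u) \times V(T_{im})}{V(T_u) + V(T_{im})}, \quad i = 1, 2, 3. \tag{7.5}$$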
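The optimum weight follows from setting the derivative of the combined variance to zero, $2\phi_i V(T_u) - 2(1 - \phi_i) V(T_{im}) = 0$. The short sketch below checks this numerically; the function names and the variance values are hypothetical.

```python
def combined_variance(phi, var_u, var_m):
    """V(T_i) = phi^2 V(T_u) + (1 - phi)^2 V(T_im) for independent samples."""
    return phi ** 2 * var_u + (1 - phi) ** 2 * var_m

def optimal_weight(var_u, var_m):
    """Weight minimizing the combined variance: V(T_im) / (V(T_u) + V(T_im))."""
    return var_m / (var_u + var_m)

def optimal_variance(var_u, var_m):
    """Minimum combined variance: V(T_u) V(T_im) / (V(T_u) + V(T_im))."""
    return var_u * var_m / (var_u + var_m)

# Hypothetical component variances, for illustration only.
var_u, var_m = 4.0, 1.0
phi = optimal_weight(var_u, var_m)
grid_min = min(combined_variance(p / 100, var_u, var_m) for p in range(101))
print(phi, combined_variance(phi, var_u, var_m), grid_min)
```

The more precise component receives the larger weight, and the optimum variance is never larger than the smaller of the two component variances.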

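7.1. Bias and variance of $T_u$ and $T_{im}$; i = 1, 2, 3

The estimator $T_u$ is unbiased for $\bar{Y}$; hence, its variance is computed as

$$V(T_u) = \frac{S^2_{z_2}}{u_{ll}} + \frac{S^2_{t_2}}{u_{sl}} - \frac{S^2_{z_2} + S^2_{t_2}}{N}. \tag{7.6}$$

The estimator $T_{1m}$ is also unbiased for $\bar{Y}$, so its variance is obtained as

$$
\begin{aligned}
V(T_{1m}) ={}& \frac{1}{m_{ll}}\left[k^2 S^2_{z_1} - 2k\rho_{z_1 z_2} S_{z_1} S_{z_2} + S^2_{z_2}\right] + \frac{1}{m_{sl}}\left[k^2 S^2_{t_1} - 2k S_{t_1} S_{t_2}\rho_{t_1 t_2} + S^2_{t_2}\right]\\
&+ \frac{1}{n_{ll}}\left[2k\rho_{z_1 z_2} S_{z_1} S_{z_2} - k^2 S^2_{z_1}\right] + \frac{1}{n_{sl}}\left[2k S_{t_1} S_{t_2}\rho_{t_1 t_2} - k^2 S^2_{t_1}\right] - \frac{1}{N}\left[S^2_{z_2} + S^2_{t_2}\right].
\end{aligned} \tag{7.7}
$$

The above expression for the variance of $T_{1m}$ contains an unknown constant $k$; hence, it is minimized with respect to $k$, and the optimum value of $k$ is obtained as

$$k_{opt.} = \frac{\left(\frac{1}{m_{ll}} - \frac{1}{n_{ll}}\right) S_{z_1} S_{z_2}\rho_{z_1 z_2} + \left(\frac{1}{m_{sl}} - \frac{1}{n_{sl}}\right) S_{t_1} S_{t_2}\rho_{t_1 t_2}}{\left(\frac{1}{m_{ll}} - \frac{1}{n_{ll}}\right) S^2_{z_1} + \left(\frac{1}{m_{sl}} - \frac{1}{n_{sl}}\right) S^2_{t_1}}.$$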
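The variance of the matched-sample difference estimator and the optimizing constant can be checked numerically with a sketch such as the following. The function names and all parameter values are hypothetical; the formulas are transcribed from the two expressions above.

```python
def var_t1m(k, sz1, sz2, st1, st2, rho_z, rho_t, m_ll, m_sl, n_ll, n_sl, N):
    """Variance of T_1m as a function of k (quadratic in k)."""
    return (
        (k ** 2 * sz1 ** 2 - 2 * k * rho_z * sz1 * sz2 + sz2 ** 2) / m_ll
        + (k ** 2 * st1 ** 2 - 2 * k * rho_t * st1 * st2 + st2 ** 2) / m_sl
        + (2 * k * rho_z * sz1 * sz2 - k ** 2 * sz1 ** 2) / n_ll
        + (2 * k * rho_t * st1 * st2 - k ** 2 * st1 ** 2) / n_sl
        - (sz2 ** 2 + st2 ** 2) / N
    )

def k_opt(sz1, sz2, st1, st2, rho_z, rho_t, m_ll, m_sl, n_ll, n_sl):
    """Value of k minimizing var_t1m (N does not affect the minimizer)."""
    a = 1 / m_ll - 1 / n_ll
    b = 1 / m_sl - 1 / n_sl
    return (a * sz1 * sz2 * rho_z + b * st1 * st2 * rho_t) / (a * sz1 ** 2 + b * st1 ** 2)

# Hypothetical population parameters and sample sizes, for illustration only.
params = dict(sz1=2.0, sz2=3.0, st1=1.5, st2=1.0, rho_z=0.8, rho_t=0.6,
              m_ll=8, m_sl=8, n_ll=12, n_sl=12)
k_star = k_opt(**params)
print(k_star, var_t1m(k_star, N=51, **params))
```

Because $m < n$, the coefficient of $k^2$ is positive, so the stationary point is a minimum; perturbing `k_star` in either direction should increase the variance.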

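Now, as the estimators $T_{2m}$ and $T_{3m}$ are biased for $\bar{Y}$, the expressions for their bias and mean squared error have been derived under the following transformations:

$$
\begin{gathered}
\bar{z}_{2ull} = \bar{Z}_2(1 + e_0), \quad \bar{z}_{2mll} = \bar{Z}_2(1 + e_1), \quad \bar{z}_{1mll} = \bar{Z}_1(1 + e_2), \quad \bar{z}_{1nll} = \bar{Z}_1(1 + e_3),\\
\bar{t}_{2usl} = \bar{T}_2(1 + e_4), \quad \bar{t}_{2msl} = \bar{T}_2(1 + e_5), \quad \bar{t}_{1msl} = \bar{T}_1(1 + e_6), \quad \bar{t}_{1nsl} = \bar{T}_1(1 + e_7),\\
s^2_{z_2}(mll) = S^2_{z_2}(1 + e_8), \quad s_{z_1 z_2}(mll) = S_{z_1 z_2}(1 + e_9),\\
n_{rs} = \frac{\nu_{rs}}{(\nu_{20})^{r/2}(\nu_{02})^{s/2}}, \quad \nu_{rs} = \frac{1}{n - 1}\sum (z_{1i} - \bar{z}_1)^r (z_{2i} - \bar{z}_2)^s
\end{gathered}
$$

such that $E(e_i) = 0$ and $|e_i| < 1$, where $i = 0, 1, \ldots, 9$.

Under the above transformations, retaining terms up to the first order of approximation, the bias and mean squared error of $T_{2m}$ are

$$B(T_{2m}) = \left(\frac{1}{m_{ll}} - \frac{1}{n_{ll}}\right)\beta_{z_1 z_2}\left[S_{z_1} n_{03} - S_{z_1} n_{12}\rho_{z_1 z_2}\right] \tag{7.8}$$

and

$$
\begin{aligned}
V(T_{2m}) ={}& \frac{1}{m_{ll}}\left[S^2_{z_1}\beta^2_{z_1 z_2} - 2 S_{z_1} S_{z_2}\beta_{z_1 z_2}\rho_{z_1 z_2} + S^2_{z_2}\right] + \frac{1}{m_{sl}}\left[S^2_{t_2} + \beta^2_{z_1 z_2} S^2_{t_1} - 2\beta_{z_1 z_2} S_{t_1} S_{t_2}\rho_{t_1 t_2}\right]\\
&+ \frac{1}{n_{ll}}\left[2 S_{z_1} S_{z_2}\beta_{z_1 z_2}\rho_{z_1 z_2} - S^2_{z_1}\beta^2_{z_1 z_2}\right] + \frac{1}{n_{sl}}\left[2 S_{t_1} S_{t_2}\rho_{t_1 t_2}\beta_{z_1 z_2} - S^2_{t_1}\beta^2_{z_1 z_2}\right] - \frac{1}{N}\left[S^2_{z_2} + S^2_{t_2}\right].
\end{aligned} \tag{7.9}
$$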

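The bias and mean squared error of the estimator $T_{3m}$ have also been derived under the transformations considered above, where

$$T_{3m} = g(\hat{\bar{y}}_m, \hat{\bar{x}}_m, \hat{\bar{x}}_n).$$

Expanding $g(\hat{\bar{y}}_m, \hat{\bar{x}}_m, \hat{\bar{x}}_n)$ about the point $K = (\bar{Y}, \bar{X}, \bar{X})$ using a Taylor series expansion, we have

$$
\begin{aligned}
T_{3m} ={}& g\left[\bar{Y} + (\hat{\bar{y}}_m - \bar{Y}),\ \bar{X} + (\hat{\bar{x}}_m - \bar{X}),\ \bar{X} + (\hat{\bar{x}}_n - \bar{X})\right]\\
={}& \hat{\bar{y}}_m + (\hat{\bar{x}}_m - \bar{X}) G_2 + (\hat{\bar{x}}_n - \bar{X}) G_3 + \big[(\hat{\bar{y}}_m - \bar{Y})^2 G_{11} + (\hat{\bar{x}}_m - \bar{X})^2 G_{22} + (\hat{\bar{x}}_n - \bar{X})^2 G_{33}\\
&+ (\hat{\bar{y}}_m - \bar{Y})(\hat{\bar{x}}_m - \bar{X}) G_{12} + (\hat{\bar{y}}_m - \bar{Y})(\hat{\bar{x}}_n - \bar{X}) G_{13} + (\hat{\bar{x}}_m - \bar{X})(\hat{\bar{x}}_n - \bar{X}) G_{23} + \cdots\big],
\end{aligned} \tag{7.10}
$$

where

$$
\begin{gathered}
G_1 = \left.\frac{\partial g}{\partial \hat{\bar{y}}_m}\right|_K = 1, \quad G_2 = \left.\frac{\partial g}{\partial \hat{\bar{x}}_m}\right|_K, \quad G_3 = \left.\frac{\partial g}{\partial \hat{\bar{x}}_n}\right|_K, \quad G_{11} = \frac{1}{2}\left.\frac{\partial^2 g}{\partial \hat{\bar{y}}_m^2}\right|_K = 0, \quad G_{22} = \frac{1}{2}\left.\frac{\partial^2 g}{\partial \hat{\bar{x}}_m^2}\right|_K,\\
G_{33} = \frac{1}{2}\left.\frac{\partial^2 g}{\partial \hat{\bar{x}}_n^2}\right|_K, \quad G_{12} = \frac{1}{2}\left.\frac{\partial^2 g}{\partial \hat{\bar{y}}_m\,\partial \hat{\bar{x}}_m}\right|_K, \quad G_{13} = \frac{1}{2}\left.\frac{\partial^2 g}{\partial \hat{\bar{y}}_m\,\partial \hat{\bar{x}}_n}\right|_K, \quad G_{23} = \frac{1}{2}\left.\frac{\partial^2 g}{\partial \hat{\bar{x}}_m\,\partial \hat{\bar{x}}_n}\right|_K.
\end{gathered}
$$

The bias and mean squared error of the class of estimators $T_{3m}$, to the first order of approximation, are obtained as

$$
\begin{aligned}
B(T_{3m}) ={}& \frac{1}{m_{ll}}\left[S^2_{z_1} G_{22} + \rho_{z_1 z_2} S_{z_1} S_{z_2} G_{12}\right] + \frac{1}{m_{sl}}\left[S^2_{t_1} G_{22} + \rho_{t_1 t_2} S_{t_1} S_{t_2} G_{12}\right]\\
&+ \frac{1}{n_{ll}}\left[S^2_{z_1} G_{33} + \rho_{z_1 z_2} S_{z_1} S_{z_2} G_{13} + S^2_{z_1} G_{23}\right] + \frac{1}{n_{sl}}\left[S^2_{t_1} G_{33} + \rho_{t_1 t_2} S_{t_1} S_{t_2} G_{13} + S^2_{t_1} G_{23}\right]\\
&- \frac{1}{N}\left[S^2_{t_1}(G_{22} + G_{33} + G_{23}) + S_{t_1} S_{t_2}\rho_{t_1 t_2}(G_{12} + G_{13}) + S^2_{z_1}(G_{22} + G_{33} + G_{23}) + S_{z_1} S_{z_2}\rho_{z_1 z_2}(G_{12} + G_{13})\right]
\end{aligned} \tag{7.11}
$$

and

$$
\begin{aligned}
V(T_{3m}) ={}& \frac{1}{m_{ll}}\left[S^2_{z_1} G_2^2 + 2\rho_{z_1 z_2} S_{z_1} S_{z_2} G_2 + S^2_{z_2}\right] + \frac{1}{m_{sl}}\left[S^2_{t_1} G_2^2 + 2 S_{t_1} S_{t_2}\rho_{t_1 t_2} G_2 + S^2_{t_2}\right]\\
&+ \frac{1}{n_{ll}}\left[S^2_{z_1} G_3^2 + 2\rho_{z_1 z_2} S_{z_1} S_{z_2} G_3 + 2 G_2 G_3 S^2_{z_1}\right] + \frac{1}{n_{sl}}\left[S^2_{t_1} G_3^2 + 2 G_3 S_{t_1} S_{t_2}\rho_{t_1 t_2} + 2 G_2 G_3 S^2_{t_1}\right]\\
&- \frac{1}{N}\big[S^2_{z_1} G_2^2 + 2\rho_{z_1 z_2} S_{z_1} S_{z_2} G_2 + S^2_{t_1} G_2^2 + 2 S_{t_1} S_{t_2}\rho_{t_1 t_2} G_2 + S^2_{z_2} + S^2_{t_2}\\
&\qquad + S^2_{z_1} G_3^2 + 2\rho_{z_1 z_2} S_{z_1} S_{z_2} G_3 + 2 G_2 G_3 S^2_{z_1} + S^2_{t_1} G_3^2 + 2 S_{t_1} S_{t_2}\rho_{t_1 t_2} G_3 + 2 G_2 G_3 S^2_{t_1}\big].
\end{aligned} \tag{7.12}
$$

Clearly, equation (7.12) is a function of $G_2$ and $G_3$. Minimizing equation (7.12) by partially differentiating with respect to $G_2$ and $G_3$ respectively and equating to zero, we get the optimum values of $G_2$ and $G_3$ as

$$
\begin{aligned}
G_{2\,opt.} &= \frac{-\left[\left(\frac{1}{m_{ll}} - \frac{1}{n_{ll}}\right) S_{z_1} S_{z_2}\rho_{z_1 z_2} + \left(\frac{1}{m_{sl}} - \frac{1}{n_{sl}}\right) S_{t_1} S_{t_2}\rho_{t_1 t_2}\right]}{\left(\frac{1}{m_{ll}} - \frac{1}{n_{ll}}\right) S^2_{z_1} + \left(\frac{1}{m_{sl}} - \frac{1}{n_{sl}}\right) S^2_{t_1}},\\
G_{3\,opt.} &= \frac{-\left[\left(\frac{1}{n_{ll}} - \frac{1}{N}\right)\left(S_{z_1} S_{z_2}\rho_{z_1 z_2} + S^2_{z_1} G_2\right) + \left(\frac{1}{n_{sl}} - \frac{1}{N}\right)\left(S_{t_1} S_{t_2}\rho_{t_1 t_2} + S^2_{t_1} G_2\right)\right]}{\left(\frac{1}{n_{ll}} - \frac{1}{N}\right) S^2_{z_1} + \left(\frac{1}{n_{sl}} - \frac{1}{N}\right) S^2_{t_1}}.
\end{aligned}
$$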

7.2. Allocating LL & SL Sample using Trappmann et al. (2014) approach and Perri et al. (2018) approach

In the IST, a sample is split into an LL sample and an SL sample. Trappmann et al. (2014) considered an equal number of units in both samples irrespective of the variability of the items in the two lists. Applying their approach on successive occasions, we have the following allocations:

$$n_{ll} = n_{sl} = \frac{n}{2}, \quad m_{ll} = m_{sl} = \frac{m}{2}, \quad \text{and} \quad u_{ll} = u_{sl} = \frac{u}{2}.$$

However, Perri et al. (2018) concluded that the estimates may be affected by high variability of the items in the LL and SL samples. Hence, they proposed an optimal sample size allocation to the LL and SL samples that minimizes the variance of the IST estimates under a budget constraint. Adapting these ideas to allocate the LL and SL samples within the various samples at the first and second occasions, and assuming the same budget allocation for each LL and SL sample, we have:

$$
\begin{gathered}
n_{ll} = \frac{n S_{z_1}}{S_{z_1} + S_{t_1}} = n\beta_1 \text{ (say)}, \quad n_{sl} = \frac{n S_{t_1}}{S_{z_1} + S_{t_1}} = n\beta_2 \text{ (say)}, \quad u_{ll} = \frac{u S_{z_2}}{S_{z_2} + S_{t_2}} = u\beta_3 \text{ (say)},\\
u_{sl} = \frac{u S_{t_2}}{S_{z_2} + S_{t_2}} = u\beta_4 \text{ (say)}, \quad m_{ll} = \frac{m S_{z_1}}{S_{z_1} + S_{t_1}} = m\beta_1 \text{ (say)}, \quad \text{and} \quad m_{sl} = \frac{m S_{t_1}}{S_{z_1} + S_{t_1}} = m\beta_2 \text{ (say)}.
\end{gathered}
$$
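The effect of the two allocation rules on the variance of the fresh-sample IST estimator can be illustrated with the sketch below. The function names and the numerical values are hypothetical, sample sizes are left as real numbers (rounding to integers is not handled), and the variance expression is the one given for the fresh-sample estimator in Section 7.1.

```python
def trappmann_allocation(size):
    """Equal split of a sample into LL and SL parts."""
    return size / 2, size / 2

def perri_allocation(size, s_long, s_short):
    """Split proportional to the standard deviations of the LL and SL items."""
    beta_ll = s_long / (s_long + s_short)
    return size * beta_ll, size * (1 - beta_ll)

def var_tu(s_z2, s_t2, u_ll, u_sl, N):
    """V(T_u) = S_z2^2 / u_ll + S_t2^2 / u_sl - (S_z2^2 + S_t2^2) / N."""
    return s_z2 ** 2 / u_ll + s_t2 ** 2 / u_sl - (s_z2 ** 2 + s_t2 ** 2) / N

# Hypothetical values: fresh sample of size 10 from N = 51 units.
u, N, s_z2, s_t2 = 10, 51, 3.0, 1.0
v_equal = var_tu(s_z2, s_t2, *trappmann_allocation(u), N)
v_prop = var_tu(s_z2, s_t2, *perri_allocation(u, s_z2, s_t2), N)
print(v_equal, v_prop)
```

For a fixed total size, allocating in proportion to the standard deviations minimizes the first two terms of the variance, so the proportional allocation is never worse than the equal split; the two coincide when both lists are equally variable.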

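Using the assumptions of both approaches, the minimum variances of the proposed estimators have been obtained and are presented in Table 2, where

$$
\begin{gathered}
J_{j1} = K_{j0} K_{j3} f - K_{j0} K_{j4} f^2, \quad J_{j2} = K_{j1} K_{j3} - f\left(K_{j0} K_{j4} f - K_{j0} K_{j2} - K_{j0} K_{j3} + K_{j1} K_{j4}\right),\\
J_{j3} = K_{j1} K_{j2} + K_{j1} K_{j3} - f K_{j1} K_{j4}, \quad J_{j4} = f(K_{j0} + K_{j4}) - K_{j3}, \quad J_{j5} = K_{j2} + K_{j3} - K_{j1} - f(K_{j0} + K_{j4}),\\
I_{j1} = J_{j1} J_{j5} + J_{j2} J_{j4}, \quad I_{j2} = J_{j1} K_{j1} - J_{j4} J_{j3}, \quad I_{j3} = J_{j2} K_{j1} + J_{j3} J_{j5},\\
K_{j0} = S^2_{z_2} + S^2_{t_2};\ j = 1, 2, \ldots, 6; \quad K_{j1} = 2 S^2_{z_2} + 2 S^2_{t_2};\ j = 1, 3, 5; \quad K_{j1} = \frac{S^2_{z_2}}{\beta_3} + \frac{S^2_{t_2}}{\beta_4};\ j = 2, 4, 6;
\end{gathered}
$$

$$
\begin{aligned}
K_{12} &= 2\left(k_t^2 S^2_{z_1} - 2 k_t S_{z_1} S_{z_2}\rho_{z_1 z_2} + S^2_{z_2} + k_t^2 S^2_{t_1} - 2 k_t S_{t_1} S_{t_2}\rho_{t_1 t_2} + S^2_{t_2}\right),\\
K_{13} &= 2\left(2 k_t S_{z_1} S_{z_2}\rho_{z_1 z_2} - k_t^2 S^2_{z_1} + 2 k_t S_{t_1} S_{t_2}\rho_{t_1 t_2} - k_t^2 S^2_{t_1}\right), \quad K_{14} = S^2_{z_2} + S^2_{t_2},\\
K_{22} &= \frac{k_p^2 S^2_{z_1} - 2 k_p S_{z_1} S_{z_2}\rho_{z_1 z_2} + S^2_{z_2}}{\beta_1} + \frac{k_p^2 S^2_{t_1} - 2 k_p S_{t_1} S_{t_2}\rho_{t_1 t_2} + S^2_{t_2}}{\beta_2},\\
K_{23} &= \frac{2 k_p S_{z_1} S_{z_2}\rho_{z_1 z_2} - k_p^2 S^2_{z_1}}{\beta_1} + \frac{2 k_p S_{t_1} S_{t_2}\rho_{t_1 t_2} - k_p^2 S^2_{t_1}}{\beta_2}, \quad K_{24} = S^2_{z_2} + S^2_{t_2},\\
k_t &= \frac{S_{z_1} S_{z_2}\rho_{z_1 z_2} + S_{t_1} S_{t_2}\rho_{t_1 t_2}}{S^2_{z_1} + S^2_{t_1}}, \quad k_p = \frac{\dfrac{S_{z_1} S_{z_2}\rho_{z_1 z_2}}{\beta_1} + \dfrac{S_{t_1} S_{t_2}\rho_{t_1 t_2}}{\beta_2}}{\dfrac{S^2_{z_1}}{\beta_1} + \dfrac{S^2_{t_1}}{\beta_2}},
\end{aligned}
$$

$$
\begin{aligned}
K_{32} &= 2\left(\frac{S^2_{z_1 z_2}}{S^4_{z_2}} S^2_{z_1} - 2\frac{S_{z_1 z_2}}{S^2_{z_2}} S_{z_1} S_{z_2}\rho_{z_1 z_2} + \frac{S^2_{z_1 z_2}}{S^4_{z_2}} S^2_{t_1} - 2\frac{S_{z_1 z_2}}{S^2_{z_2}} S_{t_1} S_{t_2}\rho_{t_1 t_2} + S^2_{z_2} + S^2_{t_2}\right),\\
K_{33} &= 2\left(2 S_{z_1} S_{z_2}\rho_{z_1 z_2}\frac{S_{z_1 z_2}}{S^2_{z_2}} - S^2_{z_1}\frac{S^2_{z_1 z_2}}{S^4_{z_2}} + 2 S_{t_1} S_{t_2}\rho_{t_1 t_2}\frac{S_{z_1 z_2}}{S^2_{z_2}} - S^2_{t_1}\frac{S^2_{z_1 z_2}}{S^4_{z_2}}\right), \quad K_{34} = S^2_{z_2} + S^2_{t_2},\\
K_{42} &= \left(S^2_{z_1}\frac{S^2_{z_1 z_2}}{S^4_{z_2}} - 2 S_{z_1} S_{z_2}\rho_{z_1 z_2}\frac{S_{z_1 z_2}}{S^2_{z_2}} + S^2_{z_2}\right)\frac{1}{\beta_1} + \left(S^2_{t_1}\frac{S^2_{z_1 z_2}}{S^4_{z_2}} - 2\frac{S_{z_1 z_2}}{S^2_{z_2}} S_{t_1} S_{t_2}\rho_{t_1 t_2} + S^2_{t_2}\right)\frac{1}{\beta_2},\\
K_{43} &= \left(2 S_{z_1} S_{z_2}\rho_{z_1 z_2}\frac{S_{z_1 z_2}}{S^2_{z_2}} - S^2_{z_1}\frac{S^2_{z_1 z_2}}{S^4_{z_2}}\right)\frac{1}{\beta_1} + \left(2 S_{t_1} S_{t_2}\rho_{t_1 t_2}\frac{S_{z_1 z_2}}{S^2_{z_2}} - S^2_{t_1}\frac{S^2_{z_1 z_2}}{S^4_{z_2}}\right)\frac{1}{\beta_2}, \quad K_{44} = S^2_{z_2} + S^2_{t_2},
\end{aligned}
$$

$$
\begin{aligned}
K_{52} &= 2\left(S^2_{z_1} G_{2t}^2 + 2\rho_{z_1 z_2} S_{z_1} S_{z_2} G_{2t} + S^2_{t_1} G_{2t}^2 + 2 S_{t_1} S_{t_2}\rho_{t_1 t_2} G_{2t} + S^2_{z_2} + S^2_{t_2}\right),\\
K_{53} &= 2\left[S^2_{z_1} G_{3t}^2 + 2\rho_{z_1 z_2} S_{z_1} S_{z_2} G_{3t} + 2 G_{2t} G_{3t} S^2_{z_1} + S^2_{t_1} G_{3t}^2 + 2 G_{3t} S_{t_1} S_{t_2}\rho_{t_1 t_2} + 2 G_{2t} G_{3t} S^2_{t_1}\right],\\
K_{54} &= \big[S^2_{z_1} G_{2t}^2 + 2\rho_{z_1 z_2} S_{z_1} S_{z_2} G_{2t} + S^2_{t_1} G_{2t}^2 + 2 S_{t_1} S_{t_2}\rho_{t_1 t_2} G_{2t} + S^2_{z_2} + S^2_{t_2}\\
&\qquad + S^2_{z_1} G_{3t}^2 + 2\rho_{z_1 z_2} S_{z_1} S_{z_2} G_{3t} + 2 G_{2t} G_{3t} S^2_{z_1} + S^2_{t_1} G_{3t}^2 + 2 G_{3t} S_{t_1} S_{t_2}\rho_{t_1 t_2} + 2 G_{2t} G_{3t} S^2_{t_1}\big],\\
K_{62} &= \frac{1}{\beta_1}\left(S^2_{z_1} G_{2p}^2 + 2\rho_{z_1 z_2} S_{z_1} S_{z_2} G_{2p} + S^2_{z_2}\right) + \frac{1}{\beta_2}\left(S^2_{t_1} G_{2p}^2 + 2 S_{t_1} S_{t_2}\rho_{t_1 t_2} G_{2p} + S^2_{t_2}\right),\\
K_{63} &= \frac{1}{\beta_1}\left(S^2_{z_1} G_{3p}^2 + 2\rho_{z_1 z_2} S_{z_1} S_{z_2} G_{3p} + 2 G_{2p} G_{3p} S^2_{z_1}\right) + \frac{1}{\beta_2}\left(S^2_{t_1} G_{3p}^2 + 2 G_{3p} S_{t_1} S_{t_2}\rho_{t_1 t_2} + 2 G_{2p} G_{3p} S^2_{t_1}\right),\\
K_{64} &= \big[S^2_{z_1} G_{2p}^2 + 2\rho_{z_1 z_2} S_{z_1} S_{z_2} G_{2p} + S^2_{t_1} G_{2p}^2 + 2 S_{t_1} S_{t_2}\rho_{t_1 t_2} G_{2p} + S^2_{z_2} + S^2_{t_2}\\
&\qquad + S^2_{z_1} G_{3p}^2 + 2\rho_{z_1 z_2} S_{z_1} S_{z_2} G_{3p} + 2 G_{2p} G_{3p} S^2_{z_1} + S^2_{t_1} G_{3p}^2 + 2 G_{3p} S_{t_1} S_{t_2}\rho_{t_1 t_2} + 2 G_{2p} G_{3p} S^2_{t_1}\big],
\end{aligned}
$$

$$
\begin{aligned}
G_{2t} &= -\frac{S_{z_1} S_{z_2}\rho_{z_1 z_2} + S_{t_1} S_{t_2}\rho_{t_1 t_2}}{S^2_{z_1} + S^2_{t_1}}, \quad G_{3t} = -\frac{S_{z_1} S_{z_2}\rho_{z_1 z_2} + S^2_{z_1} G_{2t} + S_{t_1} S_{t_2}\rho_{t_1 t_2} + S^2_{t_1} G_{2t}}{S^2_{z_1} + S^2_{t_1}},\\
G_{2p} &= -\frac{\dfrac{S_{z_1} S_{z_2}\rho_{z_1 z_2}}{\beta_1} + \dfrac{S_{t_1} S_{t_2}\rho_{t_1 t_2}}{\beta_2}}{\dfrac{S^2_{z_1}}{\beta_1} + \dfrac{S^2_{t_1}}{\beta_2}}, \quad G_{3p} = -\frac{\dfrac{S_{z_1} S_{z_2}\rho_{z_1 z_2} + S^2_{z_1} G_{2p}}{\beta_1} + \dfrac{S_{t_1} S_{t_2}\rho_{t_1 t_2} + S^2_{t_1} G_{2p}}{\beta_2}}{\dfrac{S^2_{z_1}}{\beta_1} + \dfrac{S^2_{t_1}}{\beta_2}}, \quad \text{and} \quad f = \frac{n}{N}.
\end{aligned}
$$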
8. Efficiency comparison

In order to compare the various proposed IST estimators in successive sampling, the percent relative efficiencies have been computed for the data considered in Section 9 under the Trappmann et al. (2014) as well as the Perri et al. (2018) allocation designs as:

$$E_1 = \frac{V_t(T_1)_{min}}{V_p(T_1)_{min}} \times 100, \quad E_2 = \frac{V_t(T_2)_{min}}{V_p(T_2)_{min}} \times 100, \quad \text{and} \quad E_3 = \frac{V_t(T_3)_{min}}{V_p(T_3)_{min}} \times 100.$$
9. Numerical demonstration

Population Source: [Free access to data by Statistical Abstracts of United States]

To evaluate the performance of the proposed IST successive sampling estimators, a numerical illustration is provided using a natural population. The population consists of N = 51 states. Let the aim be to estimate the rate of abortion in the year 2004. Therefore, for the IST successive sampling framework we consider:

$y$ = rate of abortions in the year 2004
$x$ = rate of abortions in the year 2000
$t_1$ = number of residents in the year 2000
$t_2$ = number of residents in the year 2004

Clearly, the rate of abortion is sensitive, whereas the number of residents is non-sensitive. Hence, the data are suitable for the IST framework. Since the same study variable, "rate of abortion", has been observed for two different years, 2000 and 2004, the data are also suitable for the IST successive sampling framework. The numerical calculations have been performed on the data, with results presented in Table 3.

The values of $E_i$ (i = 1, 2) are observed to be more than 100, which indicates that the optimum allocation design of Perri et al. (2018) is preferable to the Trappmann et al. (2014) design. Therefore, further numerical analysis has been done using the Perri et al. (2018) allocation design.

10. Simulation study

An extensive Monte Carlo simulation study has been carried out for the data described in Section 9, using 5,000 different Monte Carlo replications. The variance/mean squared error of the proposed estimators $T_1$, $T_2$, and $T_3$ has been computed under the Perri et al. (2018) allocation design and is denoted by $V_p(T_1)$, $V_p(T_2)$, and $V_p(T_3)$ respectively. The percent relative efficiencies of the IST successive difference and regression estimators with respect to the IST successive general class of estimators have been computed as:

$$E_{s1} = \frac{V_p(T_1)}{V_p(T_3)} \times 100 \quad \text{and} \quad E_{s2} = \frac{V_p(T_2)}{V_p(T_3)} \times 100.$$

The procedure has been repeated for three different combinations of constants, termed sets:

Set I: n = 24, u = 4, m = 20
Set II: n = 24, u = 8, m = 16
Set III: n = 24, u = 10, m = 14
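A Monte Carlo study of this kind can be sketched as below for the IST successive difference estimator. This is not the authors' simulation: the population is synthetic (it is not the abortion data of Section 9), the LL/SL split is equal rather than the Perri et al. (2018) allocation, the values `k=1.0` and `phi=0.5` are arbitrary, and all function names are hypothetical.

```python
import random

def avg(values):
    values = list(values)
    return sum(values) / len(values)

def make_population(N, rng):
    """Synthetic population with correlated occasions (hypothetical values)."""
    pop = []
    for _ in range(N):
        x, t1 = rng.gauss(20, 5), rng.gauss(100, 10)
        pop.append({"x": x, "y": x + rng.gauss(1, 2),
                    "t1": t1, "t2": t1 + rng.gauss(2, 3)})
    return pop

def one_replicate(pop, n, m, k, phi, rng):
    """One two-occasion IST sample; returns phi*T_u + (1-phi)*T_1m."""
    N = len(pop)
    s_n = rng.sample(range(N), n)
    s_nll, s_nsl = s_n[: n // 2], s_n[n // 2:]          # equal split (assumption)
    s_mll = rng.sample(s_nll, m // 2)                   # matched LL units
    s_msl = rng.sample(s_nsl, m - m // 2)               # matched SL units
    first = set(s_n)
    s_u = rng.sample([i for i in range(N) if i not in first], n - m)
    s_ull, s_usl = s_u[: len(s_u) // 2], s_u[len(s_u) // 2:]

    def ist(ll, sl, sens, innoc):
        # LL units report sensitive + innocuous total, SL units innocuous only.
        return avg(pop[i][sens] + pop[i][innoc] for i in ll) - avg(pop[i][innoc] for i in sl)

    x_n, x_m = ist(s_nll, s_nsl, "x", "t1"), ist(s_mll, s_msl, "x", "t1")
    y_m, y_u = ist(s_mll, s_msl, "y", "t2"), ist(s_ull, s_usl, "y", "t2")
    return phi * y_u + (1 - phi) * (y_m + k * (x_n - x_m))

def simulated_mse(pop, n, m, k, phi, reps, seed):
    """Average squared error of the estimator over `reps` replications."""
    rng = random.Random(seed)
    y_bar = avg(unit["y"] for unit in pop)
    return avg((one_replicate(pop, n, m, k, phi, rng) - y_bar) ** 2 for _ in range(reps))

pop = make_population(51, random.Random(0))
for label, (n, m) in {"I": (24, 20), "II": (24, 16), "III": (24, 14)}.items():
    print(label, simulated_mse(pop, n, m, k=1.0, phi=0.5, reps=5000, seed=1))
```

Replacing the equal split with a proportional allocation, or the difference estimator with another member of the class, only requires changing `one_replicate`; ratios of the resulting simulated mean squared errors then play the role of the efficiencies defined above.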

Figure 1 and Figure 2 summarize the outcomes of the simulation.

11. Direct method

Estimators that use IST are less efficient than estimators obtained using direct questioning. Hence, in order to identify the amount of loss we compare the proposed class of estimator with respect to corresponding direct method. The estimator under direct questioning method is proposed as


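$$T_D = \chi T_{uD} + (1 - \chi) T_{3mD}, \quad \chi \in [0, 1],$$

where

$$T_{uD} = \bar{y}_u, \quad T_{3mD} = d(\bar{y}_m, \bar{x}_m, \bar{x}_n),$$

and $d(\cdot)$ follows regularity conditions similar to those stated in Section 6. The minimum mean squared error of the class of estimators $T_D$ is obtained as

$$V(T_D)_{min} = \left[\frac{-\hat{\mu}_D^2 J_{d1} + \hat{\mu}_D J_{d2} + J_{d3}}{\hat{\mu}_D^2 J_{d4} - \hat{\mu}_D J_{d5} + K_{d1}}\right]$$

with

$$\hat{\mu}_D = \min\left\{\frac{I_{d2} + \sqrt{I_{d2}^2 - I_{d1} I_{d3}}}{I_{d1}},\ \frac{I_{d2} - \sqrt{I_{d2}^2 - I_{d1} I_{d3}}}{I_{d1}}\right\} \quad \text{such that } \hat{\mu}_D \in [0, 1],$$

where

$$
\begin{gathered}
J_{d1} = K_{d1} K_{d3} f^2, \quad J_{d2} = f\left(K_{d1} K_{d3} - K_{d1} K_{d2} + f K_{d1} K_{d3}\right), \quad J_{d3} = K_{d1} K_{d2} - f K_{d1} K_{d3},\\
J_{d4} = f(K_{d1} + K_{d3}), \quad J_{d5} = K_{d1} - K_{d2} + f(K_{d1} + K_{d3}), \quad I_{d1} = J_{d1} J_{d5} - J_{d2} J_{d4},\\
I_{d2} = J_{d3} J_{d4} + J_{d1} K_{d1}, \quad I_{d3} = J_{d2} K_{d1} + J_{d3} J_{d5}, \quad K_{d1} = S_y^2, \quad K_{d2} = S_y^2 + S_x S_y D_2^2 + 2 S_y^2 D_2 \rho_{yx},\\
K_{d3} = S_y^2 + S_x^2 D_2^2 + 2 S_y S_x \rho_{yx} D_2, \quad \text{and} \quad D_2 = -\frac{S_y S_x \rho_{yx}}{S_x^2}.
\end{gathered}
$$

Further, the simulated ratio of the mean squared errors of $T_D$ and $T_3$ has been computed by considering 5,000 different samples in a Monte Carlo simulation study for different sets, and the results are presented in Figure 3 and Figure 4 respectively:

$$\text{Ratio} = \frac{V(T_D)}{V_p(T_3)}.$$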

Remark 1. The IST is used with the expectation of receiving true responses, since in the IST the sensitive question is not asked directly. If some false responses are still received and bias the estimator, even after the IST has been explained properly to respondents, standard bias reduction methods such as the jackknife, the bootstrap, and methods that approximate the bias function through asymptotic expansions of the bias (Kosmidis, 2014) may be used.

12. Interpretations of results

The following interpretations can be drawn from empirical and simulation results:

  • It has been observed that IST is feasible in successive sampling to handle sensitive issues on successive occasions.

  • From Table 3, it is clear that $E_1$ and $E_2$ are both greater than 100. This implies that the optimum allocation design of Perri et al. (2018) is more efficient than the allocation design of Trappmann et al. (2014) in two-occasion successive sampling. Note that, for the data considered, the optimum fraction to be drawn afresh does not exist for the IST successive general class of estimators, so the corresponding efficiency cannot be computed. Hence, in order to check the validity of the IST successive general class of estimators, a simulation has been carried out with several choices of parameters.

  • The simulation results in Figure 1 and Figure 2 show that $E_{s1}$ and $E_{s2}$ are greater than 100 for all three sets considered. This indicates that the IST successive general class of estimators is more efficient than the IST successive regression and IST successive difference estimators. However, $E_{s1} > E_{s2}$ indicates that the IST successive difference estimator is better than the IST successive regression estimator. Also, the simulated percent relative efficiency increases as φ increases, in accordance with the theory of successive sampling.

  • From Figure 3 and Figure 4, it is observed that as φ increases, the simulated ratio of the mean squared errors of the direct method and the IST successive general class of estimators increases. The values indicate a loss in precision of the IST general class of estimators relative to the direct method. However, the issues under consideration are sensitive, and a direct method is unsuitable because the privacy of respondents needs to be protected. Therefore, despite the loss in precision, the IST should be preferred over the direct method for sensitive issues on successive occasions.

13. Conclusion

The IST on successive occasions enables estimation of the population mean of a stigmatized quantitative variable using innocuous information, which reduces social desirability response bias and protects privacy. Of the three proposed IST successive estimators, the IST successive general class of estimators has been found to be more efficient than the other two under both the Trappmann et al. (2014) and the Perri et al. (2018) allocation designs. The optimum allocation of LL and SL samples due to Perri et al. (2018) has been found to be more fruitful in successive sampling than the allocation due to Trappmann et al. (2014). Compared with the direct method, a certain loss in precision is observed, but this is realistic: because the survey issues are sensitive, applying the direct method may lead to complete or partial refusal, whereas using the IST on successive occasions at least makes estimation of sensitive issues possible. Therefore, it can be concluded that the proposed IST successive estimators provide comfort and satisfaction to respondents in terms of privacy protection, as well as a methodological advancement in the successive sampling literature dealing with sensitive issues. Hence, the IST successive class of estimators may be recommended for practical use by survey practitioners.

Acknowledgements

The authors are thankful to the reviewers and editors for their valuable suggestions that improved an earlier version of the paper. The authors are also thankful to SERB, New Delhi, India for providing the financial assistance to carry out the present work. The authors also sincerely acknowledge free access to data from the Statistical Abstracts of United States.

Figures
Fig. 1. Simulated percent relative efficiency of the proposed IST general class of estimator with respect to proposed IST difference estimator for three different sets.
Fig. 2. Simulated percent relative efficiency of the proposed IST general class of estimator with respect to proposed IST regression estimator for three different sets.
Fig. 3. Ratio of the mean squared error of the direct method to that of $T_3$ (under the optimum allocation design) under IST in two-occasion successive sampling for Set-I.
Fig. 4. Ratio of the mean squared error of the direct method to that of $T_3$ (under the optimum allocation design) under IST in two-occasion successive sampling for Set-II.
TABLES

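Table 1

Response received under item sum technique (IST)

| Occasion | Sample size | Response received | IST estimate |
|---|---|---|---|
| I | $n$ | $z_{1i} = x_i + t_{1i}$ if $i \in s_{nll}$; $z_{1i} = t_{1i}$ if $i \in s_{nsl}$ | $\hat{\bar{x}}_n = \bar{z}_{1nll} - \bar{t}_{1nsl}$ |
| I | $m$ | $z_{1i} = x_i + t_{1i}$ if $i \in s_{mll}$; $z_{1i} = t_{1i}$ if $i \in s_{msl}$ | $\hat{\bar{x}}_m = \bar{z}_{1mll} - \bar{t}_{1msl}$ |
| II | $m$ | $z_{2i} = y_i + t_{2i}$ if $i \in s_{mll}$; $z_{2i} = t_{2i}$ if $i \in s_{msl}$ | $\hat{\bar{y}}_m = \bar{z}_{2mll} - \bar{t}_{2msl}$ |
| II | $u$ | $z_{2i} = y_i + t_{2i}$ if $i \in s_{ull}$; $z_{2i} = t_{2i}$ if $i \in s_{usl}$ | $\hat{\bar{y}}_u = \bar{z}_{2ull} - \bar{t}_{2usl}$ |

$z_{ji}$; j = 1, 2 denotes the observed response at the first and second occasion respectively on the ith observation.

$\bar{z}_{j\,i\,ll}$; j = 1, 2; i ∈ {n, m, u} denotes the mean of $z_j$ in the long list (LL) samples.

$\bar{t}_{j\,i\,sl}$; j = 1, 2; i ∈ {n, m, u} denotes the mean of $t_1$ and $t_2$ in the short list (SL) samples.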


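Table 2

Minimum variance

| i | Est. minimum variance under Trappmann et al. (2014) approach | with |
|---|---|---|
| 1 | $V_t(T_1)_{min} = \dfrac{\hat{\mu}_{1t}^2 J_{11} - \hat{\mu}_{1t} J_{12} + J_{13}}{\hat{\mu}_{1t}^2 J_{14} + \hat{\mu}_{1t} J_{15} + K_{11}}$ | $\hat{\mu}_{1t} = \min\left\{\dfrac{-I_{12} + \sqrt{I_{12}^2 + I_{11} I_{13}}}{I_{11}},\ \dfrac{-I_{12} - \sqrt{I_{12}^2 + I_{11} I_{13}}}{I_{11}}\right\} \in [0, 1]$ |
| 2 | $V_t(T_2)_{min} = \dfrac{\hat{\mu}_{2t}^2 J_{31} - \hat{\mu}_{2t} J_{32} + J_{33}}{\hat{\mu}_{2t}^2 J_{34} + \hat{\mu}_{2t} J_{35} + K_{31}}$ | $\hat{\mu}_{2t} = \min\left\{\dfrac{-I_{32} + \sqrt{I_{32}^2 + I_{31} I_{33}}}{I_{31}},\ \dfrac{-I_{32} - \sqrt{I_{32}^2 + I_{31} I_{33}}}{I_{31}}\right\} \in [0, 1]$ |
| 3 | $V_t(T_3)_{min} = \dfrac{\hat{\mu}_{3t}^2 J_{51} - \hat{\mu}_{3t} J_{52} + J_{53}}{\hat{\mu}_{3t}^2 J_{54} + \hat{\mu}_{3t} J_{55} + K_{51}}$ | $\hat{\mu}_{3t} = \min\left\{\dfrac{-I_{52} + \sqrt{I_{52}^2 + I_{51} I_{53}}}{I_{51}},\ \dfrac{-I_{52} - \sqrt{I_{52}^2 + I_{51} I_{53}}}{I_{51}}\right\} \in [0, 1]$ |

| i | Est. minimum variance under Perri et al. (2018) approach | with |
|---|---|---|
| 1 | $V_p(T_1)_{min} = \dfrac{\hat{\mu}_{1p}^2 J_{21} - \hat{\mu}_{1p} J_{22} + J_{23}}{\hat{\mu}_{1p}^2 J_{24} + \hat{\mu}_{1p} J_{25} + K_{21}}$ | $\hat{\mu}_{1p} = \min\left\{\dfrac{-I_{22} + \sqrt{I_{22}^2 + I_{21} I_{23}}}{I_{21}},\ \dfrac{-I_{22} - \sqrt{I_{22}^2 + I_{21} I_{23}}}{I_{21}}\right\} \in [0, 1]$ |
| 2 | $V_p(T_2)_{min} = \dfrac{\hat{\mu}_{2p}^2 J_{41} - \hat{\mu}_{2p} J_{42} + J_{43}}{\hat{\mu}_{2p}^2 J_{44} + \hat{\mu}_{2p} J_{45} + K_{41}}$ | $\hat{\mu}_{2p} = \min\left\{\dfrac{-I_{42} + \sqrt{I_{42}^2 + I_{41} I_{43}}}{I_{41}},\ \dfrac{-I_{42} - \sqrt{I_{42}^2 + I_{41} I_{43}}}{I_{41}}\right\} \in [0, 1]$ |
| 3 | $V_p(T_3)_{min} = \dfrac{\hat{\mu}_{3p}^2 J_{61} - \hat{\mu}_{3p} J_{62} + J_{63}}{\hat{\mu}_{3p}^2 J_{64} + \hat{\mu}_{3p} J_{65} + K_{61}}$ | $\hat{\mu}_{3p} = \min\left\{\dfrac{-I_{62} + \sqrt{I_{62}^2 + I_{61} I_{63}}}{I_{61}},\ \dfrac{-I_{62} - \sqrt{I_{62}^2 + I_{61} I_{63}}}{I_{61}}\right\} \in [0, 1]$ |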

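Table 3

Empirical results

| $\mu_{1p}$ | $\mu_{1t}$ | $\mu_{2p}$ | $\mu_{2t}$ | $\mu_{3p}$ | $\mu_{3t}$ | $E_1$ | $E_2$ | $E_3$ |
|---|---|---|---|---|---|---|---|---|
| 0.8424 | 0.8359 | 0.7617 | 0.7626 | * | * | 114.9334 | 115.8558 | - |

'*' indicates that the optimum value of the fraction of the sample to be drawn afresh does not exist; '-' denotes that the corresponding percent relative efficiency cannot be computed.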


References
  1. Arnab R and Singh S (2013). Estimation of mean of sensitive characteristics for successive sampling. Communications in Statistics - Theory and Methods, 42, 2499-2524.
  2. Chaudhuri A and Christofides TC (2013). Indirect Questioning in Sample Surveys, Berlin, Heidelberg, De, Springer-Verlag.
  3. Hussian Z, Shabbir N, and Shabbir J (2015). An alternative item sum technique for improved estimators of population mean in sensitive surveys. Hacettepe University Bulletin of Natural Sciences and Engineering Series B: Mathematics and Statistics, 46, 1-30.
  4. Jessen RJ (1942). Statistical investigation of a sample survey for obtaining farm facts. Iowa Agriculture and Home Economics Experiment Station. Research Bulletin, 26, 1-104.
  5. Kosmidis I (2014). Bias in parametric estimation: reduction and useful side-effects. WIREs Computational Statistics, 6, 185-196.
  6. Miller JD (1984). A new survey technique for studying deviant behavior (PhD thesis), The George Washington University, Washington DC.
  7. Naeem N and Shabbir J (2016). Use of scrambled responses on two occasions successive sampling under non-response. Hacettepe University Bulletin of Natural Sciences and Engineering Series B: Mathematics and Statistics, 46.
  8. Perri PF, Rueda Garcia M, and Cobo Rodriguez B (2018). Multiple sensitive estimation and optimal sample size allocation in the item sum technique. Biometrical Journal, 60, 155-173.
  9. Priyanka K and Trisandhya P (2018). A composite class of estimators using scrambled response mechanism for sensitive population mean in successive sampling. Communications in Statistics - Theory and Methods.
  10. Priyanka K, Trisandhya P, and Mittal R (2018). Dealing sensitive characters on successive occasions through a general class of estimators using scrambled response techniques. Metron, 76, 203-230.
  11. Rueda Garcia M, Perri PF, and Cobo Rodriguez B (2017). Advances in estimation by the item sum technique using auxiliary information in complex surveys. Advances in Statistical Analysis, 102, 455-478.
  12. Singh GN, Suman S, Khetan M, and Paul C (2017). Some estimation procedures of sensitive character using scrambled response techniques in successive sampling. Communications in Statistics - Theory and Methods.
  13. Srivastava SK and Jhajj HS (1980). A class of estimators using auxiliary information for estimating finite population variance. Sankhya, C42, 87-96.
  14. Tracy DS, Singh HP, and Singh R (1996). An alternative to the ratio-cum-product estimator in sample surveys. Journal of Statistical Planning and Inference, 53, 375-397.
  15. Trappmann M, Krumpal I, Kirchner A, and Jann B (2014). Item sum: a new technique for asking quantitative sensitive questions. Journal of Survey Statistics and Methodology, 2, 58-77.
  16. Tian GL and Tang ML (2014). Incomplete Categorical Data Design: Non-Randomized Response Techniques for Sensitive Questions in Surveys, FL, Chapman & Hall/CRC.
  17. Warner SL (1965). Randomized response: a survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60, 63-69.
  18. Yu B, Jin Z, Tian J, and Gao G (2015). Estimation of sensitive proportion by randomized response data in successive sampling. Computational and Mathematical Methods in Medicine, 2015, 1-6.