
In statistics, estimation is the process of inferring an unknown population parameter from sample data; determining the population mean from a sample is a frequent example. One approach is the point estimator, a single value used to estimate the population parameter. However, point estimators are sensitive to outliers and other unusual observations in the data, which may result in unreliable or biased estimates.
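The sensitivity of point estimators to outliers can be seen in a minimal Python sketch (the numbers are invented for illustration): a single extreme value drags the sample mean far from the bulk of the data, while a robust point estimator such as the median is unaffected.

```python
import statistics

# Invented sample containing one extreme observation (1000)
sample = [10, 11, 12, 13, 1000]

mean_estimate = statistics.mean(sample)      # pulled toward the outlier
median_estimate = statistics.median(sample)  # unaffected by the outlier

print(mean_estimate)    # 209.2
print(median_estimate)  # 12
```

This is exactly the failure mode that motivates the robust techniques discussed in this article.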
A ratio estimator, which employs the ratio of two variables present in the sample data to estimate the ratio of the corresponding population parameters, is one approach to overcoming this issue. Whenever the correlation between the study variable and an auxiliary variable is positive, the two variables tend to vary together, and this relationship can be exploited to increase estimation accuracy. By adjusting the estimate of the study variable with the auxiliary variable, a more precise and efficient estimate of the population mean can be achieved.
To improve the performance of the ratio estimator further, additional information about the auxiliary variable can be incorporated into the estimation process. For example, the estimator can be adjusted for the auxiliary variable's variability relative to its mean through the coefficient of variation, or for the shape of its distribution through the coefficient of kurtosis. By taking these quantities into account, the ratio estimator can be tailored to the specific properties of the data, which can yield even more precise and efficient estimates of the unknown population parameter. Overall, ratio estimators are an effective method for estimating population characteristics from sample observations, particularly when there is a strong positive correlation between the study and auxiliary variables. Nevertheless, as with any estimation technique, care must be taken to consider the underlying assumptions and potential limitations.
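The basic idea can be sketched in Python with the classical ratio estimator of the population mean, ybar * (Xbar / xbar); all numbers below are invented for illustration.

```python
# Invented sample data: y (study variable) and x (auxiliary variable)
y_sample = [20.0, 24.0, 30.0, 36.0]
x_sample = [10.0, 12.0, 15.0, 18.0]
Xbar = 14.0  # known population mean of the auxiliary variable

ybar = sum(y_sample) / len(y_sample)  # 27.5
xbar = sum(x_sample) / len(x_sample)  # 13.75

# Classical ratio estimator: scale ybar by how far the observed
# auxiliary mean xbar falls from its known population mean Xbar.
ratio_estimate = ybar * (Xbar / xbar)
print(ratio_estimate)  # close to 28.0
```

When x and y are positively correlated, a sample that underestimates the auxiliary mean also tends to underestimate the study mean, so the correction factor Xbar/xbar adjusts the estimate in the right direction.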
Several statisticians have explored the potential benefits of incorporating information on the auxiliary variable to improve the performance of ratio estimators. A number of studies have made use of this approach, for example Kadilar and Cingi (2004) and Kadilar, Candan, and Cingi (2007).
Huber (1981) developed the M-estimation approach to robust regression, in which the regression coefficients are obtained by minimising a loss function that is quadratic for small residuals but grows only linearly for large ones, so that outlying observations receive reduced weight. The tuning constant of the Huber loss is conventionally set to c = 1.345, which yields approximately 95% asymptotic efficiency with respect to least squares when the errors are normally distributed, while still protecting against outliers. The resulting robust slope estimate can be used in place of the ordinary least squares coefficient in regression-type ratio estimators.
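The clipping behaviour of the Huber approach can be illustrated with a short Python sketch of its influence (psi) function. This is the generic textbook form with the conventional tuning constant c = 1.345, not notation taken from this paper:

```python
def huber_psi(e, c=1.345):
    """Huber's influence function: identity for small residuals,
    clipped at +/-c for large ones, so outliers have bounded influence."""
    if abs(e) <= c:
        return e
    return c if e > 0 else -c

print(huber_psi(0.5))    # 0.5    (small residual passes through)
print(huber_psi(10.0))   # 1.345  (large residual is capped)
print(huber_psi(-10.0))  # -1.345
```

In iteratively reweighted least squares, this capping is what prevents a single outlying observation from dominating the fitted slope.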
For more on robust regression, see the quantreg package (Koenker, 2009) in R (R Core Team, 2021), as well as Zaman and Bulut (2019, 2021) and related works.
In Section 2, we consider the existing estimators based on robust regression. In Section 3, we introduce new ratio type exponential estimators based on the Huber M-estimation technique. The remaining sections present efficiency comparisons, numerical and simulation studies, and concluding remarks.
Kadilar, Candan, and Cingi (2007) proposed the following robust ratio estimators of the population mean $\bar{Y}$:

$t_1 = \dfrac{\bar{y} + b_{rob}(\bar{X} - \bar{x})}{\bar{x}}\,\bar{X}$, (1)

$t_2 = \dfrac{\bar{y} + b_{rob}(\bar{X} - \bar{x})}{\bar{x} + C_x}\,(\bar{X} + C_x)$, (2)

$t_3 = \dfrac{\bar{y} + b_{rob}(\bar{X} - \bar{x})}{\bar{x} + \beta_2(x)}\,(\bar{X} + \beta_2(x))$, (3)

$t_4 = \dfrac{\bar{y} + b_{rob}(\bar{X} - \bar{x})}{\bar{x}\,\beta_2(x) + C_x}\,(\bar{X}\,\beta_2(x) + C_x)$, (4)

$t_5 = \dfrac{\bar{y} + b_{rob}(\bar{X} - \bar{x})}{\bar{x}\,C_x + \beta_2(x)}\,(\bar{X}\,C_x + \beta_2(x))$, (5)

where $\bar{y}$ and $\bar{x}$ are the sample means of the study variable $y$ and the auxiliary variable $x$, $\bar{X}$ is the known population mean of $x$, $C_x$ is the coefficient of variation of $x$, $\beta_2(x)$ is its coefficient of kurtosis, and $b_{rob}$ is the regression coefficient obtained by the Huber M-estimation technique.
Using a first-degree-approximation expansion, the MSEs of the estimators (1)–(5) can be calculated as follows:

$\mathrm{MSE}(t_i) \cong \lambda\left[S_y^2 + (R_i + b_{rob})^2 S_x^2 - 2\,(R_i + b_{rob})\,S_{yx}\right], \qquad i = 1,\dots,5,$

where $\lambda = (1-f)/n$, $f = n/N$, $S_y^2$ and $S_x^2$ are the population variances of $y$ and $x$, $S_{yx}$ is their covariance, and $R_1 = \bar{Y}/\bar{X}$, $R_2 = \bar{Y}/(\bar{X}+C_x)$, $R_3 = \bar{Y}/(\bar{X}+\beta_2(x))$, $R_4 = \bar{Y}\beta_2(x)/(\bar{X}\beta_2(x)+C_x)$, $R_5 = \bar{Y}C_x/(\bar{X}C_x+\beta_2(x))$.
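As a numerical illustration, the generic first-order MSE form for a regression-in-ratio estimator, MSE ≅ λ[S_y^2 + (R + b)^2 S_x^2 − 2(R + b) S_yx], can be evaluated as below. Both the formula (a standard textbook approximation) and all input values are assumptions for illustration, not results from this paper.

```python
def mse_first_order(lam, R, b, Sy2, Sx2, Syx):
    """First-order MSE approximation with ratio term R and slope b:
    lam * (Sy2 + (R + b)**2 * Sx2 - 2 * (R + b) * Syx)."""
    g = R + b
    return lam * (Sy2 + g * g * Sx2 - 2.0 * g * Syx)

# Invented inputs: lam = (1 - f)/n, population variances and covariance
print(mse_first_order(lam=0.02, R=2.0, b=1.5, Sy2=100.0, Sx2=9.0, Syx=25.0))
```

A strong positive covariance S_yx reduces the MSE, which mirrors the requirement that the study and auxiliary variables be positively correlated.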
In recent years, the use of robust statistical approaches in sampling studies and finite population mean estimation has received considerable attention. The primary motivation for developing and studying the ratio type exponential estimators discussed in this article is the need to improve the efficiency and accuracy of population mean estimation in the presence of outliers. Sample surveys are an important method for drawing conclusions about finite populations; however, outliers can have a disproportionate impact on traditional estimators and thereby compromise their reliability. This limitation has prompted the investigation of robust estimation strategies.
The suggested estimators are based on the Huber M-estimation technique, which downweights observations with large residuals and thus limits the influence of outliers on the estimated regression coefficient.
In this paper, we derive mathematical expressions for the bias and the mean squared error of the suggested estimators, which enable a thorough evaluation of their performance. To assess their efficacy, we compare them with the estimators developed by Kadilar, Candan, and Cingi (Hacet J Math Stat, 36, 181–188, 2007); such a comparison is necessary to determine the scenarios in which the suggested estimators outperform existing techniques. Furthermore, because real-world applicability is essential, we undertake numerical and simulation-based investigations employing five population data sets, each of which contains outliers. These empirical studies support our theoretical conclusions and offer insight into how the suggested estimators behave in practice.
We present ratio type exponential estimators employing the Huber (1981) M-estimation technique:

$\hat{\bar{y}}_{p_1} = \left(\bar{y} + b_{rob}(\bar{X}-\bar{x})\right)\exp\!\left(\dfrac{\bar{X}-\bar{x}}{\bar{X}+\bar{x}}\right)$, (6)

$\hat{\bar{y}}_{p_2} = \left(\bar{y} + b_{rob}(\bar{X}-\bar{x})\right)\exp\!\left(\dfrac{\bar{X}-\bar{x}}{(\bar{X}+\bar{x})+2C_x}\right)$, (7)

$\hat{\bar{y}}_{p_3} = \left(\bar{y} + b_{rob}(\bar{X}-\bar{x})\right)\exp\!\left(\dfrac{\bar{X}-\bar{x}}{(\bar{X}+\bar{x})+2\beta_2(x)}\right)$, (8)

$\hat{\bar{y}}_{p_4} = \left(\bar{y} + b_{rob}(\bar{X}-\bar{x})\right)\exp\!\left(\dfrac{\beta_2(x)(\bar{X}-\bar{x})}{\beta_2(x)(\bar{X}+\bar{x})+2C_x}\right)$, (9)

$\hat{\bar{y}}_{p_5} = \left(\bar{y} + b_{rob}(\bar{X}-\bar{x})\right)\exp\!\left(\dfrac{C_x(\bar{X}-\bar{x})}{C_x(\bar{X}+\bar{x})+2\beta_2(x)}\right)$. (10)
To calculate the mean square error (MSE) of the suggested estimators, write $\bar{y} = \bar{Y}(1+e_0)$ and $\bar{x} = \bar{X}(1+e_1)$, where $E(e_0)=E(e_1)=0$, $E(e_0^2)=\lambda C_y^2$, $E(e_1^2)=\lambda C_x^2$, $E(e_0 e_1)=\lambda\rho C_y C_x$, and $\lambda=(1-f)/n$ with $f=n/N$. We would like to point out that the population data are used to calculate the auxiliary parameters $C_x$ and $\beta_2(x)$. To first order, the suggested estimators satisfy

$\hat{\bar{y}}_{p_i} - \bar{Y} \cong \bar{Y}e_0 - \left(b_{rob}\bar{X} + \Phi_i\bar{Y}\right)e_1,$

where $\Phi_1 = \dfrac{1}{2}$, $\Phi_2 = \dfrac{\bar{X}}{2(\bar{X}+C_x)}$, $\Phi_3 = \dfrac{\bar{X}}{2(\bar{X}+\beta_2(x))}$, $\Phi_4 = \dfrac{\beta_2(x)\bar{X}}{2(\beta_2(x)\bar{X}+C_x)}$, and $\Phi_5 = \dfrac{C_x\bar{X}}{2(C_x\bar{X}+\beta_2(x))}$. Extending the right-hand side to first order, squaring, and taking the expectation of both sides of these equations yields

$\mathrm{MSE}(\hat{\bar{y}}_{p_i}) \cong \lambda\left[S_y^2 + (\Phi_i R + b_{rob})^2 S_x^2 - 2\,(\Phi_i R + b_{rob})\,S_{yx}\right], \qquad i=1,\dots,5,$

where $R = \bar{Y}/\bar{X}$.
In this section, the efficiency conditions for the proposed estimators are obtained by comparing their MSEs with those of the existing estimators: a suggested estimator is more efficient than a given existing estimator whenever its MSE is smaller. According to these conditions, the suggested estimators outperform the existing ones whenever the corresponding inequalities hold for the population under study.
In this section, the considered estimators are compared with existing estimators from the literature. To compare the behaviour of the suggested estimators with that of the existing ones, five different natural population data sets (described in Table 1) were used. Because these real population data sets contain outliers, they are well suited to this comparison.
We visually detected outliers in data sets A, B, C, D, and E and display them in Figure 1. These data points are considered outliers because they diverge considerably from the overall trend of the data set, which suggests that they may influence parameter estimation. We used the Huber M-estimation technique to limit the influence of these outliers on the estimated regression coefficient.
Outlier description: We used scatter plots to visually identify outliers in this data set, concentrating on cases where the weight of sodium and the weight of calories differed markedly from the majority of data points. Outliers in this data set are mostly high-sodium and high-calorie cereals.
Outlier description: Using scatter plots, outliers in this data set were identified, revealing several data points with substantially higher values for both non-real estate farm loans and real estate farm loans in 1977.
Outlier description: This data set's outliers were identified using scatter plots, which showed cases where the cultivated area under wheat in 1964 diverged markedly from the typical range.
Outlier description: In this data set, we used scatter plots to visually identify outliers, namely data points where the fixed capital and output of factories in an area showed unusual patterns.
Outlier description: Scatter plots were used to visually identify outliers, concentrating on families with unusually high annual incomes or annual food expenditures in Belgian francs.
We then discussed how these visually identified outliers affected our study and the estimation techniques we used. We also emphasise that we used the Huber M-estimation technique, which reduces the influence of such outliers on the estimated regression coefficient.
In Tables 2–5, we report the percent relative efficiencies (PREs) of the suggested estimators with respect to the existing estimators, for the real data sets and for the simulation study.
These results highlight a key finding: our suggested estimators frequently beat their competitors in terms of mean squared error, which occurs whenever the percent relative efficiencies exceed 100. This demonstrates the accuracy and precision of our estimators on the considered data sets. These findings underline the enhanced efficiency of the suggested estimators and their potential as useful tools for statistical estimation tasks in real-world applications.
To find the RE (%) of the suggested estimators, we conduct a simulation study based on the "Engel Data Set" (Koenker and Bassett, 1982) presented in Table 1. This data set contains income and food expenditure figures for 235 working-class Belgian households. To load it in R, load the quantreg library and then enter the command data(engel).
We carry out the simulation study using the steps listed below, which were coded in R (R Core Team, 2021), to determine the MSEs of the suggested and existing estimators:
1. Draw a simple random sample of size n without replacement from the N = 235 households.
2. Compute the existing and suggested estimates from this sample.
3. Repeat steps 1 and 2 B times (B = 5000, 10000, 100000) and approximate the MSE of each estimator from its B replicated values.
The MSE ratio of the investigated estimators to the current estimators for each sample size (n = 20, 30, 40, 50) is then expressed as a relative efficiency in percent.
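The simulation steps above can be sketched compactly in Python (a simplified analogue of the R code listed at the end of the paper, using an invented toy population and the plain sample mean in place of the ratio estimators):

```python
import random
import statistics

def simulate_mse(y_pop, n, B, estimator, seed=1):
    """Monte Carlo MSE of an estimator of the mean of y_pop under
    simple random sampling without replacement, over B replications."""
    rng = random.Random(seed)
    true_mean = statistics.mean(y_pop)
    total = 0.0
    for _ in range(B):
        sample = rng.sample(y_pop, n)  # SRSWOR draw of size n
        total += (estimator(sample) - true_mean) ** 2
    return total / B

# Invented toy population of 100 values; the sample mean stands in
# for the estimators compared in the paper.
y_pop = list(range(1, 101))
mse = simulate_mse(y_pop, n=20, B=2000, estimator=statistics.mean)
print(mse)  # near the theoretical (1 - n/N) * S^2 / n = 33.33
```

Substituting two competing estimators into `simulate_mse` and taking the ratio of the results, multiplied by 100, gives the relative efficiency in percent.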
It is important to highlight that the suggested estimators' efficiency increases significantly when compared with the current estimators; in other words, the suggested estimators show considerably higher efficiency in situations where outliers are more likely to occur in the data. Tables 3–5 report the relative efficiencies obtained from the simulation study.
From Tables 1–5, we draw the following observations.
We present descriptions of the five real-world data sets in Table 1 to demonstrate the applications of our research.
From Table 2, we observe the following:
For data set A, the PREs of the estimators range from about 194.98 to 221.55, all well above 100.
For data set B, with population size 50 and sample size 20, the PREs of the estimators range from about 248.78 to 252.20.
For data set C, the PREs of the estimators range from about 356.51 to 382.90.
For data set D, with population size 80 and sample size 20, the PREs of the estimators range from about 377.32 to 382.23.
For data set E, the PREs of the estimators range from about 266.57 to 293.98.
From Tables 3–5, which report the simulation results, we observe the following:
For sample sizes of 20, 30, 40, and 50 with a population size of 5000, the PREs of the suggested estimators consistently exceed 100, confirming their superiority over the existing estimators at every sample size considered.
We introduced and compared our estimators, designated as $\hat{\bar{y}}_{p_i}$ ($i = 1, \dots, 5$), with the existing robust ratio estimators.
In the presence of outliers, traditional statistical approaches to data analysis can lead to misleading results. Robust regression approaches have therefore been used to improve methods for estimating the population mean. This article presents an approach for analysing sample survey data, concentrating on the development of ratio type exponential estimators based on the Huber M-estimation technique.
All the relevant data information is available within the manuscript, and the code is given in the Appendix.
The authors declare no conflict of interest.
Authors are thankful to the Editor-in-Chief and learned referees for their inspiring and fruitful suggestions. The authors are also thankful to the National Institute of Technology Arunachal Pradesh, Jote, for providing the necessary infrastructure for the completion of the present work.
# Load necessary libraries
library(quantreg) # provides the 'engel' data set
library(MASS)     # provides rlm() for Huber M-estimation
library(moments)  # assumed source of kurtosis(); e1071 also provides one
# Load the 'engel' dataset
data(engel)
# Define the study variable Y (food expenditure) and auxiliary variable X (income)
Y <- engel$foodexp
X <- engel$income
# Calculate the population means of the variables
Ybar <- mean(Y)
Xbar <- mean(X)
# Specify the number of bootstrap replications (5000, 10000, 100000)
B <- 100000
# Set the sample size 'n' (20, 30, 40, 50)
n <- 50
# Define 'N' as the length of the 'income' vector
N <- length(engel$income)
# Initialize vectors to store results for different estimators
T1.1 <- numeric(B)
T1.2 <- numeric(B)
T1.3 <- numeric(B)
T1.4 <- numeric(B)
T1.5 <- numeric(B)
P1.1 <- numeric(B)
P1.2 <- numeric(B)
P1.3 <- numeric(B)
P1.4 <- numeric(B)
P1.5 <- numeric(B)
# Loop for bootstrap replications
for (K in 1:B) {
# Draw a simple random sample of size n without replacement (SRSWOR)
swor <- sample(N, size = n, replace = FALSE)
y <- engel$foodexp[swor]
x <- engel$income[swor]
ybar <- mean(y)
xbar <- mean(x)
Cx <- sd(x) / mean(x) # sample coefficient of variation of x
B2 <- kurtosis(x)     # sample coefficient of kurtosis of x
library(MASS)
# Huber M-estimation of the regression coefficient
Br1 <- rlm(y ~ x)
Br <- Br1$coefficients[2]
# Calculate various estimators
T1.1[K] <- ((ybar + Br * (Xbar - xbar)) / xbar) * Xbar
T1.2[K] <- ((ybar + Br * (Xbar - xbar)) / (xbar + Cx)) * (Xbar + Cx)
T1.3[K] <- ((ybar + Br * (Xbar - xbar)) / (xbar + B2)) * (Xbar + B2)
T1.4[K] <- ((ybar + Br * (Xbar - xbar)) / (xbar * B2 + Cx)) * (Xbar * B2 + Cx)
T1.5[K] <- ((ybar + Br * (Xbar - xbar)) / (xbar * Cx + B2)) * (Xbar * Cx + B2)
P1.1[K] <- (ybar + Br * (Xbar - xbar)) * exp((Xbar - xbar) / (Xbar + xbar))
P1.2[K] <- (ybar + Br * (Xbar - xbar)) * exp((Xbar - xbar) / ((Xbar + xbar) + 2 * Cx))
P1.3[K] <- (ybar + Br * (Xbar - xbar)) * exp((Xbar - xbar) / ((Xbar + xbar) + 2 * B2))
P1.4[K] <- (ybar + Br * (Xbar - xbar)) * exp((B2 * (Xbar - xbar)) / (B2 * (Xbar + xbar) + 2 * Cx))
P1.5[K] <- (ybar + Br * (Xbar - xbar)) * exp((Cx * (Xbar - xbar)) / (Cx * (Xbar + xbar) + 2 * B2))
}
# Calculate Mean Squared Errors (MSE) for each estimator
# (computed as the variance of the replicated estimates)
MSEY1 <- mean((T1.1 - mean(T1.1))^2)
MSEY2 <- mean((T1.2 - mean(T1.2))^2)
MSEY3 <- mean((T1.3 - mean(T1.3))^2)
MSEY4 <- mean((T1.4 - mean(T1.4))^2)
MSEY5 <- mean((T1.5 - mean(T1.5))^2)
d <- data.frame(MSEY1, MSEY2, MSEY3, MSEY4, MSEY5)
MSEYpr1 <- mean((P1.1 - mean(P1.1))^2)
MSEYpr2 <- mean((P1.2 - mean(P1.2))^2)
MSEYpr3 <- mean((P1.3 - mean(P1.3))^2)
MSEYpr4 <- mean((P1.4 - mean(P1.4))^2)
MSEYpr5 <- mean((P1.5 - mean(P1.5))^2)
d1 <- data.frame(MSEYpr1, MSEYpr2, MSEYpr3, MSEYpr4, MSEYpr5)
# Relative Efficiency: row i compares each existing estimator with suggested estimator i
da1 <- c(MSEY1 / MSEYpr1, MSEY2 / MSEYpr1, MSEY3 / MSEYpr1, MSEY4 / MSEYpr1, MSEY5 / MSEYpr1)
da2 <- c(MSEY1 / MSEYpr2, MSEY2 / MSEYpr2, MSEY3 / MSEYpr2, MSEY4 / MSEYpr2, MSEY5 / MSEYpr2)
da3 <- c(MSEY1 / MSEYpr3, MSEY2 / MSEYpr3, MSEY3 / MSEYpr3, MSEY4 / MSEYpr3, MSEY5 / MSEYpr3)
da4 <- c(MSEY1 / MSEYpr4, MSEY2 / MSEYpr4, MSEY3 / MSEYpr4, MSEY4 / MSEYpr4, MSEY5 / MSEYpr4)
da5 <- c(MSEY1 / MSEYpr5, MSEY2 / MSEYpr5, MSEY3 / MSEYpr5, MSEY4 / MSEYpr5, MSEY5 / MSEYpr5)
da <- c(da1, da2, da3, da4, da5)
RE_matrix <- matrix(da, ncol = 5, nrow = 5, byrow = TRUE) * 100
# Display or export results
d # MSE results for T1 estimators
d1 # MSE results for P1 estimators
RE_matrix # Relative Efficiency matrix
Parameters of five natural population data sets
Data set | Source
---|---
A | UScereals (Ripley)
B | Singh (2003, p. 1111)
C | Murthy (1967, p. 399)
D | Murthy (1967, p. 288)
E | Engel (Koenker and Bassett, 1982)
Percent relative efficiencies of the suggested estimators
Data set | Estimator | $t_1$ | $t_2$ | $t_3$ | $t_4$ | $t_5$
---|---|---|---|---|---|---
A | $\hat{\bar{y}}_{p_1}$ | 212.7720 | 212.0549 | 202.5874 | 212.6842 | 194.9832
A | $\hat{\bar{y}}_{p_2}$ | 213.1139 | 212.3956 | 202.9129 | 213.0259 | 195.2965
A | $\hat{\bar{y}}_{p_3}$ | 217.7221 | 216.9884 | 207.3006 | 217.6323 | 199.5195
A | $\hat{\bar{y}}_{p_4}$ | 212.8138 | 212.0966 | 202.6272 | 212.7260 | 195.0216
A | $\hat{\bar{y}}_{p_5}$ | 221.5548 | 220.8081 | 210.9498 | 221.4634 | 203.0318
B | $\hat{\bar{y}}_{p_1}$ | 250.9038 | 250.3324 | 248.7804 | 250.7798 | 249.1821
B | $\hat{\bar{y}}_{p_2}$ | 251.2503 | 250.6781 | 249.1240 | 251.1261 | 249.5262
B | $\hat{\bar{y}}_{p_3}$ | 252.1963 | 251.6219 | 250.0620 | 252.0717 | 250.4657
B | $\hat{\bar{y}}_{p_4}$ | 250.9788 | 250.4073 | 248.8549 | 250.8549 | 249.2567
B | $\hat{\bar{y}}_{p_5}$ | 251.9507 | 251.3770 | 249.8185 | 251.8263 | 250.2219
C | $\hat{\bar{y}}_{p_1}$ | 370.1677 | 367.6736 | 360.2442 | 369.3084 | 356.5053
C | $\hat{\bar{y}}_{p_2}$ | 372.4287 | 369.9194 | 362.4446 | 371.5642 | 358.6828
C | $\hat{\bar{y}}_{p_3}$ | 379.3293 | 376.7735 | 369.1601 | 378.4487 | 365.3287
C | $\hat{\bar{y}}_{p_4}$ | 370.9437 | 368.4444 | 360.9993 | 370.0825 | 357.2526
C | $\hat{\bar{y}}_{p_5}$ | 382.8989 | 380.3190 | 372.6340 | 382.0100 | 368.7665
D | $\hat{\bar{y}}_{p_1}$ | 379.8456 | 379.3468 | 377.9464 | 379.6715 | 377.3188
D | $\hat{\bar{y}}_{p_2}$ | 380.3142 | 379.8148 | 378.4127 | 380.1399 | 377.7843
D | $\hat{\bar{y}}_{p_3}$ | 381.6361 | 381.1350 | 379.7280 | 381.4612 | 379.0975
D | $\hat{\bar{y}}_{p_4}$ | 380.0090 | 379.5100 | 378.1091 | 379.8348 | 377.4812
D | $\hat{\bar{y}}_{p_5}$ | 382.2315 | 381.7296 | 380.3204 | 382.0563 | 379.6889
E | $\hat{\bar{y}}_{p_1}$ | 281.8757 | 281.6214 | 273.6011 | 281.8612 | 266.5705
E | $\hat{\bar{y}}_{p_2}$ | 282.0685 | 281.8140 | 273.7882 | 282.0541 | 266.7529
E | $\hat{\bar{y}}_{p_3}$ | 288.2916 | 288.0315 | 279.8286 | 288.2768 | 272.6380
E | $\hat{\bar{y}}_{p_4}$ | 281.8866 | 281.6323 | 273.6117 | 281.8722 | 266.5809
E | $\hat{\bar{y}}_{p_5}$ | 293.9832 | 293.7180 | 285.3532 | 293.9681 | 278.0206
Relative efficiencies (%) of the considered estimators
Sample size | Estimator | $t_1$ | $t_2$ | $t_3$ | $t_4$ | $t_5$
---|---|---|---|---|---|---
20 | $\hat{\bar{y}}_{p_1}$ | 287.4186 | 287.1696 | 284.8613 | 287.3576 | 282.4920
20 | $\hat{\bar{y}}_{p_2}$ | 287.6115 | 287.3623 | 285.0525 | 287.5505 | 282.6816
20 | $\hat{\bar{y}}_{p_3}$ | 289.4916 | 289.2408 | 286.9159 | 289.4302 | 284.5295
20 | $\hat{\bar{y}}_{p_4}$ | 287.4619 | 287.2128 | 284.9042 | 287.4009 | 282.5346
20 | $\hat{\bar{y}}_{p_5}$ | 291.1673 | 290.9150 | 288.5766 | 291.1055 | 286.1765
30 | $\hat{\bar{y}}_{p_1}$ | 281.6029 | 281.3562 | 278.3832 | 281.5545 | 275.7349
30 | $\hat{\bar{y}}_{p_2}$ | 281.7920 | 281.5452 | 278.5702 | 281.7435 | 275.9200
30 | $\hat{\bar{y}}_{p_3}$ | 284.2157 | 283.9668 | 280.9662 | 284.1669 | 278.2933
30 | $\hat{\bar{y}}_{p_4}$ | 281.6370 | 281.3903 | 278.4170 | 281.5886 | 275.7683
30 | $\hat{\bar{y}}_{p_5}$ | 286.1736 | 285.9230 | 282.9017 | 286.1244 | 280.2103
40 | $\hat{\bar{y}}_{p_1}$ | 280.9477 | 280.7004 | 277.2045 | 280.9056 | 274.2236
40 | $\hat{\bar{y}}_{p_2}$ | 281.1367 | 280.8892 | 277.3909 | 281.0946 | 274.4081
40 | $\hat{\bar{y}}_{p_3}$ | 283.9790 | 283.7289 | 280.1953 | 283.9364 | 277.1823
40 | $\hat{\bar{y}}_{p_4}$ | 280.9777 | 280.7303 | 277.2340 | 280.9356 | 274.2529
40 | $\hat{\bar{y}}_{p_5}$ | 286.2625 | 286.0104 | 282.4484 | 286.2196 | 279.4112
50 | $\hat{\bar{y}}_{p_1}$ | 284.7560 | 284.5029 | 280.5511 | 284.7162 | 277.1985
50 | $\hat{\bar{y}}_{p_2}$ | 284.9496 | 284.6964 | 280.7418 | 284.9098 | 277.3870
50 | $\hat{\bar{y}}_{p_3}$ | 288.1667 | 287.9106 | 283.9114 | 288.1265 | 280.5187
50 | $\hat{\bar{y}}_{p_4}$ | 284.7844 | 284.5314 | 280.5791 | 284.7447 | 277.2262
50 | $\hat{\bar{y}}_{p_5}$ | 290.7960 | 290.5376 | 286.5019 | 290.7554 | 283.0782
Relative efficiencies (%) of the considered estimators
Sample size | Estimator | $t_1$ | $t_2$ | $t_3$ | $t_4$ | $t_5$
---|---|---|---|---|---|---
20 | $\hat{\bar{y}}_{p_1}$ | 283.7811 | 283.5358 | 281.2569 | 283.7217 | 278.9507
20 | $\hat{\bar{y}}_{p_2}$ | 283.9702 | 283.7247 | 281.4443 | 283.9108 | 279.1366
20 | $\hat{\bar{y}}_{p_3}$ | 285.8136 | 285.5665 | 283.2714 | 285.7538 | 280.9487
20 | $\hat{\bar{y}}_{p_4}$ | 283.8230 | 283.5776 | 281.2984 | 283.7636 | 278.9919
20 | $\hat{\bar{y}}_{p_5}$ | 287.4332 | 287.1847 | 284.8765 | 287.3731 | 282.5407
30 | $\hat{\bar{y}}_{p_1}$ | 281.8515 | 281.6043 | 278.6550 | 281.8028 | 276.0324
30 | $\hat{\bar{y}}_{p_2}$ | 282.0411 | 281.7937 | 278.8425 | 281.9923 | 276.2181
30 | $\hat{\bar{y}}_{p_3}$ | 284.4471 | 284.1975 | 281.2211 | 284.3978 | 278.5743
30 | $\hat{\bar{y}}_{p_4}$ | 281.8860 | 281.6387 | 278.6891 | 281.8372 | 276.0661
30 | $\hat{\bar{y}}_{p_5}$ | 286.3941 | 286.1429 | 283.1461 | 286.3446 | 280.4812
40 | $\hat{\bar{y}}_{p_1}$ | 282.4314 | 282.1826 | 278.6957 | 282.3882 | 275.6903
40 | $\hat{\bar{y}}_{p_2}$ | 282.6213 | 282.3723 | 278.8831 | 282.5781 | 275.8757
40 | $\hat{\bar{y}}_{p_3}$ | 285.4612 | 285.2097 | 281.6854 | 285.4175 | 278.6478
40 | $\hat{\bar{y}}_{p_4}$ | 282.4620 | 282.2132 | 278.7259 | 282.4189 | 275.7203
40 | $\hat{\bar{y}}_{p_5}$ | 287.7615 | 287.5080 | 283.9553 | 287.7175 | 280.8932
50 | $\hat{\bar{y}}_{p_1}$ | 284.7022 | 284.4496 | 280.5371 | 284.6623 | 277.2014
50 | $\hat{\bar{y}}_{p_2}$ | 284.8955 | 284.6427 | 280.7276 | 284.8556 | 277.3896
50 | $\hat{\bar{y}}_{p_3}$ | 288.0771 | 287.8214 | 283.8626 | 288.0367 | 280.4873
50 | $\hat{\bar{y}}_{p_4}$ | 284.7308 | 284.4782 | 280.5653 | 284.6909 | 277.2293
50 | $\hat{\bar{y}}_{p_5}$ | 290.6930 | 290.4350 | 286.4403 | 290.6522 | 283.0343
Relative efficiencies (%) of the considered estimators
Sample size | Estimator | $t_1$ | $t_2$ | $t_3$ | $t_4$ | $t_5$
---|---|---|---|---|---|---
20 | $\hat{\bar{y}}_{p_1}$ | 283.8537 | 283.6076 | 281.3037 | 283.7942 | 278.9635
20 | $\hat{\bar{y}}_{p_2}$ | 284.0422 | 283.7959 | 281.4905 | 283.9826 | 279.1487
20 | $\hat{\bar{y}}_{p_3}$ | 285.8925 | 285.6447 | 283.3242 | 285.8326 | 280.9672
20 | $\hat{\bar{y}}_{p_4}$ | 283.8955 | 283.6494 | 281.3451 | 283.8360 | 279.0046
20 | $\hat{\bar{y}}_{p_5}$ | 287.5312 | 287.2819 | 284.9482 | 287.4709 | 282.5776
30 | $\hat{\bar{y}}_{p_1}$ | 283.6235 | 283.3746 | 280.3902 | 283.5742 | 277.6850
30 | $\hat{\bar{y}}_{p_2}$ | 283.8144 | 283.5653 | 280.5789 | 283.7651 | 277.8719
30 | $\hat{\bar{y}}_{p_3}$ | 286.2491 | 285.9979 | 282.9859 | 286.1994 | 280.2556
30 | $\hat{\bar{y}}_{p_4}$ | 283.6584 | 283.4095 | 280.4247 | 283.6091 | 277.7192
30 | $\hat{\bar{y}}_{p_5}$ | 288.2599 | 288.0069 | 284.9737 | 288.2098 | 282.2243
40 | $\hat{\bar{y}}_{p_1}$ | 282.6203 | 282.3717 | 278.8912 | 282.5771 | 275.8742
40 | $\hat{\bar{y}}_{p_2}$ | 282.8102 | 282.5614 | 279.0786 | 282.7670 | 276.0596
40 | $\hat{\bar{y}}_{p_3}$ | 285.6450 | 285.3937 | 281.8760 | 285.6013 | 278.8267
40 | $\hat{\bar{y}}_{p_4}$ | 282.6510 | 282.4024 | 278.9215 | 282.6078 | 275.9042
40 | $\hat{\bar{y}}_{p_5}$ | 287.9584 | 287.7051 | 284.1589 | 287.9144 | 281.0849
50 | $\hat{\bar{y}}_{p_1}$ | 282.9058 | 282.6569 | 278.8137 | 282.8660 | 275.5154
50 | $\hat{\bar{y}}_{p_2}$ | 283.0956 | 282.8465 | 279.0007 | 283.0558 | 275.7003
50 | $\hat{\bar{y}}_{p_3}$ | 286.2084 | 285.9566 | 282.0685 | 286.1682 | 278.7318
50 | $\hat{\bar{y}}_{p_4}$ | 282.9342 | 282.6853 | 278.8417 | 282.8945 | 275.5432
50 | $\hat{\bar{y}}_{p_5}$ | 288.7843 | 288.5303 | 284.6072 | 288.7438 | 281.2404