The gamma distribution is a flexible right-skewed distribution widely used in many areas, and it is of great interest to estimate the probability of a random variable exceeding a specified value in survival and reliability analysis. Therefore, the study develops a fixed-accuracy confidence interval for
As one of the most important distributions in probability and statistics, the gamma distribution is a flexible right-skewed distribution widely used in many areas such as reliability, environment, insurance and medicine. Burgin (1975) justified the applicability of the gamma distribution to inventory control. Vaz and Fortes (1988) discussed fitting a gamma distribution for grain sizes in a poly-crystal. Husak
Many researchers have worked on the estimation of the parameters in two-parameter gamma distributions. Choi and Wette (1969) examined the numerical technique of the maximum likelihood method to estimate both parameters of a gamma distribution. Chen and Mi (1998) discussed the point estimation for the scale parameter of a gamma distribution based on grouped data, assuming that the shape parameter was known. Iliopoulos (2016) constructed exact confidence intervals for the shape parameter of a gamma distribution. The author compared the exact confidence intervals with bootstrap confidence intervals via simulation studies. Son and Oh (2006) developed a Gibbs sampling Bayesian estimator of the two-parameter gamma distribution under the non-informative prior.
Most of the literature on parameter estimation for gamma distributions is based on fixed-sample-size procedures. That is, one estimates the parameters of interest from whatever data have been obtained, regardless of how large or small the sample is. In certain situations, especially when the data-collecting process is time-consuming or costly, it is important to understand what sample size is needed for an estimator to attain a prescribed accuracy. That sample size, however, depends on some unknown nuisance parameter and cannot be fixed in advance. Thus, sequential sampling becomes necessary to solve such problems. Takada and Nagata (1995) considered a sequential procedure for building a fixed-width confidence interval for the mean of a gamma distribution. Isogai and Uno (1995) developed a sequential procedure for estimating the mean of a gamma distribution under a loss function of squared error plus linear cost. Liu (2001) worked on approximating the optimal fixed sample size expected reward through a two-stage sampling procedure. Recently, Zacks and Khan (2011) developed two-stage and sequential procedures for fixed-width confidence interval estimation of the scale parameter when the shape parameter is known. Mahmoudi and Roughani (2015) studied a bounded-risk two-stage sampling procedure for estimating the scale parameter of a gamma distribution with the shape parameter known. Roughani and Mahmoudi (2015) derived explicit formulas for the expected value and risk of the estimator of the scale parameter in a gamma distribution, where the shape parameter was assumed known. One may obtain a broad review of sequential analysis by combining selected parts of interest from the following monographs and the references therein: Stein (1949), Anscombe (1952, 1953), Chow and Robbins (1965), Woodroofe (1977), Ghosh and Mukhopadhyay (1981), Siegmund (1985), Ghosh
In this paper, we will focus on constructing a fixed-accuracy confidence interval for
Let us begin with a gamma random variable (
Here,
The rest of the paper is organized as follows: In Section 2, we provide a purely sequential procedure to estimate
In the situation when
where
and
Starting with a random sample,
where
and
The central limit theorem (CLT) for MLE tells us that as
where
where “log” represents the natural logarithm, and
Note that by definition
so
And we have the expression of
To estimate
with
With a prescribed significance level 0 <
from which it follows that
where
The magnitude of
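As a rough numerical sketch of how such an optimal fixed sample size can be evaluated when the parameters are treated as known, the following Python snippet applies the delta method to the log-odds of the exceedance probability. Several things here are assumptions made for illustration only: that the fixed-accuracy interval is placed on the odds p/(1 − p), that the shape is a known positive integer (so the survival function has a closed form), that the scale is estimated by the sample mean divided by the shape, and that the threshold is c = 2 with Γ(1, 2) read as shape 1 and scale 2. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def erlang_survival(c, shape, scale):
    """P(X > c) for a gamma variable with positive-integer shape (closed form)."""
    x = c / scale
    return math.exp(-x) * sum(x**k / math.factorial(k) for k in range(shape))


def optimal_fixed_sample_size(c, shape, scale, d, alpha=0.05):
    """Delta-method sample size for an interval (odds/d, odds*d), shape known.

    Assumption: accuracy is measured on the odds p/(1-p), so the half-width
    on the log-odds scale is log(d).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    x = c / scale
    p = erlang_survival(c, shape, scale)
    # scale * dp/dscale = x^shape * exp(-x) / Gamma(shape);
    # the MLE of the scale has asymptotic variance scale^2 / (shape * n).
    scaled_slope = x**shape * math.exp(-x) / math.gamma(shape)
    sigma2 = scaled_slope**2 / (shape * (p * (1 - p)) ** 2)
    return (z / math.log(d)) ** 2 * sigma2


# Assumed illustration values: shape 1, scale 2, threshold c = 2, d = 1.5.
n_star = optimal_fixed_sample_size(c=2.0, shape=1, scale=2.0, d=1.5)
print(round(n_star, 2))
```

Under these assumptions the value agrees with the optimal sample size listed for accuracy 1.50 in the first simulation table, which suggests, but does not prove, that the sketch matches the derivation above.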
In light of Anscombe (1953) and Chow and Robbins (1965), who established the fundamental theory of sequential estimation, we propose a purely sequential procedure to deal with the problem of constructing a fixed-accuracy confidence interval for
Let
where
Let
Upon termination, with the acquired data
the 100(1 −
and accordingly, the confidence interval for
where
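A purely sequential scheme of this kind can be sketched in Python for the exponential case (shape 1). This is a minimal illustration under stated assumptions, not a transcription of the stopping rule above: the pilot size of 10, the plug-in variance estimate, the absence of any correction term in the boundary, and the choice c = 2 with scale 2 are all hypothetical.

```python
import math
import random
from statistics import NormalDist


def sequential_exponential(c, d, alpha=0.05, pilot=10, scale=2.0, rng=None):
    """One run of a plug-in purely sequential rule (assumed form).

    Sampling continues one observation at a time until n is at least
    (z / log d)^2 times the estimated variance of the log-odds.
    """
    rng = rng or random.Random()
    z = NormalDist().inv_cdf(1 - alpha / 2)
    k = (z / math.log(d)) ** 2
    total = sum(rng.expovariate(1.0 / scale) for _ in range(pilot))
    n = pilot
    while True:
        x = c * n / total  # c / scale_hat, with scale_hat the sample mean
        sigma2_hat = (x / (1.0 - math.exp(-x))) ** 2
        if n >= k * sigma2_hat:
            break
        total += rng.expovariate(1.0 / scale)
        n += 1
    p_hat = math.exp(-x)
    odds = p_hat / (1.0 - p_hat)
    lower, upper = odds / d, odds * d  # fixed-accuracy interval for the odds
    # Map the odds interval back to the probability scale.
    return n, p_hat, (lower / (1 + lower), upper / (1 + upper))


rng = random.Random(2024)
runs = [sequential_exponential(c=2.0, d=1.5, rng=rng) for _ in range(200)]
mean_n = sum(r[0] for r in runs) / len(runs)
print(round(mean_n, 1))
```

Because the odds-to-probability map is increasing, the transformed interval always contains the point estimate, and the ratio of the odds endpoints is d squared by construction.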
Now, we are in a position to discuss the appealing asymptotic efficiency and consistency properties for this newly proposed purely sequential procedure ℘_{1}.
We prove the asymptotic first-order efficiency and consistency by applying a general framework of purely sequential fixed-width confidence intervals based on MLE proposed in Yu (1989) with the stopping rule given by
where
Observe that a corresponding confidence interval can be constructed for log
which is symmetric about log
For this fixed-width confidence interval estimation problem, the optimal fixed sample size is still
To investigate the performance of the purely sequential fixed-accuracy confidence interval estimation procedure ℘_{1} based on the stopping rule given by (
We summarized the following quantities in Table 1 by running 10,000 independent trials: the mean and standard deviation of the terminated sample sizes,
In Table 1,
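The derived entries in each simulation row can be cross-checked with a few lines of Python. The sketch below uses the figures from the row with accuracy 1.50 and assumes, based on the arithmetic alone, that the reported standard error of the coverage probability is the usual binomial one over 10,000 trials; the function name is hypothetical.

```python
import math


def coverage_standard_error(coverage, trials):
    """Binomial standard error of an empirical coverage proportion."""
    return math.sqrt(coverage * (1.0 - coverage) / trials)


# Figures taken from the row with accuracy 1.50.
optimal_n, mean_n, coverage = 58.48, 59.76, 0.9525

ratio = mean_n / optimal_n        # average sample size relative to the optimum
difference = mean_n - optimal_n   # average oversampling
se = coverage_standard_error(coverage, 10_000)
print(round(ratio, 4), round(difference, 2), round(se, 4))
```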
For a two-parameter gamma distribution Γ(
Let us define a new random variable
Then,
serves as an unbiased and consistent estimator of
Similarly, we define
so following the delta method, as
where
which additionally satisfies the condition that
in order to obtain a confidence interval with 100(1 −
Likewise, we derive an alternative optimal fixed sample size given by
which remains unknown, again. Therefore, it is essential to estimate
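For a numerical feel, the snippet below evaluates a nonparametric optimal fixed sample size under the assumption that the interval is placed on the odds p/(1 − p) and that the sample proportion of exceedances is used, so that the log-odds has asymptotic variance 1/(n p(1 − p)). The threshold c = 4 for Γ(2, 2), read as shape 2 and scale 2, is an illustrative assumption, as is the function name.

```python
import math
from statistics import NormalDist


def nonparametric_optimal_n(p, d, alpha=0.05):
    """Sample size for an interval (odds/d, odds*d) based on a sample proportion."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return (z / math.log(d)) ** 2 / (p * (1.0 - p))


# P(X > 4) for shape 2, scale 2 is exp(-2) * (1 + 2) in closed form.
p_true = 3.0 * math.exp(-2.0)
n_star = nonparametric_optimal_n(p_true, d=1.7)
print(round(p_true, 4), round(n_star, 2))
```

With these assumed inputs the result agrees with the optimal sample size listed for accuracy 1.70 in the second simulation table.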
One can immediately identify a similarity between (
where
With the pilot sample data, if
Clearly,
Upon termination, with the acquired data
we propose the 100(1 −
and accordingly, the confidence interval for
where
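The nonparametric scheme can be sketched in the same spirit. The code below is an assumed form rather than the exact stopping rule above: it uses a hypothetical pilot size of 10, keeps sampling while the observed proportion of exceedances is 0 or 1, and stops once n reaches the plug-in estimate of the required sample size. The data generator Γ(2, 2) with c = 4 is an illustrative choice.

```python
import math
import random
from statistics import NormalDist


def sequential_nonparametric(draw, c, d, alpha=0.05, pilot=10):
    """One run of a plug-in sequential rule based on the exceedance proportion."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    k = (z / math.log(d)) ** 2
    exceed = sum(draw() > c for _ in range(pilot))
    n = pilot
    # Continue while the proportion is degenerate or the sample is too small.
    while exceed in (0, n) or n < k / ((exceed / n) * (1 - exceed / n)):
        exceed += draw() > c
        n += 1
    p_hat = exceed / n
    odds = p_hat / (1 - p_hat)
    lower, upper = odds / d, odds * d  # fixed-accuracy interval for the odds
    return n, p_hat, (lower / (1 + lower), upper / (1 + upper))


rng = random.Random(7)
draw = lambda: rng.gammavariate(2.0, 2.0)  # shape 2, scale 2
runs = [sequential_nonparametric(draw, c=4.0, d=1.7) for _ in range(200)]
mean_n = sum(r[0] for r in runs) / len(runs)
print(round(mean_n, 1))
```

Since p(1 − p) never exceeds 1/4, this rule cannot stop before n reaches four times (z / log d)², which explains why the terminated sample sizes vary comparatively little.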
Theorem 4 can be proved in the same fashion as we proved Theorem 2, as long as one notes that
Next, we include a simulation study by implementing the purely sequential fixed-accuracy confidence interval estimation procedure ℘_{2} based on the stopping rule given by (
Running 10,000 independent trials, we reported the mean and standard deviation of the terminated sample sizes
To demonstrate the practical applicability of our newly proposed fixed-accuracy confidence interval estimation procedures, we include illustrations using three real-life data sets: (i) the urine albumin-to-creatinine ratios (UACR, mg/g) of 5255 adolescent survey participants from NHANES 1999–2004, referred to as the “UACR data”; (ii) excess cycle times data in steel manufacturing; (iii) survival times data from a group of 97 female dementia patients diagnosed at age 70–74.
The reference population for our analysis is created using survey participants from NHANES 1999–2014 who met the following criteria: between 12 and 17 years old, not pregnant, blood pressure < 120/80 mmHg, without diabetes, no prescription medications used within the previous 30 days, and a Z-score for weight-to-height ratio ≤ 2. This yields a reference sample of size
For illustrative purposes, we also treated the UACR data of size
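The two intervals reported in each row of the UACR table appear to be linked by the map from odds to probability, o ↦ o/(1 + o). The Python sketch below checks this for the row with accuracy 1.20. Reading the first interval as one for the odds is an inference from the numbers, not something stated in the table itself, and the function name is hypothetical.

```python
def odds_to_probability(odds):
    """Convert odds o = p / (1 - p) back to the probability p."""
    return odds / (1.0 + odds)


d = 1.20
odds_interval = (0.0993, 0.1429)  # first interval in the row with accuracy 1.20
prob_interval = tuple(odds_to_probability(o) for o in odds_interval)

# For an interval of the form (estimate / d, estimate * d) the endpoint
# ratio equals d squared, up to rounding of the reported endpoints.
endpoint_ratio = odds_interval[1] / odds_interval[0]
print(prob_interval, round(endpoint_ratio, 3), round(d**2, 3))
```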
The excess cycle times data in steel manufacturing were first given in Example 6.1 of Barnett and Lewis (1994) and are assumed to be a sample from an exponential population. Kimber (1982) and Lin and Balakrishnan (2009) identified the observed value 92 as an outlier. For this illustration, we use the sample data without the data point 92, and set the shape parameter
To illustrate the nonparametric procedure as per (
According to the analysis by Xie
Finally, using the final observations from Table 5, we obtained a 95% confidence interval for
Survival and reliability analysis are two of the most important scientific fields where the gamma distribution is often used to model data. In both fields, it is crucial to understand when the measurement, that is, the random variable modeled by a gamma distribution, goes beyond a “dangerous” value. Therefore, in the paper, we focus on estimating
Finally, it is also worth mentioning that the nonparametric sequential fixed-accuracy confidence interval estimation procedure developed in Section 3 can be further extended to estimate
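Table 1. Simulation results by implementing ℘_{1} as (2.16) using Γ(1, 2) with
accuracy | optimal fixed sample size | mean terminated sample size | s.d. of terminated sample size | mean / optimal | mean − optimal | coverage probability | s.e. of coverage | mean estimate | s.e. of mean estimate |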
1.50 | 58.48 | 59.76 | 6.39 | 1.0219 | 1.28 | 0.9525 | 0.0021 | 0.3695 | 0.0005 |
1.45 | 69.64 | 70.94 | 7.00 | 1.0187 | 1.30 | 0.9569 | 0.0020 | 0.3692 | 0.0004 |
1.40 | 84.92 | 86.21 | 7.74 | 1.0152 | 1.29 | 0.9524 | 0.0021 | 0.3691 | 0.0004 |
1.35 | 106.75 | 107.95 | 8.60 | 1.0113 | 1.20 | 0.9539 | 0.0021 | 0.3691 | 0.0003 |
1.30 | 139.66 | 140.82 | 9.78 | 1.0082 | 1.16 | 0.9558 | 0.0021 | 0.3690 | 0.0003 |
1.25 | 193.08 | 194.36 | 11.59 | 1.0066 | 1.28 | 0.9517 | 0.0021 | 0.3684 | 0.0003 |
1.20 | 289.21 | 290.28 | 14.11 | 1.0037 | 1.07 | 0.9536 | 0.0021 | 0.3685 | 0.0002 |
1.15 | 492.17 | 493.32 | 18.46 | 1.0023 | 1.15 | 0.9535 | 0.0021 | 0.3682 | 0.0002 |
1.10 | 1058.32 | 1059.17 | 26.94 | 1.0008 | 0.85 | 0.9511 | 0.0022 | 0.3682 | 0.0001 |
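Table 2. Simulation results by implementing ℘_{2} as (3.9) using Γ(2, 2) with
accuracy | optimal fixed sample size | mean terminated sample size | s.d. of terminated sample size | mean / optimal | mean − optimal | coverage probability | s.e. of coverage | mean estimate | s.e. of mean estimate |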
1.70 | 56.57 | 58.27 | 3.35 | 1.0301 | 1.70 | 0.9599 | 0.0020 | 0.4086 | 0.0006 |
1.65 | 63.52 | 65.24 | 3.40 | 1.0271 | 1.72 | 0.9557 | 0.0021 | 0.4083 | 0.0006 |
1.60 | 72.11 | 73.87 | 3.63 | 1.0245 | 1.76 | 0.9615 | 0.0019 | 0.4079 | 0.0006 |
1.55 | 82.93 | 84.69 | 3.82 | 1.0212 | 1.76 | 0.9551 | 0.0021 | 0.4075 | 0.0005 |
1.50 | 96.89 | 98.68 | 4.10 | 1.0184 | 1.79 | 0.9562 | 0.0020 | 0.4075 | 0.0005 |
1.45 | 115.38 | 117.19 | 4.38 | 1.0157 | 1.81 | 0.9491 | 0.0022 | 0.4075 | 0.0004 |
1.40 | 140.70 | 142.53 | 4.72 | 1.0131 | 1.83 | 0.9503 | 0.0022 | 0.4072 | 0.0004 |
1.35 | 176.86 | 178.66 | 5.35 | 1.0102 | 1.80 | 0.9540 | 0.0021 | 0.4070 | 0.0004 |
1.30 | 231.40 | 233.15 | 6.00 | 1.0075 | 1.75 | 0.9479 | 0.0022 | 0.4069 | 0.0003 |
1.25 | 319.90 | 321.69 | 6.96 | 1.0056 | 1.79 | 0.9526 | 0.0021 | 0.4068 | 0.0003 |
1.20 | 479.19 | 480.97 | 8.63 | 1.0037 | 1.78 | 0.9469 | 0.0022 | 0.4064 | 0.0002 |
1.15 | 815.46 | 817.23 | 11.03 | 1.0022 | 1.77 | 0.9521 | 0.0021 | 0.4063 | 0.0002 |
1.10 | 1753.49 | 1755.38 | 16.14 | 1.0011 | 1.89 | 0.9527 | 0.0021 | 0.4061 | 0.0001 |
Table 3. Fixed-accuracy confidence intervals using the UACR data with
accuracy | final sample size | interval for the odds | interval for the probability |
1.20 | 745 | (0.0993, 0.1429) | (0.0903, 0.1251) |
1.18 | 1045 | (0.0758, 0.1055) | (0.0705, 0.0955) |
1.16 | 1388 | (0.0707, 0.0951) | (0.0660, 0.0868) |
1.14 | 1586 | (0.0890, 0.1156) | (0.0817, 0.1037) |
1.12 | 2448 | (0.0685, 0.0859) | (0.0641, 0.0791) |
1.10 | 2939 | (0.0911, 0.1103) | (0.0835, 0.0993) |
Table 4. Final sample data of excess cycle times using ℘_{1} from (2.16)
5 | 32 | 3 | 21 | 7 | 3 | 1 | 7 | 4 | 4 | 9 | 7 | 2 | 11 | 4 | 11 |
5 | 1 | 5 | 7 | 13 | 3 | 10 | 8 | 10 | 11 | 32 | 11 | 3 | 11 | 3 | 7 |
3 | 5 | 12 | 3 | 3 | 2 | 2 | 8 | 10 | 21 | 13 | 3 | 8 | 2 | 8 | 3 |
14 | 8 | 1 | 2 | 3 | 15 | 1 | 3 | 1 | 2 | 5 | 10 | 5 | 1 | 10 | 3 |
2 | 2 | 5 | 4 | 12 | 5 | 8 | 7 | 5 | 10 | 6 | 12 | 3 | 8 | 1 | 1 |
7 | 5 | 2 | 2 | 21 |
Table 5. Final sample data of survival time (in years) for female dementia patients using ℘_{2} from (3.9)
6.75 | 1.59 | 0.50 | 0.50 | 4.17 | 3.58 | 8.16 | 2.33 | 1.67 | 10.17 | 3.75 | 1.83 |
21.00 | 1.00 | 1.83 | 0.50 | 1.66 | 1.42 | 1.67 | 9.33 | 4.08 | 18.08 | 1.00 | 7.84 |
2.67 | 1.58 | 1.67 | 7.83 | 9.17 | 1.42 | 2.00 | 12.50 | 4.92 | 21.83 | 0.58 | 1.67 |
1.25 | 11.25 | 4.67 | 5.25 | 2.17 | 3.92 | 13.83 | 5.83 | 0.83 | 4.17 | 1.25 | 3.42 |
8.50 | 11.50 | 10.33 | 5.25 | 1.08 | 3.42 | 11.25 | 5.75 | 2.00 | 5.58 | 0.83 | 1.08 |
6.92 | 4.58 | 7.00 | 4.92 | 7.83 | 2.92 | 7.84 | 2.25 | 3.08 | 9.92 | 6.58 |