Fixed-accuracy confidence interval estimation of P(X > c) for a two-parameter gamma population

Yan Zhuang$^{a}$, Jun Hu$^{1,b}$, Yixuan Zou$^{c}$

$^{a}$Department of Mathematics and Statistics, Connecticut College, USA;
$^{b}$Department of Mathematics and Statistics, Oakland University, USA;
$^{c}$Department of Statistics, University of Kentucky, USA
Correspondence to: $^{1}$Department of Mathematics and Statistics, Oakland University, Mathematics and Science Center, Room 367, 146 Library Drive, Rochester, MI 48309-4479, USA. E-mail: junhu@oakland.edu
Received July 17, 2020; Revised August 23, 2020; Accepted September 19, 2020.
Abstract

The gamma distribution is a flexible right-skewed distribution widely used in many areas, and it is of great interest to estimate the probability of a random variable exceeding a specified value in survival and reliability analysis. Therefore, this study develops a fixed-accuracy confidence interval for P(X > c), where X follows a gamma distribution, Γ(α, β), and c is a preassigned positive constant, through: 1) a purely sequential procedure with known shape parameter α and unknown rate parameter β; and 2) a nonparametric purely sequential procedure with both shape and rate parameters unknown. Both procedures enjoy appealing asymptotic first-order efficiency and asymptotic consistency properties. Extensive simulations validate the theoretical findings. Three real-life data examples from health studies and a steel manufacturing study are discussed to illustrate the practical applicability of both procedures.

Keywords : fixed-accuracy confidence interval, gamma distribution, sequential procedure
1. Introduction

As one of the most important distributions in probability and statistics, the gamma distribution is a flexible right-skewed distribution widely used in many areas such as reliability, environment, insurance and medicine. Burgin (1975) justified the applicability of the gamma distribution to inventory control. Vaz and Fortes (1988) discussed fitting a gamma distribution for grain sizes in a poly-crystal. Husak et al. (2007) used the gamma distribution to model rainfall data. Most recently, Johnson and Kliche (2020) compared seven estimation procedures for gamma parameters on raindrop size data. Since Ohlsson and Johansson (2010), the gamma distribution has become more or less the standard option in modeling claim cost in the insurance industry.

Many researchers have worked on the estimation of the parameters in two-parameter gamma distributions. Choi and Wette (1969) examined the numerical technique of the maximum likelihood method to estimate both parameters of a gamma distribution. Chen and Mi (1998) discussed the point estimation for the scale parameter of a gamma distribution based on grouped data, assuming that the shape parameter was known. Iliopoulos (2016) constructed exact confidence intervals for the shape parameter of a gamma distribution. The author compared the exact confidence intervals with bootstrap confidence intervals via simulation studies. Son and Oh (2006) developed a Gibbs sampling Bayesian estimator of the two-parameter gamma distribution under the non-informative prior.

Most of the literature on the parameter estimation of gamma distributions is based on fixed-sample-size procedures. That is, one aims to estimate the parameters of interest based on the obtained data, no matter how large or small the sample is. In certain situations, especially when the data collection process is time-consuming or costly, it is of great importance to understand what sample size is needed to obtain estimators with prescribed accuracy. This sample size, however, depends on some unknown nuisance parameter and cannot be fixed in advance. Thus, sequential sampling becomes necessary to solve such problems. Takada and Nagata (1995) considered a sequential procedure for building a fixed-width confidence interval for the mean of a gamma distribution. Isogai and Uno (1995) developed a sequential procedure for estimating the mean of a gamma distribution under a loss function of squared error plus linear cost. Liu (2001) worked on approximating the optimal fixed sample size expected reward through a two-stage sampling procedure. Recently, Zacks and Khan (2011) developed two-stage and sequential procedures of conducting fixed-width confidence interval estimation for the scale parameter when the shape parameter is known. Mahmoudi and Roughani (2015) studied a bounded risk two-stage sampling procedure for estimating the scale parameter of a gamma distribution with the shape parameter known. Roughani and Mahmoudi (2015) derived explicit formulas for the expected value and risk of the estimator of the scale parameter in a gamma distribution, where the shape parameter was assumed known. One may obtain a broad-ranging review of the field of sequential analysis by combining selected parts of interest from the following monographs and references therein: Stein (1949), Anscombe (1952, 1953), Chow and Robbins (1965), Woodroofe (1977), Ghosh and Mukhopadhyay (1981), Siegmund (1985), Ghosh et al. (1997) and Mukhopadhyay and de Silva (2009).

In this paper, we focus on constructing a fixed-accuracy confidence interval for P(X > c), where X is a random variable that follows a gamma distribution and c is a preassigned positive constant. In many applications, it is desired to estimate the probability of a random variable exceeding a specified value. For instance, in industrial hygiene, it is of interest to estimate the probability that the exposure level (level of exposure to a contaminant in a workplace) of a worker exceeds the occupational exposure limit (OEL; usually set by the Occupational Safety and Health Administration). See Krishnamoorthy and Mathew (2009). In lifetime data analysis, people are often interested in crucial events such as failure of equipment, death of a person, and development of symptoms of disease. The time to the occurrence of the event is called the lifetime. There is a large volume of literature on modeling lifetime data using the gamma distribution. One is referred to Lawless (1982), Ansell and Phillips (1994), Balakrishnan and Ling (2014), Balakrishnan et al. (2019), etc. Moreover, the widely used exponential distribution is a special case of the gamma distribution. One may see Nadarajah and Gupta (2007), Piegorsch (2015) and Dagpunar (2019) for more details. Therefore, it is essential to know the probability of a gamma random variable X exceeding a given value of interest, especially when at least one parameter is unknown. To the best of our knowledge, little literature has focused on the confidence interval estimation of the rate parameter of a gamma distribution, and no literature exists on building a fixed-accuracy confidence interval for a function of the rate parameter (e.g., P(X > c)). A fixed-width confidence interval can easily have a negative lower bound or an upper bound greater than 1, neither of which makes sense for P(X > c). Hence, when the rate parameter is unknown, the problem of estimating P(X > c) is better addressed using a fixed-accuracy confidence interval.

Let us begin with a gamma random variable (X ~ Γ(α, β)) with the associated density function:

$f(x)=\frac{\beta^{\alpha}}{\Gamma(\alpha)}x^{\alpha-1}e^{-\beta x}, \quad \text{for } x>0,\ \alpha,\beta>0.$

Here, β, the rate parameter, is of interest and remains unknown to us. We shall point out that in some research work, authors alternatively use θ = 1/β, the scale parameter, to characterize the gamma distribution. α is called the shape parameter, and can be either known or unknown.

The rest of the paper is organized as follows. In Section 2, we provide a purely sequential procedure to estimate P(X > c) with X coming from a two-parameter gamma population with known shape parameter α and unknown rate parameter β. We discuss some appealing properties of this procedure, and we support our findings through extensive simulations. In Section 3, we develop a nonparametric purely sequential procedure to estimate P(X > c) when both α and β are unknown. We also present some interesting properties of our proposed stopping rule followed by extensive simulations. Section 4 includes illustrations of the purely sequential procedures discussed in Sections 2 and 3 on three real-life data sets. Section 5 provides some concluding thoughts.

2. Sequential fixed-accuracy confidence intervals with known α

In the situation when α is known (for example, the exponential distribution is a special case of Γ(α, β) with α = 1), we can express the desired probability of X exceeding a constant c(> 0) with an incomplete gamma function as follows:

$p\equiv p(\beta)=P_{\beta}(X>c)=P_{\beta}(X\beta>c\beta)=1-F(c\beta), \qquad (2.1)$

where F(·) stands for the distribution function of Γ(α, 1). Further, we define

$q\equiv q(\beta)=\frac{p}{1-p},$

and q has a range of (0, ∞).
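As a numerical illustration of p = 1 − F(cβ) and q = p/(1 − p), the following is a minimal sketch rather than the authors' implementation. It assumes the standard power-series expansion of the regularized lower incomplete gamma function to evaluate F, and the function names `gamma_cdf_unit_rate`, `exceedance_prob` and `odds` are hypothetical.

```python
import math


def gamma_cdf_unit_rate(x, alpha, tol=1e-14, max_terms=10_000):
    """Distribution function F(x) of Gamma(alpha, 1), using the series
    F(x) = x^alpha e^{-x} / Gamma(alpha) * sum_k x^k / (alpha (alpha+1) ... (alpha+k))."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / alpha               # k = 0 term of the series
    total = term
    for k in range(1, max_terms):
        term *= x / (alpha + k)      # ratio of consecutive series terms
        total += term
        if term < tol * total:
            break
    log_prefactor = alpha * math.log(x) - x - math.lgamma(alpha)
    return min(1.0, math.exp(log_prefactor) * total)


def exceedance_prob(c, alpha, beta):
    """p = P(X > c) for X ~ Gamma(alpha, rate beta), since X * beta ~ Gamma(alpha, 1)."""
    return 1.0 - gamma_cdf_unit_rate(c * beta, alpha)


def odds(p):
    """q = p / (1 - p), which ranges over (0, infinity)."""
    return p / (1.0 - p)


p_exp = exceedance_prob(c=0.5, alpha=1.0, beta=2.0)   # exponential case, c equal to the mean
p_gam = exceedance_prob(c=1.0, alpha=2.0, beta=2.0)   # Gamma(2, 2) case, c equal to the mean
print(round(p_exp, 4), round(p_gam, 4), round(odds(p_exp), 4))
```

The two parameter settings are the ones used in the simulation studies of Sections 2.2 and 3.2, so the computed probabilities can be compared with the values of p quoted in the captions of Tables 1 and 2.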

Starting with a random sample, X1, . . . , Xn, from a Γ(α, β) population, one can easily obtain the maximum likelihood estimator (MLE) of β, say β̂n:

$\hat{\beta}_n=\frac{\alpha}{\bar{X}_n}, \qquad (2.3)$

where $\bar{X}_n=n^{-1}\sum_{i=1}^{n}X_i$ is the sample mean. Further, using the invariance property of the MLE, we have the MLEs of p and q given as follows, respectively:

$\hat{p}_n=1-F(c\hat{\beta}_n),$

and

$\hat{q}_n=\frac{\hat{p}_n}{1-\hat{p}_n}.$

The central limit theorem (CLT) for MLE tells us that as n→∞,

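$\sqrt{n}(\hat{\beta}_n-\beta)\overset{D}{\to}N(0,\alpha^{-1}\beta^{2}),$

where $\overset{D}{\to}$ represents convergence in distribution. Applying the delta method, we have that as n→∞,

$\sqrt{n}(\log\hat{q}_n-\log q)\overset{D}{\to}N(0,\sigma_{\beta}^{2}),$

where “log” represents the natural logarithm, and

$\sigma_{\beta}^{2}=\alpha^{-1}\beta^{2}\left(\frac{d\log q}{d\beta}\right)^{2}.$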

Note that by definition

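$\log q=\log p-\log(1-p),$

so

$\frac{d\log q}{d\beta}=\frac{1}{p}\frac{dp}{d\beta}-\frac{1}{1-p}\left(-\frac{dp}{d\beta}\right)=-\frac{c^{\alpha}\beta^{\alpha-1}e^{-c\beta}}{\Gamma(\alpha)F(c\beta)(1-F(c\beta))}.$

And we have the expression of $\sigma_{\beta}^{2}$:

$\sigma_{\beta}^{2}=\frac{c^{2\alpha}\beta^{2\alpha}e^{-2c\beta}}{\alpha\Gamma^{2}(\alpha)F^{2}(c\beta)(1-F(c\beta))^{2}}. \qquad (2.11)$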
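As a quick check of this variance expression, consider the exponential special case α = 1, for which F(x) = 1 − e^{−x} and Γ(1) = 1. The following worked step is added for illustration and uses only quantities defined above:

$\sigma_{\beta}^{2}\Big|_{\alpha=1}=\frac{c^{2}\beta^{2}e^{-2c\beta}}{(1-e^{-c\beta})^{2}\,e^{-2c\beta}}=\left(\frac{c\beta}{1-e^{-c\beta}}\right)^{2}.$

For example, with c = 0.5 and β = 2 we have cβ = 1, so that $\sigma_{\beta}^{2}=(1-e^{-1})^{-2}\approx 2.503$.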

To estimate p = Pβ(X > c), we first focus on q = p/(1 − p), and consider a fixed-accuracy confidence interval for q of the form given by

$J_{n;1}=(d^{-1}\hat{q}_n,\ d\hat{q}_n), \qquad (2.12)$

with d > 1 fixed. Obviously, Jn;1 is a subset of $\mathbb{R}^{+}$, and thus takes into account the positivity of the parameter q. One may refer to Mukhopadhyay and Banerjee (2014), Banerjee and Mukhopadhyay (2016), Mukhopadhyay and Zhuang (2016), Bapat (2018) and other sources for additional background information on fixed-accuracy confidence interval estimation.

With a prescribed significance level 0 < γ < 1, Jn;1 should also satisfy the condition that the coverage probability should be at least 1 − γ or approximately 1 − γ; that is,

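$P_{\beta}\{q\in J_{n;1}\}\geq 1-\gamma, \qquad (2.13)$

from which it follows that

$P_{\beta}\left\{\sqrt{n}\,|\log\hat{q}_n-\log q|\,\sigma_{\beta}^{-1}\leq\sqrt{n}\,\sigma_{\beta}^{-1}\log d\right\}\geq 1-\gamma\ \Rightarrow\ n\geq\left(\frac{z}{\log d}\right)^{2}\sigma_{\beta}^{2},$

where $z\equiv z_{\gamma/2}$ stands for the upper 100(γ/2)% quantile of a standard normal distribution. Now, we define the optimal fixed sample size to be

$n_1^{*}\equiv n_1^{*}(d)=\left(\frac{z}{\log d}\right)^{2}\sigma_{\beta}^{2}. \qquad (2.14)$

The magnitude of $n_1^{*}$, unfortunately, remains unknown due to the fact that $\sigma_{\beta}^{2}$ depends on the unknown parameter β. Therefore, it is essential to estimate $\sigma_{\beta}^{2}$ by updating its estimator stage-wise as needed. Customarily, in the light of (2.3) and (2.11), one may use the MLE of $\sigma_{\beta}^{2}$ given by

$\hat{\sigma}_{n;1}^{2}\equiv\sigma_{\hat{\beta}_n}^{2}=\frac{c^{2\alpha}\hat{\beta}_n^{2\alpha}e^{-2c\hat{\beta}_n}}{\alpha\Gamma^{2}(\alpha)F^{2}(c\hat{\beta}_n)(1-F(c\hat{\beta}_n))^{2}}. \qquad (2.15)$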
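To make the optimal fixed sample size concrete, the sketch below evaluates $n_1^{*}$ in the exponential case α = 1, where the variance reduces to $(c\beta/(1-e^{-c\beta}))^{2}$. This is an illustration under that assumption only; the function names are hypothetical, and the parameter values (β = 2, c = 0.5, γ = 0.05) are those of the simulation study reported in Table 1.

```python
import math
from statistics import NormalDist


def sigma2_beta_exponential(c, beta):
    """Asymptotic variance of sqrt(n) * (log q_hat - log q) when alpha = 1:
    (c * beta)^2 / (1 - exp(-c * beta))^2."""
    x = c * beta
    return (x / (1.0 - math.exp(-x))) ** 2


def optimal_fixed_sample_size(d, gamma, sigma2):
    """n* = (z / log d)^2 * sigma2, with z the upper gamma/2 standard normal quantile."""
    z = NormalDist().inv_cdf(1.0 - gamma / 2.0)
    return (z / math.log(d)) ** 2 * sigma2


sigma2 = sigma2_beta_exponential(c=0.5, beta=2.0)
for d in (1.50, 1.30, 1.10):
    # Compare with the n_1^* column of Table 1.
    print(d, round(optimal_fixed_sample_size(d, 0.05, sigma2), 2))
```

Because $n_1^{*}$ scales with $(\log d)^{-2}$, tightening d toward 1 increases the required sample size rapidly, which is the pattern seen in the first two columns of Table 1.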

### 2.1. A purely sequential procedure

In the light of Anscombe (1953) and Chow and Robbins (1965) who established fundamental theory of sequential estimation, we propose a purely sequential procedure to deal with the problem of constructing a fixed-accuracy confidence interval for Pβ(X > c) under a gamma population when the rate parameter is unknown. The basic idea is that after a pilot sample, we draw observations one by one, and terminate sampling immediately when there are enough observations according to some predefined stopping rule.

Let

$\wp_1:\ N_1\equiv N_1(d)=\inf\left\{n\geq m:\ n\geq\left(\frac{z}{\log d}\right)^{2}\hat{\sigma}_{n;1}^{2}\right\}, \qquad (2.16)$

where m ≥ 1 indicates a pilot sample size, and $\hat{\sigma}_{n;1}^{2}$ is defined in (2.15). That is, we first take a sample of size m, and check whether $m\geq(z/\log d)^{2}\hat{\sigma}_{m;1}^{2}$ holds. If it does, we do not take any extra observations, and the final sample is the pilot sample. Otherwise, we draw one observation at a time, check the stopping rule (2.16) successively with the updated $\hat{\sigma}_{n;1}^{2}$, and stop sampling the first time that $n\geq(z/\log d)^{2}\hat{\sigma}_{n;1}^{2}$ is observed; the terminal sample size is then N1 = n. We state Theorem 1 to show that the procedure ℘1 terminates w.p.1.

Theorem 1

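For the purely sequential fixed-accuracy confidence interval estimation procedure ℘1 given in (2.16), with all fixed α, β, γ, d, c, and m, we have:

$P_{\beta}(N_1<\infty)=1.$
Proof

Let $\overset{P}{\to}$ denote convergence in probability. It is well known that β̂n is a consistent estimator of β, and hence $\hat{\beta}_n\overset{P}{\to}\beta$ as n→∞. Note that $\hat{\sigma}_{n;1}^{2}$ is a continuous function of β̂n, so by Slutsky’s theorem, $\hat{\sigma}_{n;1}^{2}\overset{P}{\to}\sigma_{\beta}^{2}$ as n→∞. In view of (2.16), for every n ≥ m we have $P_{\beta}(N_1>n)\leq P_{\beta}\{n<(z/\log d)^{2}\hat{\sigma}_{n;1}^{2}\}$, and the right-hand side tends to 0 because $(z/\log d)^{2}\hat{\sigma}_{n;1}^{2}$ converges in probability to the finite constant $n_1^{*}$. Hence Pβ(N1 < ∞) = 1.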

Upon termination, with the acquired data

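$\{N_1,X_1,\ldots,X_m,\ldots,X_{N_1}\},$

the 100(1 − γ)% fixed-accuracy confidence interval for q will be:

$J_{N_1;1}=(d^{-1}\hat{q}_{N_1},\ d\hat{q}_{N_1})=\left(\frac{\hat{p}_{N_1}}{d(1-\hat{p}_{N_1})},\ \frac{d\hat{p}_{N_1}}{1-\hat{p}_{N_1}}\right),$

and accordingly, since p = q/(1 + q) is increasing in q, the confidence interval for p = Pβ(X > c) will be:

$J_{N_1;1}^{*}=\left(\frac{\hat{p}_{N_1}}{d-(d-1)\hat{p}_{N_1}},\ \frac{d\hat{p}_{N_1}}{1+(d-1)\hat{p}_{N_1}}\right),$

where N1 comes from (2.16) with the fully acquired data.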
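The stopping rule and the resulting interval for p can be sketched in a few lines for the exponential case α = 1, where β̂n is the reciprocal of the sample mean and F(x) = 1 − e^{−x}. This is a minimal illustration under those assumptions, not the authors' code; the function name `sequential_p1_exponential`, the `draw` callback, and the random seed are hypothetical choices.

```python
import math
import random
from statistics import NormalDist


def sequential_p1_exponential(draw, c, d, gamma, m):
    """Run the rule n >= (z / log d)^2 * sigma_hat^2 for alpha = 1.

    `draw` is any zero-argument callable returning one new observation.
    Returns the terminal sample size, p_hat, and the interval for p."""
    z = NormalDist().inv_cdf(1.0 - gamma / 2.0)
    multiplier = (z / math.log(d)) ** 2
    n, total = 0, 0.0
    while True:
        total += draw()
        n += 1
        if n < m:
            continue                       # still collecting the pilot sample
        beta_hat = n / total               # MLE alpha / sample mean with alpha = 1
        x = c * beta_hat
        sigma2_hat = (x / (1.0 - math.exp(-x))) ** 2
        if n >= multiplier * sigma2_hat:   # stopping rule satisfied
            break
    p_hat = math.exp(-c * beta_hat)        # 1 - F(c * beta_hat) for Gamma(1, 1)
    lower = p_hat / (d - (d - 1.0) * p_hat)
    upper = d * p_hat / (1.0 + (d - 1.0) * p_hat)
    return n, p_hat, (lower, upper)


rng = random.Random(2020)                  # arbitrary seed, for reproducibility only
n1, p_hat, interval = sequential_p1_exponential(
    draw=lambda: rng.expovariate(2.0),     # rate beta = 2, treated as unknown
    c=0.5, d=1.5, gamma=0.05, m=20,
)
print(n1, round(p_hat, 4), tuple(round(v, 4) for v in interval))
```

By construction the interval for p always lies inside (0, 1) and contains the point estimate, which is the practical advantage of the fixed-accuracy form over a fixed-width interval.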

Now, we are in a position to discuss the appealing asymptotic efficiency and consistency properties for this newly proposed purely sequential procedure ℘1.

### Theorem 2

For the purely sequential fixed-accuracy confidence interval estimation procedure ℘1 given in (2.16), with all fixed α, β, c, γ, and m, we have:

$\lim_{d\to 1}E_{\beta}\left[\frac{N_1}{n_1^{*}}\right]=1 \quad \text{[Asymptotic First-Order Efficiency]}, \qquad (2.18)$

$\lim_{d\to 1}P_{\beta}\{q\in J_{N_1;1}\}=1-\gamma \quad \text{[Asymptotic Consistency]}, \qquad (2.19)$

where $n_1^{*}$ is defined in (2.14).

Proof

We prove the asymptotic first-order efficiency and consistency by applying a general framework of purely sequential fixed-width confidence intervals based on MLE proposed in Yu (1989) with the stopping rule given by

$N=\inf\{n\geq m:\ I_n(\hat{\theta}_n)\geq\lambda\},$

where m is a pilot sample size, θ̂n is the MLE of a generic parameter θ, In(θ̂n) is the observed Fisher information, and λ is a multiplier such that λθ indicates an optimal sample size.

Observe that a corresponding confidence interval can be constructed for log q in the light of Jn;1 from (2.12) for q as follows:

$(\log\hat{q}_n-\log d,\ \log\hat{q}_n+\log d),$

which is symmetric about $\log\hat{q}_n$ with a fixed width of 2 log d. The reciprocal of the asymptotic variance of $\sqrt{n}(\log\hat{q}_n-\log q)$, given by $1/\sigma_{\beta}^{2}$, can be interpreted as the Fisher information in terms of log q, and accordingly $n/\hat{\sigma}_{n;1}^{2}$ is the observed Fisher information obtained from n observations.

For this fixed-width confidence interval estimation problem, the optimal fixed sample size is still $n_1^{*}$ defined in (2.14). Therefore, the associated stopping rule remains (2.16), and it matches Yu’s (1989) stopping rule upon noting that $I_n(\hat{\theta}_n)=n/\hat{\sigma}_{n;1}^{2}$ and $\lambda=(z/\log d)^{2}$. Along the lines of Yu (1989, Theorems 2 and 3), both (2.18) and (2.19) hold.

### 2.2. Simulated performances

To investigate the performance of the purely sequential fixed-accuracy confidence interval estimation procedure ℘1 based on the stopping rule given by (2.16), we include a simulation study under the exponential case; that is, the shape parameter α is fixed and known to be 1. To draw pseudo-random observations, we fixed β = 2, but pretended that it was unknown to us. Then, we set c = α/β = 0.5 so that the quantity to be estimated, Pβ(X > c), represented the probability that an observation would turn out larger than the mean value. For brevity alone, we present summaries when the pilot sample size was m = 20 and the significance level was γ = 0.05, while a wide range of d values were chosen, including d = 1.10, 1.15, 1.20, 1.25, 1.30, 1.35, 1.40, 1.45, 1.50.

We summarized the following quantities in Table 1 by running 10,000 independent trials: the mean and standard deviation of the terminal sample sizes, $\bar{N}_1$ and s(N1); the ratio of $\bar{N}_1$ to the optimal fixed sample size $n_1^{*}$ as well as the difference $\bar{N}_1-n_1^{*}$; the average coverage probability $\overline{cp}_1$, calculated as the proportion of the 10,000 obtained confidence intervals that contain the true value of p, and the corresponding standard error $s(\overline{cp}_1)$; and the mean of the estimated values of p = Pβ(X > c) with its standard error, $\bar{\hat{p}}_{N_1}$ and $s(\bar{\hat{p}}_{N_1})$.

In Table 1, $\bar{N}_1$ increases as the preassigned accuracy d decreases. $\bar{N}_1/n_1^{*}$ is close to 1, and it gets closer to 1 as d gets smaller. These are also shown in Figures 1 and 2. The average coverage probability $\overline{cp}_1$ is close to 1 − γ = 0.95, and its standard error $s(\overline{cp}_1)$ is small across the board. Moreover, the differences between $\bar{N}_1$ and $n_1^{*}$ are small and stable across the values of d, staying around 1 (between 0.85 and 1.30 in Table 1). This empirically supports that our newly proposed sampling procedure ℘1 gives a consistent and efficient determination of the required sample size for estimation. From the last two columns, we can tell that the $\bar{\hat{p}}_{N_1}$ values are close to p = 0.3679 with tiny standard errors, which suggests that our sampling procedure provides accurate estimates of this probability of interest.
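A small-scale version of such a simulation can be sketched as follows. This is an illustrative Monte Carlo under the exponential setting described above (α = 1, β = 2, c = 0.5, m = 20, γ = 0.05, d = 1.5), not the code used to produce Table 1; it uses 2,000 replications instead of 10,000, and the function name and seed are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean


def run_p1_once(rng, beta, c, d, gamma, m):
    """One replication of the stopping rule for alpha = 1; returns (N, covered)."""
    z = NormalDist().inv_cdf(1.0 - gamma / 2.0)
    multiplier = (z / math.log(d)) ** 2
    total = sum(rng.expovariate(beta) for _ in range(m))   # pilot sample
    n = m
    while True:
        x = c * n / total                                  # c * beta_hat
        if n >= multiplier * (x / (1.0 - math.exp(-x))) ** 2:
            break                                          # stopping rule met
        total += rng.expovariate(beta)                     # one more observation
        n += 1
    q_hat = math.exp(-x) / (1.0 - math.exp(-x))
    q_true = math.exp(-c * beta) / (1.0 - math.exp(-c * beta))
    return n, (q_hat / d < q_true < d * q_hat)


rng = random.Random(1)                                     # arbitrary seed
results = [run_p1_once(rng, beta=2.0, c=0.5, d=1.5, gamma=0.05, m=20)
           for _ in range(2000)]
mean_n = mean(n for n, _ in results)
coverage = mean(1.0 if hit else 0.0 for _, hit in results)
print(round(mean_n, 2), round(coverage, 4))
```

With fewer replications the summaries are noisier than those in Table 1, but the average terminal sample size and the empirical coverage should land in the same neighborhood as the d = 1.50 row.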

3. Sequential fixed-accuracy confidence intervals with unknown α

For a two-parameter gamma distribution Γ(α, β) with both α and β unknown, we can no longer implement ℘1 as per (2.16) to obtain a fixed-accuracy confidence interval for p that is defined in (2.1). Also, in this case, the MLEs of α and β have no closed-form expressions, and can only be solved numerically. In this section, we therefore develop a sequential fixed-accuracy confidence interval estimation methodology from a nonparametric perspective.

Let us define a new random variable Y based on the gamma random variable X: for a random sample of size n, X1, . . . , Xn, let $Y_i=I_{\{X_i>c\}}$ with i = 1, . . . , n, where $I_A$ is the indicator function of an event A. Clearly, Y1, . . . , Yn are independent and identically distributed Bernoulli random variables with probability of success being p as defined in (2.1).

Then,

$\tilde{p}_n=\bar{Y}_n=n^{-1}\sum_{i=1}^{n}Y_i$

serves as an unbiased and consistent estimator of p. Furthermore, it is the MLE of p based on the Bernoulli observations Y1, . . . , Yn, and it remains available when both α and β are unknown in a gamma distribution. By the CLT, it further holds that as n→∞,

$\sqrt{n}(\tilde{p}_n-p)\overset{D}{\to}N(0,p(1-p)).$

Similarly, we define

$\tilde{q}_n=\frac{\tilde{p}_n}{1-\tilde{p}_n},$

so following the delta method, as n→∞, we have

$\sqrt{n}(\log\tilde{q}_n-\log q)\overset{D}{\to}N(0,\sigma_p^{2}),$

where $\sigma_p^{2}=1/\{p(1-p)\}$. In the light of (2.12) and (2.13), with preassigned d and γ, we consider the 100(1 − γ)% fixed-accuracy confidence interval for q:

$J_{n;2}=(d^{-1}\tilde{q}_n,\ d\tilde{q}_n),$

which additionally satisfies the condition that

$P_{\alpha,\beta}\{q\in J_{n;2}\}\geq 1-\gamma$

in order to obtain a confidence interval with 100(1 − γ)% confidence level.
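The asymptotic variance of log q̃n used here can be recovered in one step from the delta method; the following worked step is added for illustration. Since log q = log p − log(1 − p),

$\frac{d\log q}{dp}=\frac{1}{p}+\frac{1}{1-p}=\frac{1}{p(1-p)}, \qquad p(1-p)\left(\frac{d\log q}{dp}\right)^{2}=\frac{1}{p(1-p)}.$

Because p(1 − p) ≤ 1/4, this variance is at least 4 for every p, with equality at p = 1/2.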

Likewise, we derive an alternative optimal fixed sample size given by

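$n_2^{*}\equiv n_2^{*}(d)=\left(\frac{z}{\log d}\right)^{2}\sigma_p^{2}, \qquad (3.7)$

which remains unknown, again. Therefore, it is essential to estimate $\sigma_p^{2}$ through sequential procedures. Here, we adopt the estimator

$\hat{\sigma}_{n;2}^{2}\equiv\sigma_{\tilde{p}_n}^{2}=\frac{1}{\tilde{p}_n(1-\tilde{p}_n)}. \qquad (3.8)$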
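As a numerical illustration of $n_2^{*}$ and of the plug-in variance estimator, the sketch below uses the Γ(2, 2) setting with c = 1 from the simulation study in Section 3.2, for which p = 3e^{−2}. The function names are hypothetical, and returning an infinite variance when the sample proportion equals 0 or 1 is an assumption made here to keep the code well defined.

```python
import math
from statistics import NormalDist


def optimal_fixed_sample_size_np(p, d, gamma):
    """n_2^* = (z / log d)^2 / (p (1 - p))."""
    z = NormalDist().inv_cdf(1.0 - gamma / 2.0)
    return (z / math.log(d)) ** 2 / (p * (1.0 - p))


def sigma2_hat(exceed, n):
    """Plug-in estimator 1 / (p_tilde (1 - p_tilde)), where p_tilde = exceed / n.
    Assumption: treated as infinite when p_tilde is exactly 0 or 1."""
    if exceed in (0, n):
        return math.inf
    p_tilde = exceed / n
    return 1.0 / (p_tilde * (1.0 - p_tilde))


p = 3.0 * math.exp(-2.0)      # P(X > 1) for Gamma(2, rate 2): e^{-2} (1 + 2)
for d in (1.70, 1.10):
    # Compare with the n_2^* column of Table 2.
    print(d, round(optimal_fixed_sample_size_np(p, d, 0.05), 2))
print(sigma2_hat(10, 20), sigma2_hat(0, 20))
```

Since the variance is bounded below by 4, any sample size produced by this approach is at least 4(z/log d)², regardless of the underlying distribution.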

### 3.1. A purely sequential procedure

One can immediately identify a similarity between (2.14)–(2.15) and (3.7)–(3.8). For situations when both α and β are unknown, we can follow along the lines of Section 2, and develop a purely sequential fixed-accuracy confidence interval estimation procedure associated with the following stopping rule:

$\wp_2:\ N_2\equiv N_2(d)=\inf\left\{n\geq m:\ n\geq\left(\frac{z}{\log d}\right)^{2}\left(\hat{\sigma}_{n;2}^{2}+n^{-1}\right)\right\}, \qquad (3.9)$

where m ≥ 1 continues to indicate a pilot sample size, and $\hat{\sigma}_{n;2}^{2}$ is defined in (3.8). Different from the stopping rule (2.16), the term $n^{-1}$ is incorporated here to prevent unexpected early termination of sampling, since $\tilde{p}_n$ is discrete and hence has a positive probability of being 0 or 1.

With the pilot sample data, if $m\geq(z/\log d)^{2}(\hat{\sigma}_{m;2}^{2}+m^{-1})$, we do not take any extra observations, and the final sample is the pilot sample. Otherwise, we draw one observation at a time and check the stopping rule (3.9) successively with the updated $\hat{\sigma}_{n;2}^{2}$ until $n\geq(z/\log d)^{2}(\hat{\sigma}_{n;2}^{2}+n^{-1})$ is observed for the first time. The final sample size will then be N2 = n (> m). The following theorem states that the procedure ℘2 also terminates w.p.1.

Theorem 3

For the purely sequential fixed-accuracy confidence interval estimation procedure ℘2 given in (3.9), with all fixed α, β, γ, d, c, and m, we have:

$P_{\alpha,\beta}(N_2<\infty)=1.$
Proof

Clearly, $\bar{Y}_n\overset{P}{\to}p$ as n→∞. Applying Slutsky’s theorem, we have that $\hat{\sigma}_{n;2}^{2}\overset{P}{\to}\sigma_p^{2}$ as n→∞. Therefore, Pα,β(N2 < ∞) = 1 holds.

Upon termination, with the acquired data

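$\{N_2,X_1,\ldots,X_m,\ldots,X_{N_2}\},$

the proposed 100(1 − γ)% confidence interval for q is:

$J_{N_2;2}=(d^{-1}\tilde{q}_{N_2},\ d\tilde{q}_{N_2})=\left(\frac{\tilde{p}_{N_2}}{d(1-\tilde{p}_{N_2})},\ \frac{d\tilde{p}_{N_2}}{1-\tilde{p}_{N_2}}\right),$

and accordingly, the confidence interval for p = Pα,β(X > c) is:

$J_{N_2;2}^{*}=\left(\frac{\tilde{p}_{N_2}}{d-(d-1)\tilde{p}_{N_2}},\ \frac{d\tilde{p}_{N_2}}{1+(d-1)\tilde{p}_{N_2}}\right),$

where N2 comes from (3.9). The purely sequential procedure ℘2 also enjoys asymptotic first-order efficiency and consistency.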

### Theorem 4

For the purely sequential fixed-accuracy confidence interval estimation procedure ℘2 given in (3.9), with all fixed α, β, γ, c, and m, we have:

$\lim_{d\to 1}E_{\beta}\left[\frac{N_2}{n_2^{*}}\right]=1 \quad \text{[Asymptotic First-Order Efficiency]},$

$\lim_{d\to 1}P_{\beta}\{q\in J_{N_2;2}\}=1-\gamma \quad \text{[Asymptotic Consistency]},$

where $n_2^{*}$ is defined in (3.7).

Theorem 4 can be proved in the same fashion as Theorem 2, once one notes that $n/\hat{\sigma}_{n;2}^{2}$ is the observed Fisher information in terms of log q and refers to Yu (1989). We omit the details for brevity.
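The nonparametric procedure only needs the indicators of X exceeding c, so it can be sketched without any reference to the gamma form. The code below is a minimal illustration rather than the authors' implementation: the function name `sequential_p2` and the seed are hypothetical, and continuing to sample whenever the sample proportion is exactly 0 or 1 (where the variance estimator is undefined) is an assumption made here.

```python
import math
import random
from statistics import NormalDist


def sequential_p2(draw, c, d, gamma, m):
    """Stop at the first n >= m with n >= (z / log d)^2 * (sigma_hat^2 + 1/n),
    where sigma_hat^2 = 1 / (p_tilde (1 - p_tilde))."""
    z = NormalDist().inv_cdf(1.0 - gamma / 2.0)
    multiplier = (z / math.log(d)) ** 2
    n, exceed = 0, 0
    while True:
        exceed += 1 if draw() > c else 0   # Y_i = I{X_i > c}
        n += 1
        if n < m or exceed in (0, n):
            continue                       # pilot stage, or variance estimate undefined
        p_tilde = exceed / n
        sigma2_hat = 1.0 / (p_tilde * (1.0 - p_tilde))
        if n >= multiplier * (sigma2_hat + 1.0 / n):
            break
    lower = p_tilde / (d - (d - 1.0) * p_tilde)
    upper = d * p_tilde / (1.0 + (d - 1.0) * p_tilde)
    return n, p_tilde, (lower, upper)


rng = random.Random(7)                     # arbitrary seed, for reproducibility only
n2, p_tilde, interval = sequential_p2(
    draw=lambda: rng.gammavariate(2.0, 0.5),   # shape 2, scale 1/beta = 0.5
    c=1.0, d=1.5, gamma=0.05, m=20,
)
print(n2, round(p_tilde, 4), tuple(round(v, 4) for v in interval))
```

Note that Python's `random.gammavariate` is parameterized by shape and scale, so a rate of β = 2 corresponds to a scale of 0.5. Replacing `draw` with any other data source leaves the procedure unchanged, which reflects its distribution-free nature.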

### 3.2. Simulated performances

Next, we include a simulation study by implementing the purely sequential fixed-accuracy confidence interval estimation procedure ℘2 based on the stopping rule given by (3.9), in a way analogous to what we did in Section 2. We drew pseudo-random observations from a Γ(2, 2) population; that is, we had fixed α = 2 and β = 2, but pretended that they were unknown. Then, we set c = α/β = 1 so that Pα,β(X > c) represented the probability that an observation would turn out larger than the mean value. For brevity alone, we present in Table 2 the selected summaries when the pilot sample size m = 20, the significance level γ = 0.05, and d = 1.10, 1.15, 1.20, 1.25, 1.30, 1.35, 1.40, 1.45, 1.50, 1.55, 1.60, 1.65, 1.70.

Running 10,000 independent trials, we reported the mean and standard deviation of the terminal sample sizes, $\bar{N}_2$ and s(N2); the ratio of $\bar{N}_2$ to the optimal fixed sample size $n_2^{*}$, which should be close to 1; the average coverage probability $\overline{cp}_2$, which should be comparable with 1 − γ (= 0.95), along with its standard error $s(\overline{cp}_2)$; and the mean of the estimated values of p = Pα,β(X > c) with its standard error, $\bar{\tilde{p}}_{N_2}$ and $s(\bar{\tilde{p}}_{N_2})$. The simulation results in Table 2 support the theoretical findings for the nonparametric procedure based on (3.9). Moreover, the differences between $\bar{N}_2$ and $n_2^{*}$ stayed between 1.70 and 1.89 across the values of d, which further shows that our proposed procedure ℘2 gives a consistent and efficient determination of the required sample size for estimation. The $\bar{\tilde{p}}_{N_2}$ values are close to p = 0.4060 with tiny standard errors, which suggests that the sampling procedure provides accurate estimates of this probability of interest.

4. Real data illustrations

To demonstrate the practical applicability of our newly proposed fixed-accuracy confidence interval estimation procedures, we include illustrations using three real-life data sets: (i) the urine albumin-to-creatinine ratios (UACR, mg/g) of 5255 adolescent survey participants from NHANES 1999–2004, referred to as the “UACR data”; (ii) excess cycle times data in steel manufacturing; (iii) survival times data from a group of 97 female dementia patients diagnosed at age 70–74.

### 4.1. Illustration I: using UACR data

The reference population for our analysis is created using survey participants from NHANES 1999–2014 who met the following criteria: between 12 and 17 years old, not pregnant, blood pressure < 120/80 mmHg, without diabetes, no prescription medications used within the previous 30 days, and a Z-score for weight-to-height ratio ≤ 2. This yields a reference sample of size n = 5255. The UACR data were analyzed in Zou and Young (2020), where the authors had established a one-sided upper tolerance limit of 0.7358 mg/g and suggested to classify adolescents with UACR values falling below it as “strictly normal.” Interested readers may refer to the paper, and more background information is omitted here.

For illustrative purposes, we treated the UACR data of size n = 5255 as our population, to which the exponential distribution gave a good fit (Zou and Young, 2020); that is, the shape parameter α = 1, while the rate parameter β was assumed unknown. We set c = 0.7358 and proceeded to estimate p = Pβ(X > c), which can be interpreted as the proportion of healthy adolescents with mildly increased UACR. Implementing the procedure ℘1 introduced earlier to draw observations from the UACR data set under simple random sampling without replacement, we constructed fixed-accuracy confidence intervals for q and p, respectively. The results are summarized in Table 3.

### 4.2. Illustration II: using excess cycle times data

The excess cycle times data in steel manufacturing were first given in Example 6.1 of Barnett and Lewis (1994) and are assumed to be a sample from an exponential population. Kimber (1982) and Lin and Balakrishnan (2009) declared that the observed value 92 is an outlier. For this illustration, we use the sample data without the data point 92, and set the shape parameter α = 1 with an unknown rate parameter. We are interested in the probability that the excess cycle time exceeds 35, as 35 is the largest normal value that is observed. We further assigned m = 5 and γ = 0.05. In the first step, we took the first five data points as our pilot sample data. Then, we checked the stopping rule as per (2.16) and found that it was not satisfied. Thus, we continued our sampling one observation at a time according to (2.16). In the end, the sampling was terminated with 85 observations, which were recorded in Table 4 as the observations came in. The final 95% confidence interval for Pβ(X > 35) is (0.00214, 0.01897).

### 4.3. Illustration III: using survival time data from dementia patients

To illustrate the nonparametric procedure as per (3.9), we consider a data set on survival times from a group of 97 female dementia patients diagnosed at age 70–74. This data set was originally from Elandt-Johnson and Johnson (1999) and was recently discussed by Ozonur and Paul (2020) where they showed that the two-parameter gamma distribution adequately fits the dementia data. One may find the data set from Table 8 of Ozonur and Paul (2020).

According to the analysis by Xie et al. (2008), the estimated median survival time from onset of dementia to death was 4.6 years for women. Thus, it is helpful to know the probability of surviving beyond this median time. We first randomly ordered the dementia data and assumed that the data arrived in that order. Then, we set γ = 0.05, d = 1.6, c = 4.6, m = 5. That is, we took the first five data points for initial analysis, and found that the stopping rule as per (3.9) was not satisfied. Thus, we continued our sampling one observation at a time according to (3.9). The sampling was terminated after collecting the data from 71 patients, and the final data we collected are listed in Table 5.

In the end, using the final observations from Table 5, we obtained a 95% confidence interval for p = Pα,β(X > 4.6) of (0.3263, 0.5536), where X denotes the survival time for a female dementia patient. Thus, using the data we obtained, we would conclude with 95% confidence that there is a 32.63% to 55.36% chance that women aged 70–74 can live more than 4.6 years from the onset of dementia.
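The calculation behind this interval can be retraced from the data in Table 5. The sketch below is an illustration, not the authors' code: it assumes that Table 5 lists the observations in the order in which they were sampled, the function name `first_stopping_time` is hypothetical, and sampling is assumed to continue whenever the sample proportion is exactly 0 or 1.

```python
import math
from statistics import NormalDist

# Survival times (years) copied from Table 5, assumed to be in arrival order.
times = [
    6.75, 1.59, 0.5, 0.5, 4.17, 3.58, 8.16, 2.33, 1.67, 10.17,
    3.75, 1.83, 21, 1, 1.83, 0.5, 1.66, 1.42, 1.67, 9.33,
    4.08, 18.08, 1, 7.84, 2.67, 1.58, 1.67, 7.83, 9.17, 1.42,
    2, 12.5, 4.92, 21.83, 0.58, 1.67, 1.25, 11.25, 4.67, 5.25,
    2.17, 3.92, 13.83, 5.83, 0.83, 4.17, 1.25, 3.42, 8.5, 11.5,
    10.33, 5.25, 1.08, 3.42, 11.25, 5.75, 2, 5.58, 0.83, 1.08,
    6.92, 4.58, 7, 4.92, 7.83, 2.92, 7.84, 2.25, 3.08, 9.92,
    6.58,
]


def first_stopping_time(observations, c, d, gamma, m):
    """First n >= m with n >= (z / log d)^2 * (1 / (p_tilde (1 - p_tilde)) + 1 / n)."""
    z = NormalDist().inv_cdf(1.0 - gamma / 2.0)
    multiplier = (z / math.log(d)) ** 2
    exceed = 0
    for n, x in enumerate(observations, start=1):
        exceed += 1 if x > c else 0
        if n < m or exceed in (0, n):
            continue                      # pilot stage, or variance estimate undefined
        p_tilde = exceed / n
        bound = multiplier * (1.0 / (p_tilde * (1.0 - p_tilde)) + 1.0 / n)
        if n >= bound:
            return n, p_tilde
    return None                           # rule not met within the available data


d = 1.6
n2, p_tilde = first_stopping_time(times, c=4.6, d=d, gamma=0.05, m=5)
lower = p_tilde / (d - (d - 1.0) * p_tilde)
upper = d * p_tilde / (1.0 + (d - 1.0) * p_tilde)
print(n2, round(p_tilde, 4), round(lower, 4), round(upper, 4))
```

Under the stated ordering assumption, 31 of the 71 listed survival times exceed 4.6 years, and the rule is first satisfied at the 71st observation.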

5. Concluding thoughts

Survival and reliability analysis are two of the most important scientific fields where the gamma distribution is often used to model data. In these two fields, it is crucial to understand when the measurement, the random variable that is modeled using a gamma distribution, goes beyond a “dangerous” value. Therefore, in this paper, we focus on estimating P(X > c) from a two-parameter gamma population with an unknown rate parameter and a shape parameter that is either known or unknown. In cases of a known shape parameter, such as a sample from an exponential population, we provide an estimation strategy along with a purely sequential procedure for determining the sample size needed for the required accuracy. When the shape parameter is unknown, we propose a nonparametric purely sequential procedure for achieving the prefixed accuracy. Both procedures perform well in terms of asymptotic efficiency and asymptotic consistency.

Finally, it is also worth mentioning that the nonparametric sequential fixed-accuracy confidence interval estimation procedure developed in Section 3 can be further extended to estimate P(X > c) in a general distribution-free case, since the idea does not depend on any specific population distribution. The gamma population with unknown shape parameter α and unknown rate parameter β can be viewed as an illustration.

Acknowledgement
Figures
Fig. 1. $\bar{N}_1$ vs. d from Table 1.
Fig. 2. $\bar{N}_1/n_1^{*}$ vs. d from Table 1.
Fig. 3. $\bar{N}_2/n_2^{*}$ from Table 2.
Fig. 4. $\bar{N}_2/n_2^{*}$ vs. d from Table 2.
TABLES

### Table 1

Simulation results by implementing ℘1 as (2.16) using Γ(1, 2) with γ = 0.05, c = 0.5, and m = 20 under 10,000 runs, where p = Pβ(X > c) = 0.3679

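| d | $n_1^{*}$ | $\bar{N}_1$ | s(N1) | $\bar{N}_1/n_1^{*}$ | $\bar{N}_1-n_1^{*}$ | $\overline{cp}_1$ | $s(\overline{cp}_1)$ | $\bar{\hat{p}}_{N_1}$ | $s(\bar{\hat{p}}_{N_1})$ |
|---|---|---|---|---|---|---|---|---|---|
| 1.50 | 58.48 | 59.76 | 6.39 | 1.0219 | 1.28 | 0.9525 | 0.0021 | 0.3695 | 0.0005 |
| 1.45 | 69.64 | 70.94 | 7.00 | 1.0187 | 1.30 | 0.9569 | 0.0020 | 0.3692 | 0.0004 |
| 1.40 | 84.92 | 86.21 | 7.74 | 1.0152 | 1.29 | 0.9524 | 0.0021 | 0.3691 | 0.0004 |
| 1.35 | 106.75 | 107.95 | 8.60 | 1.0113 | 1.20 | 0.9539 | 0.0021 | 0.3691 | 0.0003 |
| 1.30 | 139.66 | 140.82 | 9.78 | 1.0082 | 1.16 | 0.9558 | 0.0021 | 0.3690 | 0.0003 |
| 1.25 | 193.08 | 194.36 | 11.59 | 1.0066 | 1.28 | 0.9517 | 0.0021 | 0.3684 | 0.0003 |
| 1.20 | 289.21 | 290.28 | 14.11 | 1.0037 | 1.07 | 0.9536 | 0.0021 | 0.3685 | 0.0002 |
| 1.15 | 492.17 | 493.32 | 18.46 | 1.0023 | 1.15 | 0.9535 | 0.0021 | 0.3682 | 0.0002 |
| 1.10 | 1058.32 | 1059.17 | 26.94 | 1.0008 | 0.85 | 0.9511 | 0.0022 | 0.3682 | 0.0001 |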

### Table 2

Simulation results by implementing ℘2 as (3.9) using Γ(2, 2) with γ = 0.05, c = 1, and m = 20 under 10,000 runs, where p = Pα,β(X > c) = 0.4060

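| d | $n_2^{*}$ | $\bar{N}_2$ | s(N2) | $\bar{N}_2/n_2^{*}$ | $\bar{N}_2-n_2^{*}$ | $\overline{cp}_2$ | $s(\overline{cp}_2)$ | $\bar{\tilde{p}}_{N_2}$ | $s(\bar{\tilde{p}}_{N_2})$ |
|---|---|---|---|---|---|---|---|---|---|
| 1.70 | 56.57 | 58.27 | 3.35 | 1.0301 | 1.70 | 0.9599 | 0.0020 | 0.4086 | 0.0006 |
| 1.65 | 63.52 | 65.24 | 3.40 | 1.0271 | 1.72 | 0.9557 | 0.0021 | 0.4083 | 0.0006 |
| 1.60 | 72.11 | 73.87 | 3.63 | 1.0245 | 1.76 | 0.9615 | 0.0019 | 0.4079 | 0.0006 |
| 1.55 | 82.93 | 84.69 | 3.82 | 1.0212 | 1.76 | 0.9551 | 0.0021 | 0.4075 | 0.0005 |
| 1.50 | 96.89 | 98.68 | 4.10 | 1.0184 | 1.79 | 0.9562 | 0.0020 | 0.4075 | 0.0005 |
| 1.45 | 115.38 | 117.19 | 4.38 | 1.0157 | 1.81 | 0.9491 | 0.0022 | 0.4075 | 0.0004 |
| 1.40 | 140.70 | 142.53 | 4.72 | 1.0131 | 1.83 | 0.9503 | 0.0022 | 0.4072 | 0.0004 |
| 1.35 | 176.86 | 178.66 | 5.35 | 1.0102 | 1.80 | 0.9540 | 0.0021 | 0.4070 | 0.0004 |
| 1.30 | 231.40 | 233.15 | 6.00 | 1.0075 | 1.75 | 0.9479 | 0.0022 | 0.4069 | 0.0003 |
| 1.25 | 319.90 | 321.69 | 6.96 | 1.0056 | 1.79 | 0.9526 | 0.0021 | 0.4068 | 0.0003 |
| 1.20 | 479.19 | 480.97 | 8.63 | 1.0037 | 1.78 | 0.9469 | 0.0022 | 0.4064 | 0.0002 |
| 1.15 | 815.46 | 817.23 | 11.03 | 1.0022 | 1.77 | 0.9521 | 0.0021 | 0.4063 | 0.0002 |
| 1.10 | 1753.49 | 1755.38 | 16.14 | 1.0011 | 1.89 | 0.9527 | 0.0021 | 0.4061 | 0.0001 |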

### Table 3

Fixed-accuracy confidence intervals using the UACR data with γ = 0.05, c = 0.7358 and m = 20 implementing ℘1 from (2.16)

| d | N1 | $J_{N_1;1}$ for q | $J_{N_1;1}^{*}$ for p |
|---|---|---|---|
| 1.20 | 745 | (0.0993, 0.1429) | (0.0903, 0.1251) |
| 1.18 | 1045 | (0.0758, 0.1055) | (0.0705, 0.0955) |
| 1.16 | 1388 | (0.0707, 0.0951) | (0.0660, 0.0868) |
| 1.14 | 1586 | (0.0890, 0.1156) | (0.0817, 0.1037) |
| 1.12 | 2448 | (0.0685, 0.0859) | (0.0641, 0.0791) |
| 1.10 | 2939 | (0.0911, 0.1103) | (0.0835, 0.0993) |

### Table 4

Final sample data of excess cycle times using ℘1 from (2.16)

 5 32 3 21 7 3 1 7 4 4 9 7 2 11 4 11 5 1 5 7 13 3 10 8 10 11 32 11 3 11 3 7 3 5 12 3 3 2 2 8 10 21 13 3 8 2 8 3 14 8 1 2 3 15 1 3 1 2 5 10 5 1 10 3 2 2 5 4 12 5 8 7 5 10 6 12 3 8 1 1 7 5 2 2 21

### Table 5

Final sample data of survival time (in years) for female dementia patients using ℘2 from (3.9)

 6.75 1.59 0.5 0.5 4.17 3.58 8.16 2.33 1.67 10.17 3.75 1.83 21 1 1.83 0.5 1.66 1.42 1.67 9.33 4.08 18.08 1 7.84 2.67 1.58 1.67 7.83 9.17 1.42 2 12.5 4.92 21.83 0.58 1.67 1.25 11.25 4.67 5.25 2.17 3.92 13.83 5.83 0.83 4.17 1.25 3.42 8.5 11.5 10.33 5.25 1.08 3.42 11.25 5.75 2 5.58 0.83 1.08 6.92 4.58 7 4.92 7.83 2.92 7.84 2.25 3.08 9.92 6.58

References
1. Ansell JI and Phillips MJ (1994). Practical Methods for Reliability and Data Analysis, Oxford, Clarendon Press.
2. Anscombe FJ (1952). Large-sample theory of sequential estimation. Mathematical Proceedings of the Cambridge Philosophical Society, 48, 600-607.
3. Anscombe FJ (1953). Sequential estimation. Journal of the Royal Statistical Society, Series B, 15, 1-29.
4. Balakrishnan N, Castilla E, Martín N, and Pardo L (2019). Robust estimators for one-shot device testing data under gamma lifetime model with an application to a tumor toxicological data. Metrika, 82, 991-1019.
5. Balakrishnan N and Ling MH (2014). Gamma lifetimes and one-shot device testing analysis. Reliability Engineering & System Safety, 126, 54-64.
6. Banerjee S and Mukhopadhyay N (2016). A general sequential fixed-accuracy confidence interval estimation methodology for a positive parameter: Illustrations using health and safety data. Annals of the Institute of Statistical Mathematics, 68, 541-570.
7. Bapat SR (2018). Purely sequential fixed accuracy confidence intervals for P(X < Y) under bivariate exponential models. American Journal of Mathematical and Management Sciences, 37, 386-400.
8. Barnett V and Lewis T (1994). Outliers in Statistical Data (3rd edition), Chichester, England, John Wiley & Sons.
9. Burgin TA (1975). The gamma distribution and inventory control. Journal of the Operational Research Society, 26, 507-525.
10. Chen Z and Mi J (1998). Statistical estimation for the scale parameter of the gamma distribution based on grouped data. Communications in Statistics-Theory and Methods, 27, 3035-3045.
11. Choi SC and Wette R (1969). Maximum likelihood estimation of the parameters of the gamma distribution and their bias. Technometrics, 11, 683-690.
12. Chow YS and Robbins H (1965). On the asymptotic theory of fixed-width sequential confidence intervals for the mean. Annals of Mathematical Statistics, 36, 457-462.
13. Dagpunar J (2019). The gamma distribution. Significance, 16, 10-11.
14. Elandt-Johnson RC and Johnson NL (1999). Survival Models and Data Analysis, New York, John Wiley and Sons.
15. Ghosh M and Mukhopadhyay N (1981). Consistency and asymptotic efficiency of two-stage and sequential procedures. Sankhyā, Series A, 43, 220-227.
16. Ghosh M, Mukhopadhyay N, and Sen PK (1997). Sequential Estimation, New York, Wiley.
17. Husak GJ, Michaelsen J, and Funk C (2007). Use of the gamma distribution to represent monthly rainfall in Africa for drought monitoring applications. International Journal of Climatology, 27, 935-944.
18. Iliopoulos G (2016). Exact confidence intervals for the shape parameter of the gamma distribution. Journal of Statistical Computation and Simulation, 86, 1635-1642.
19. Isogai E and Uno C (1995). On the sequential point estimation of the mean of a gamma distribution. Statistics & Probability Letters, 22, 287-293.
20. Johnson RW and Kliche DV (2020). Large sample comparison of parameter estimates in gamma raindrop distributions. Atmosphere, 11, 333.
21. Kimber AC (1982). Tests for many outliers in an exponential sample. Applied Statistics, 31, 263-271.
22. Krishnamoorthy K and Mathew T (2009). Statistical Tolerance Regions: Theory, Applications, and Computation, New York, John Wiley & Sons.
23. Lawless JF (1982). Statistical Models and Methods for Lifetime Data, New York, John Wiley & Sons.
24. Lin C and Balakrishnan N (2009). Exact computation of the null distribution of a test for multiple outliers in an exponential sample. Computational Statistics & Data Analysis, 53, 3281-3290.
25. Liu JF (2001). Two-stage approximation of expected reward for gamma random variables. Communications in Statistics-Theory and Methods, 30, 1471-1480.
26. Mahmoudi E and Roughani G (2015). Bounded risk estimation of the scale parameter of a gamma distribution in a two-stage sampling procedure. Sequential Analysis, 34, 25-38.
27. Mukhopadhyay N and Banerjee S (2014). Purely sequential and two stage fixed-accuracy confidence interval estimation methods for count data for negative binomial distributions in statistical ecology: One-sample and two-sample problems. Sequential Analysis, 33, 251-285.
28. Mukhopadhyay N and de Silva BM (2009). Sequential Methods and Their Applications, Boca Raton, CRC.
29. Mukhopadhyay N and Zhuang Y (2016). On fixed-accuracy and bounded accuracy confidence interval estimation problems in Fisher’s “Nile” example. Sequential Analysis, 35, 516-535.
30. Nadarajah S and Gupta AK (2007). The exponentiated gamma distribution with application to drought data. Calcutta Statistical Association Bulletin, 59, 29-54.
31. Ohlsson E and Johansson B (2010). Non-Life Insurance Pricing with Generalized Linear Models, Berlin, Springer.
32. Ozonur D and Paul S (2020). Goodness of fit tests of the two-parameter gamma distribution against the three-parameter generalized gamma distribution. Communications in Statistics-Simulation and Computation, in press.
33. Piegorsch WW (2015). Statistical Data Analytics: Foundations for Data Mining, Informatics, and Knowledge Discovery, John Wiley & Sons.
34. Roughani G and Mahmoudi E (2015). Exact risk evaluation of the two-stage estimation of the gamma scale parameter under bounded risk constraint. Sequential Analysis, 34, 387-405.
35. Siegmund D (1985). Sequential Analysis: Tests and Confidence Intervals, New York, Springer.
36. Son YS and Oh M (2006). Bayesian estimation of the two-parameter gamma distribution. Communications in Statistics-Simulation and Computation, 35, 285-293.
37. Stein C (1949). Some problems in sequential estimation. Econometrica, 17, 77-78.
38. Takada Y and Nagata Y (1995). Fixed-width sequential confidence interval for the mean of a gamma distribution. Journal of Statistical Planning and Inference, 44, 277-289.
39. Vaz MF and Fortes MA (1988). Grain size distribution: the lognormal and the gamma distribution functions. Scripta Metallurgica, 22, 35-40.
40. Woodroofe M (1977). Second order approximations for sequential point and interval estimation. Annals of Statistics, 5, 984-995.
41. Xie J, Brayne C, and Matthews FE (2008). Survival times in people with dementia: analysis from population based cohort study with 14 year follow-up. BMJ: British Medical Journal, 336, 258-262.
42. Yu KF (1989). On fixed-width confidence intervals associated with maximum likelihood estimation. Journal of Theoretical Probability, 2, 193-199.
43. Zacks S and Khan RA (2011). Two-stage and sequential estimation of the scale parameter of a gamma distribution with fixed-width intervals. Sequential Analysis, 30, 297-307.
44. Zou Y and Young DS (2020). Improving coverage probabilities for parametric tolerance intervals via bootstrap calibration. Statistics in Medicine, 39, 2152-2166.