New approach for analysis of progressive Type-II censored data from the Pareto distribution

Jung-In Seo^a, Suk-Bok Kang^1,b, Ho-Yong Kim^b

^a Department of Statistics, Daejeon University, Korea;
^b Department of Statistics, Yeungnam University, Gyeongsan, Korea
Correspondence to: ^1 Department of Statistics, Yeungnam University, 280 Daehak-Ro, Gyeongsan, Gyeongbuk 38541, Korea. E-mail: sbkang@ynu.ac.kr
Received July 19, 2018; Revised August 7, 2018; Accepted August 7, 2018.
Abstract

The Pareto distribution is important for analyzing data in actuarial science, reliability, finance, and climatology. Its unknown parameters are usually estimated by the maximum likelihood method, which may yield inadequate inference for small samples and heavily censored data. In this paper, a new approach based on the regression framework is proposed to estimate the unknown parameters of the Pareto distribution under the progressive Type-II censoring scheme. The proposed method provides a new regression-type estimator that employs the spacings of exponential progressive Type-II censored samples. In addition, the proposed estimator is consistent and outperforms the maximum likelihood estimator in terms of the mean squared error and bias. The validity of the proposed method is assessed through Monte Carlo simulations and real data analysis.

Keywords : Pareto distribution, progressive Type-II censored sample, weighted linear regression
1. Introduction

Since its introduction by Cohen (1963), the progressive Type-II censoring scheme has gained considerable popularity and has been studied extensively; see, for instance, Balakrishnan et al. (2003), Wang (2008), and Seo and Kang (2015). Under this censoring scheme, $n$ randomly selected units are placed on a life test. When the first failure occurs, $R_1$ units are randomly removed from the $n - 1$ surviving units. Following the second observed failure, $R_2$ units are randomly removed from the $n - 2 - R_1$ surviving units, and so on. Finally, at the time of the $m$th observed failure, all remaining $R_m = n - m - R_1 - \cdots - R_{m-1}$ units are removed from the test. Here, the number of failures $m$ and the $R_i$ ($i = 1, \ldots, m$) are fixed in advance. Note that the case $m = n$, where $R_1 = \cdots = R_m = 0$, corresponds to the complete sample, whereas the case $R_1 = \cdots = R_{m-1} = 0$, $R_m = n - m$ corresponds to conventional Type-II censoring.
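The removal procedure just described can be simulated directly, which is convenient for checking later claims numerically. The following is a minimal Python sketch, assuming the lifetimes are given as a list and the scheme as a list `[R_1, ..., R_m]`; the function name `progressive_type2_censor` is hypothetical and not part of the paper.

```python
import random


def progressive_type2_censor(lifetimes, scheme, rng):
    """Apply a progressive Type-II censoring scheme (R_1, ..., R_m) to n lifetimes.

    Returns the m observed failure times X_{1:m:n} <= ... <= X_{m:m:n}.
    """
    n, m = len(lifetimes), len(scheme)
    if m == 0 or any(r < 0 for r in scheme) or m + sum(scheme) != n:
        raise ValueError("need R_i >= 0 and m + R_1 + ... + R_m == n")
    on_test = sorted(lifetimes)  # units still on test, earliest failure first
    observed = []
    for r in scheme:
        observed.append(on_test.pop(0))  # next failure among units still on test
        for _ in range(r):  # withdraw r survivors, chosen uniformly at random
            on_test.pop(rng.randrange(len(on_test)))
    return observed


rng = random.Random(1)
lifetimes = [rng.expovariate(1.0) for _ in range(10)]
# m = 5 observed failures out of n = 10 units
print(progressive_type2_censor(lifetimes, [2, 0, 1, 0, 2], rng))
```

With `scheme = [0] * n` the function returns the complete ordered sample, and with `[0] * (m - 1) + [n - m]` it returns the first m order statistics, matching the two special cases above.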

In this study, a new estimation method based on the regression framework is proposed under the progressive Type-II censoring scheme, focusing on estimation of the unknown parameters of the Pareto distribution. The cumulative distribution function (cdf) and the probability density function (pdf) of a Pareto random variable X are

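$F(x) = 1 - \left(\frac{\theta}{x}\right)^{\lambda}, \quad f(x) = \lambda\theta^{\lambda}x^{-(\lambda+1)}, \quad x > \theta,\ \lambda > 0,\ \theta > 0,$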

respectively, where θ is the scale parameter and λ is the shape parameter. Lu and Tao (2007) proposed a weighted least squares method for estimating the unknown parameters of the Pareto distribution. Kim et al. (2017) used a pivotal quantity adopted from the regression framework to obtain a consistent estimator of the shape parameter of the Pareto distribution. In the present study, these methods are extended to the progressive Type-II censoring scheme, and a consistent estimator of the shape parameter is obtained that does not depend on the nuisance parameter θ and is superior to the maximum likelihood estimators (MLEs) in terms of the mean squared error (MSE) and bias.
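For the numerical sketches it helps to have the cdf, the pdf, and the inverse cdf in code. The following is a minimal Python sketch that samples by inverse-cdf transformation (a standard choice, not one prescribed by the paper); all function names are hypothetical.

```python
import math
import random


def pareto_cdf(x, theta, lam):
    """F(x) = 1 - (theta / x)^lam for x > theta, and 0 otherwise."""
    return 1.0 - (theta / x) ** lam if x > theta else 0.0


def pareto_pdf(x, theta, lam):
    """f(x) = lam * theta^lam * x^(-(lam + 1)) for x > theta, and 0 otherwise."""
    return lam * theta ** lam * x ** (-(lam + 1.0)) if x > theta else 0.0


def pareto_quantile(u, theta, lam):
    """Inverse of F: solving F(x) = u gives x = theta * (1 - u)^(-1 / lam), 0 <= u < 1."""
    return theta * (1.0 - u) ** (-1.0 / lam)


def pareto_sample(n, theta, lam, rng):
    """Draw n lifetimes by inverse-cdf sampling."""
    return [pareto_quantile(rng.random(), theta, lam) for _ in range(n)]


rng = random.Random(7)
print([round(x, 3) for x in pareto_sample(5, theta=1.0, lam=1.5, rng=rng)])
```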

The paper is organized as follows. In Section 2, existing methods for estimating the unknown parameters of the Pareto distribution are presented. In Section 3, a new approach based on the weighted least squares method is proposed. In Section 4, a Monte Carlo simulation is conducted, and real data analysis is performed to assess the proposed method. Finally, Section 5 concludes the paper.

2. Classical inference revisited

This section reviews existing inference results for the scale parameter θ and the shape parameter λ of the Pareto distribution (Balakrishnan and Aggarwala, 2000). Let $X_{1:m:n} \le \cdots \le X_{m:m:n}$ be a progressive Type-II censored sample from the Pareto distribution with censoring scheme $(R_1, \ldots, R_m)$. Then the corresponding likelihood function is given by

$L(\theta,\lambda) \propto \lambda^{m}\theta^{\lambda n}\prod_{i=1}^{m} x_{i:m:n}^{-\lambda(1+R_i)-1}, \quad \theta \le x_{1:m:n}.$ (2.1)

Because the likelihood function (2.1) is increasing in θ over $\theta \le x_{1:m:n}$, the MLE $\hat{\theta}$ is the first observed failure $X_{1:m:n}$. In addition, the MLE of λ is given by

$\hat{\lambda} = \frac{m}{\sum_{i=1}^{m}(1+R_i)\log X_{i:m:n} - n\log X_{1:m:n}},$

which has an inverse gamma distribution with parameters (m − 1, λm). This follows easily from the pivotal quantity

$W(\lambda) = \lambda\left[\sum_{i=1}^{m}(1+R_i)\log X_{i:m:n} - n\log X_{1:m:n}\right],$

which has a gamma distribution with parameters (m − 1, 1). Hence the MLEs $\hat{\theta}$ and $\hat{\lambda}$ are biased. Alternatively, Balakrishnan and Aggarwala (2000) provided estimators of θ and λ given by

$\hat{\theta}_U = \left[1 - \frac{m}{n(m-1)\hat{\lambda}}\right]\hat{\theta}, \quad \hat{\lambda}_U = \frac{m-2}{m}\hat{\lambda},$

respectively. Note that $\hat{\theta}_U$ and $\hat{\lambda}_U$ are both unbiased and consistent, with variances

$\mathrm{Var}(\hat{\theta}_U) = \frac{\theta^2 m}{\lambda n(m-1)(\lambda n-2)}, \quad \mathrm{Var}(\hat{\lambda}_U) = \frac{\lambda^2}{m-3},$

respectively.
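The distributions above determine the exact bias and MSE of the four classical estimators. The Python sketch below is illustrative rather than the authors' code, and the function name is hypothetical. It takes $E(\hat{\lambda}) = m\lambda/(m-2)$ and $\mathrm{Var}(\hat{\lambda}) = (m\lambda)^2/[(m-2)^2(m-3)]$ from the inverse gamma distribution of $\hat{\lambda}$, and it uses the standard fact (not a formula stated in this section) that $\hat{\theta} = X_{1:m:n}$, the minimum of all n lifetimes, is Pareto with scale θ and shape nλ.

```python
def classical_bias_mse(theta, lam, n, m):
    """Exact (bias, MSE) of the Section 2 estimators; requires m > 3 and n * lam > 2.

    lam_hat is inverse gamma (m - 1, lam * m), so E = m*lam/(m - 2) and
    Var = (m*lam)^2 / ((m - 2)^2 (m - 3)). theta_hat = X_{1:m:n} is the minimum of
    all n lifetimes, hence Pareto with scale theta and shape a = n * lam.
    """
    a = n * lam
    lam_bias = 2.0 * lam / (m - 2)
    lam_var = (m * lam) ** 2 / ((m - 2) ** 2 * (m - 3))
    theta_bias = theta / (a - 1.0)
    theta_var = theta ** 2 * a / ((a - 1.0) ** 2 * (a - 2.0))
    return {
        "lam_hat": (lam_bias, lam_var + lam_bias ** 2),
        "lam_U": (0.0, lam ** 2 / (m - 3)),  # unbiased, so MSE = Var
        "theta_hat": (theta_bias, theta_var + theta_bias ** 2),
        "theta_U": (0.0, theta ** 2 * m / (lam * n * (m - 1) * (lam * n - 2.0))),
    }


for name, (bias, mse) in classical_bias_mse(1.0, 0.5, n=20, m=20).items():
    print(f"{name:9s} MSE={mse:.3f} (bias={bias:.3f})")
```

For θ = 1, λ = 0.5 and n = m = 20, these formulas round to 0.021 (0.056), 0.015 (0.000), 0.028 (0.111) and 0.013 (0.000), the corresponding entries in the first row of Table 1.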

3. Estimation based on the regression framework

In this section, a new approach is proposed to obtain an estimator that is superior to the MLE in terms of MSE and bias.

Let

$Y_{i:m:n} = -\log\left[1 - F(X_{i:m:n})\right] = \lambda\log\left(\frac{X_{i:m:n}}{\theta}\right), \quad i = 1, \ldots, m.$

Then $Y_{1:m:n} \le \cdots \le Y_{m:m:n}$ is a progressive Type-II censored sample from the standard exponential distribution. Consider the normalized spacings (Viveros and Balakrishnan, 1994)

$S_i = \left(Y_{i:m:n} - Y_{i-1:m:n}\right)\sum_{j=i}^{m}(R_j+1), \quad i = 1, \ldots, m \quad (Y_{0:m:n} \equiv 0),$

which are independent and identically distributed standard exponential random variables. Then,

$D_{i:m:n} = Y_{i:m:n} - Y_{i-1:m:n} = \lambda\log\left(\frac{X_{i:m:n}}{X_{i-1:m:n}}\right), \quad i = 2, \ldots, m,$

are exponentially distributed with means

$E(D_{i:m:n}) = \left[\sum_{j=i}^{m}(1+R_j)\right]^{-1},$

which leads to the following linear regression model:

$E(D_{i:m:n}) = \lambda\log\left(\frac{X_{i:m:n}}{X_{i-1:m:n}}\right) + \varepsilon_i, \quad i = 2, \ldots, m,$ (3.1)

where $\varepsilon_i$ is an error term with $E(\varepsilon_i) = 0$. Note that (3.1) is a simple linear regression through the origin, so its only parameter is the shape parameter λ. An estimator that does not depend on the nuisance parameter θ can therefore be obtained as

$\hat{\lambda}_{wl1} = \frac{\sum_{i=2}^{m} w_i E(D_{i:m:n})\log\left(X_{i:m:n}/X_{i-1:m:n}\right)}{\sum_{i=2}^{m} w_i\left[\log\left(X_{i:m:n}/X_{i-1:m:n}\right)\right]^2}$

by minimizing the weighted squared distance

$\sum_{i=2}^{m} w_i\left[E(D_{i:m:n}) - \lambda\log\left(\frac{X_{i:m:n}}{X_{i-1:m:n}}\right)\right]^2,$

where $w_i$ is the weight on the ith data point. However, it is not easy to find weights $w_i$ that make $\hat{\lambda}_{wl1}$ unbiased or consistent. Instead, the idea of Kim et al. (2017) is extended here to the progressive Type-II censoring scheme.
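The spacing property on which (3.1) rests can be checked numerically. The Python sketch below draws censored Pareto samples by direct random removal (deliberately not through the spacings themselves), transforms them to $Y_{i:m:n}$ with the known θ and λ, and averages each $S_i$ over replications; if the $S_i$ are standard exponential, every average should be near 1. The scheme, seed, replication count, and function names are illustrative assumptions.

```python
import math
import random


def censored_pareto_direct(theta, lam, scheme, rng):
    """Progressive Type-II censored Pareto sample by direct random removal."""
    n = len(scheme) + sum(scheme)
    # inverse-cdf draws; 1 - rng.random() lies in (0, 1], so the power is finite
    on_test = sorted(theta * (1.0 - rng.random()) ** (-1.0 / lam) for _ in range(n))
    observed = []
    for r in scheme:
        observed.append(on_test.pop(0))
        for _ in range(r):
            on_test.pop(rng.randrange(len(on_test)))
    return observed


def normalized_spacings(x, theta, lam, scheme):
    """S_i = (Y_i - Y_{i-1}) * sum_{j=i}^m (1 + R_j), Y_i = lam * log(X_i / theta), Y_0 = 0."""
    y = [0.0] + [lam * math.log(xi / theta) for xi in x]
    tails = [sum(1 + r for r in scheme[i:]) for i in range(len(scheme))]
    return [(y[i + 1] - y[i]) * tails[i] for i in range(len(scheme))]


rng = random.Random(2018)
scheme = [2, 0, 1, 0, 2, 0, 0, 2, 0, 3]  # illustrative: m = 10, n = 20
reps = 4000
totals = [0.0] * len(scheme)
for _ in range(reps):
    x = censored_pareto_direct(1.0, 1.5, scheme, rng)
    totals = [t + s for t, s in zip(totals, normalized_spacings(x, 1.0, 1.5, scheme))]
means = [t / reps for t in totals]
print([round(v, 2) for v in means])  # each average should be near 1
```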

Let

$D^{*}_{i:m:n} = Y_{i:m:n} - Y_{1:m:n} = \lambda\log\left(\frac{X_{i:m:n}}{X_{1:m:n}}\right), \quad i = 2, \ldots, m.$

Then, by the same argument, another estimator of λ is obtained as

$\hat{\lambda}_{wl2} = \frac{\sum_{i=2}^{m} w_i E(D^{*}_{i:m:n})\log\left(X_{i:m:n}/X_{1:m:n}\right)}{\sum_{i=2}^{m} w_i\left[\log\left(X_{i:m:n}/X_{1:m:n}\right)\right]^2},$

where

$E(D^{*}_{i:m:n}) = \sum_{j=2}^{i}\left[\sum_{k=j}^{m}(1+R_k)\right]^{-1}, \quad 2 \le i \le m,$

by Theorem 7.2.1 in Balakrishnan and Cramer (2014). To find weights $w_i$ that make $\hat{\lambda}_{wl2}$ a consistent estimator, the following lemma is required:

### Lemma 1

Let

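$Q_{i:m:n} = \frac{D^{*}_{i:m:n}E(D^{*}_{i:m:n}) - E^2(D^{*}_{i:m:n})}{\mathrm{Var}(D^{*}_{i:m:n})}, \quad T_{i:m:n} = \frac{D^{*2}_{i:m:n} - E^2(D^{*}_{i:m:n})}{\mathrm{Var}(D^{*}_{i:m:n})}, \quad U_{i:m:n} = \frac{E^2(D^{*}_{i:m:n})}{\mathrm{Var}(D^{*}_{i:m:n})},$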

where

$\mathrm{Var}(D^{*}_{i:m:n}) = \sum_{j=2}^{i}\left[\sum_{k=j}^{m}(1+R_k)\right]^{-2}, \quad 2 \le i \le m,$

by Theorem 7.2.1 in Balakrishnan and Cramer (2014). Then

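• $\frac{1}{m^2}\sum_{i=2}^{m} Q_{i:m:n}$ converges to zero in probability as $m \to \infty$. (3.2)

• $\frac{1}{m^2}\sum_{i=2}^{m} T_{i:m:n}$ converges to zero in probability as $m \to \infty$. (3.3)

• $\frac{1}{m^2}\sum_{i=2}^{m} U_{i:m:n}$ does not converge to zero as $m \to \infty$.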

Proof

It is clear that

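$E\left(\left|\frac{1}{m^2}\sum_{i=2}^{m} Q_{i:m:n}\right|\right) = 0 \quad\text{and}\quad E\left(\left|\frac{1}{m^2}\sum_{i=2}^{m} T_{i:m:n}\right|\right) = \frac{m-1}{m^2}.$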

Hence the quantities in (3.2) and (3.3), the first two statements of Lemma 1, converge in mean to zero, which implies convergence in probability (Karr, 1993). In addition,

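$\begin{aligned}
\frac{1}{m^2}\sum_{i=2}^{m} U_{i:m:n} &= \frac{1}{m^2}\sum_{i=2}^{m}\frac{\left\{\sum_{j=2}^{i}\left[\sum_{k=j}^{m}(1+R_k)\right]^{-1}\right\}^2}{\sum_{j=2}^{i}\left[\sum_{k=j}^{m}(1+R_k)\right]^{-2}} \\
&\ge \frac{1}{m^2}\sum_{i=2}^{m}\frac{\left\{\sum_{j=2}^{i}\left[m-1+\sum_{k=j}^{m}R_k\right]^{-1}\right\}^2}{\sum_{j=2}^{i}(m-i+1)^{-2}} \\
&\ge \frac{1}{m^2}\sum_{i=2}^{m}(i-1)\left(\frac{m-i+1}{m-1+\sum_{k=2}^{m}R_k}\right)^2 \\
&= \frac{1}{m^2\left(m-1+\sum_{k=2}^{m}R_k\right)^2}\left(m^2\sum_{j=1}^{m}j + \sum_{j=1}^{m}j^3 - 2m\sum_{j=1}^{m}j^2\right),
\end{aligned}$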

which converges to a constant as $m \to \infty$. This completes the proof.
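The deterministic part of Lemma 1 can be illustrated numerically. The Python sketch below computes $E(D^{*}_{i:m:n})$ and $\mathrm{Var}(D^{*}_{i:m:n})$ from the formulas above, then $\frac{1}{m^2}\sum_{i=2}^{m} U_{i:m:n}$ together with the lower bound from the proof, taking the inner sum of $R_k$ in the bound over $k = 2, \ldots, m$. The schemes and function names are hypothetical choices for illustration, and the code does not check the probabilistic statements about $Q_{i:m:n}$ and $T_{i:m:n}$.

```python
def dstar_moments(scheme):
    """E(D*_i) and Var(D*_i) for i = 2, ..., m (lists of length m - 1)."""
    m = len(scheme)
    # gamma[j] = sum_{k=j+1}^{m} (1 + R_k) in 1-based indexing, built from the end
    gamma, acc = [0] * m, 0
    for j in range(m - 1, -1, -1):
        acc += 1 + scheme[j]
        gamma[j] = acc
    means, variances, e, v = [], [], 0.0, 0.0
    for j in range(1, m):  # 1-based j = 2, ..., m
        e += 1.0 / gamma[j]
        v += 1.0 / gamma[j] ** 2
        means.append(e)
        variances.append(v)
    return means, variances


def scaled_u_sum(scheme):
    """(1 / m^2) * sum_{i=2}^m U_i with U_i = E^2(D*_i) / Var(D*_i)."""
    means, variances = dstar_moments(scheme)
    return sum(e * e / v for e, v in zip(means, variances)) / len(scheme) ** 2


def lemma_lower_bound(scheme):
    """Lower bound from the proof, with the R_k sum taken over k = 2, ..., m."""
    m = len(scheme)
    c = m - 1 + sum(scheme[1:])
    return sum(j * (m - j) ** 2 for j in range(1, m + 1)) / (m ** 2 * c ** 2)


for m in (10, 50, 200, 800):
    complete = [0] * m
    type2 = [0] * (m - 1) + [m]  # conventional Type-II censoring with n = 2m
    print(m, round(scaled_u_sum(complete), 4), round(scaled_u_sum(type2), 4),
          round(lemma_lower_bound(complete), 4))
```

As m grows, the printed values of the scaled sum should not shrink toward zero, in line with the third statement of Lemma 1.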

### Theorem 1

Let $w_i = 1/\mathrm{Var}(D^{*}_{i:m:n})$. Then the estimator $\hat{\lambda}_{wl2}$ is a consistent estimator of λ.

Proof

The estimator $\hat{\lambda}_{wl2}$ can be written as

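$\begin{aligned}
\hat{\lambda}_{wl2} &= \lambda\frac{\sum_{i=2}^{m} D^{*}_{i:m:n}E(D^{*}_{i:m:n})/\mathrm{Var}(D^{*}_{i:m:n})}{\sum_{i=2}^{m} D^{*2}_{i:m:n}/\mathrm{Var}(D^{*}_{i:m:n})} = \lambda\frac{\sum_{i=2}^{m} Q_{i:m:n} + \sum_{i=2}^{m} U_{i:m:n}}{\sum_{i=2}^{m} T_{i:m:n} + \sum_{i=2}^{m} U_{i:m:n}} \\
&= \lambda\frac{\sum_{i=2}^{m} Q_{i:m:n}/m^2 + \sum_{i=2}^{m} U_{i:m:n}/m^2}{\sum_{i=2}^{m} T_{i:m:n}/m^2 + \sum_{i=2}^{m} U_{i:m:n}/m^2},
\end{aligned}$ (3.4)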

and, by Lemma 1, the fraction in (3.4) converges to 1 in probability as $m \to \infty$. This completes the proof.
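Theorem 1 can be illustrated by simulation. The Python sketch below implements $\hat{\lambda}_{wl2}$ with $w_i = 1/\mathrm{Var}(D^{*}_{i:m:n})$ and draws samples by inverting the spacing representation of this section, that is, $Y_{i:m:n} = \sum_{j \le i} E_j / \sum_{k \ge j}(1+R_k)$ with $E_j$ iid standard exponential and $X_{i:m:n} = \theta\exp(Y_{i:m:n}/\lambda)$. The Type-II schemes, seed, replication count, and function names are illustrative assumptions; the estimated MSE should fall as m grows.

```python
import math
import random


def tail_sums(scheme):
    """tails[j] = sum_{k=j+1}^{m} (1 + R_k) (1-based k): units on test before failure j + 1."""
    tails, acc = [0] * len(scheme), 0
    for j in range(len(scheme) - 1, -1, -1):
        acc += 1 + scheme[j]
        tails[j] = acc
    return tails


def lambda_wl2(x, scheme):
    """Weighted least squares estimator of lam with w_i = 1 / Var(D*_i) (Theorem 1)."""
    tails = tail_sums(scheme)
    num = den = e = v = 0.0
    for i in range(1, len(x)):  # 1-based i = 2, ..., m
        e += 1.0 / tails[i]  # E(D*_i)
        v += 1.0 / tails[i] ** 2  # Var(D*_i)
        lg = math.log(x[i] / x[0])  # log(X_i / X_1)
        num += e * lg / v
        den += lg * lg / v
    return num / den


def censored_pareto_spacings(theta, lam, scheme, rng):
    """Sample via Y_i = sum_{j<=i} E_j / sum_{k>=j} (1 + R_k), X_i = theta * exp(Y_i / lam)."""
    y, out = 0.0, []
    for t in tail_sums(scheme):
        y += rng.expovariate(1.0) / t
        out.append(theta * math.exp(y / lam))
    return out


rng = random.Random(11)
lam, reps = 1.5, 500
mse = {}
for m in (10, 40, 160):
    scheme = [0] * (m - 1) + [m // 2]  # illustrative Type-II scheme, n = m + m // 2
    errs = [lambda_wl2(censored_pareto_spacings(1.0, lam, scheme, rng), scheme) - lam
            for _ in range(reps)]
    mse[m] = sum(d * d for d in errs) / reps
print({m: round(v, 4) for m, v in mse.items()})
```

Because only the ratios $X_{i:m:n}/X_{1:m:n}$ enter the estimator, rescaling the data (equivalently, changing θ) leaves $\hat{\lambda}_{wl2}$ unchanged, which is the sense in which it does not depend on the nuisance parameter.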

4. Application

In this section, the proposed estimator is assessed through Monte Carlo simulations and applied to two real data sets.

### 4.1. Simulation study

The estimators discussed in Sections 2 and 3 are compared in terms of MSE and bias. Unlike $\hat{\lambda}$ and $\hat{\lambda}_U$, the estimator $\hat{\lambda}_{wl2}$ has no closed-form expressions for its exact mean and variance, so its MSE and bias were estimated from 10,000 Monte Carlo replications. Progressive Type-II censored samples were generated from the Pareto distribution with λ = 0.5, 1.5 and θ = 1 using the algorithm of Balakrishnan and Sandhu (1995). The MSEs and biases of the other estimators were computed from their exact means and variances (Table 1). For notational simplicity, the scheme (0, ..., 0, n − m) with m − 1 leading zeros is denoted by ((m − 1)*0, n − m); for instance, (3*0, 2) denotes the progressive censoring scheme (0, 0, 0, 2).
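The Balakrishnan and Sandhu (1995) algorithm cited above can be sketched in Python as follows; this is a reading of the published algorithm, not the authors' code. It draws $W_i \sim U(0,1)$, sets $V_i = W_i^{1/(i + R_m + \cdots + R_{m-i+1})}$ and $U_i = 1 - V_m V_{m-1}\cdots V_{m-i+1}$, and maps each $U_i$ to the Pareto scale through the inverse cdf. The Monte Carlo loop then estimates the bias and MSE of $\hat{\lambda}_{wl2}$ for one setting; the seed and function names are assumptions.

```python
import math
import random


def balakrishnan_sandhu(scheme, rng):
    """Progressive Type-II censored sample U_1 <= ... <= U_m from Uniform(0, 1)."""
    m = len(scheme)
    v, acc = [], 0
    for i in range(1, m + 1):
        acc += scheme[m - i]  # R_m + R_{m-1} + ... + R_{m-i+1}
        w = 1.0 - rng.random()  # W_i in (0, 1]
        v.append(w ** (1.0 / (i + acc)))  # V_i
    u, prod = [], 1.0
    for i in range(1, m + 1):
        prod *= v[m - i]  # v[m - i] holds V_{m-i+1}
        u.append(1.0 - prod)  # U_i = 1 - V_m V_{m-1} ... V_{m-i+1}
    return u


def lambda_wl2(x, scheme):
    """Weighted LS estimator with w_i = 1 / Var(D*_i), as in Section 3."""
    m = len(scheme)
    tails, acc = [0] * m, 0
    for j in range(m - 1, -1, -1):
        acc += 1 + scheme[j]
        tails[j] = acc
    num = den = e = v = 0.0
    for i in range(1, m):
        e += 1.0 / tails[i]
        v += 1.0 / tails[i] ** 2
        lg = math.log(x[i] / x[0])
        num += e * lg / v
        den += lg * lg / v
    return num / den


def mc_bias_mse(theta, lam, scheme, reps, rng):
    """Monte Carlo bias and MSE of lambda_wl2 under Pareto(theta, lam) data."""
    errs = []
    for _ in range(reps):
        x = [theta * (1.0 - u) ** (-1.0 / lam) for u in balakrishnan_sandhu(scheme, rng)]
        errs.append(lambda_wl2(x, scheme) - lam)
    return sum(errs) / reps, sum(d * d for d in errs) / reps


bias, mse = mc_bias_mse(1.0, 0.5, [0] * 20, 10_000, random.Random(2018))
print(f"bias={bias:.3f} MSE={mse:.3f}")
```

With θ = 1, λ = 0.5 and n = m = 20, the estimates should be close to the 0.017 (0.002) reported for $\hat{\lambda}_{wl2}$ in Table 1, up to Monte Carlo error.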

Table 1 shows that the proposed estimator $\hat{\lambda}_{wl2}$ is more efficient than the MLE $\hat{\lambda}$ in terms of MSE and bias, although it has a somewhat higher MSE and bias than $\hat{\lambda}_U$. As expected, $\hat{\theta}_U$ is more efficient than the MLE $\hat{\theta}$ in terms of MSE and bias. In addition, the MSEs of all estimators decrease as the sample size n increases and, for a fixed n, as the number n − m of unobserved (censored) units decreases. Note also that the consistent estimator $\hat{\lambda}_{wl2}$ does not depend on the nuisance parameter θ.

### 4.2. Real data

Two real data sets (device lifetimes and business failures) considered in Fernandez (2014) are analyzed. Wu et al. (2007) generated a progressive Type-II censored sample

$0.0098,0.0376,0.0661,0.0849,0.1112,0.1447,0.1904,0.2463$

with the censoring scheme (1, 0, 2, 0, 3, 2, 0, 4) from the device lifetime data. The business failure data of Nigm and Hamdy (1987) give the times (in years) that fifteen businesses operated until failure. Fernandez (2014) used a progressive Type-II censored sample

$1.01,1.05,1.08,1.14,1.28,1.30,1.33,1.43,1.59,1.62$

with the censoring scheme (9*0, 5) from the business failure data. These progressive Type-II censored samples were used to compute the estimates discussed in Sections 2 and 3; the results are given in Table 2.
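The estimates in Table 2 can be recomputed from the two samples listed above. The following is a minimal Python sketch with hypothetical function and variable names. For $\hat{\lambda}$ it uses the identity $\sum(1+R_i)\log X_{i:m:n} - n\log X_{1:m:n} = \sum(1+R_i)\log(X_{i:m:n}/X_{1:m:n})$, which holds because $\sum(1+R_i) = n$.

```python
import math


def estimates(x, scheme):
    """lam_hat, lam_U, lam_wl2, theta_hat, theta_U for a progressive Type-II sample."""
    m, n = len(x), len(x) + sum(scheme)
    logs = [math.log(xi / x[0]) for xi in x]  # log(X_i / X_1)
    lam_hat = m / sum((1 + r) * lg for r, lg in zip(scheme, logs))
    theta_hat = x[0]
    # lam_wl2 with w_i = 1 / Var(D*_i); tails[j] = sum_{k=j+1}^m (1 + R_k), 1-based k
    tails, acc = [0] * m, 0
    for j in range(m - 1, -1, -1):
        acc += 1 + scheme[j]
        tails[j] = acc
    num = den = e = v = 0.0
    for i in range(1, m):
        e += 1.0 / tails[i]
        v += 1.0 / tails[i] ** 2
        num += e * logs[i] / v
        den += logs[i] ** 2 / v
    return {
        "lam_hat": lam_hat,
        "lam_U": (m - 2) / m * lam_hat,
        "lam_wl2": num / den,
        "theta_hat": theta_hat,
        "theta_U": (1.0 - m / (n * (m - 1) * lam_hat)) * theta_hat,
    }


device = ([0.0098, 0.0376, 0.0661, 0.0849, 0.1112, 0.1447, 0.1904, 0.2463],
          [1, 0, 2, 0, 3, 2, 0, 4])
business = ([1.01, 1.05, 1.08, 1.14, 1.28, 1.30, 1.33, 1.43, 1.59, 1.62],
            [0] * 9 + [5])
for name, (x, scheme) in (("Device", device), ("Business", business)):
    print(name, {k: round(val, 5) for k, val in estimates(x, scheme).items()})
```

Rounded to five decimals, the output should agree closely with Table 2.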

5. Conclusions

A new approach was proposed for estimating the unknown parameters of the Pareto distribution under the progressive Type-II censoring scheme in the regression framework, and the proposed estimator $\hat{\lambda}_{wl2}$ was proved to be consistent. In addition, $\hat{\lambda}_{wl2}$ does not depend on the nuisance parameter θ and performs satisfactorily in terms of MSE and bias. Although it performed somewhat worse than $\hat{\lambda}_U$ in the simulation study, the proposed approach can be applied to other censoring schemes and lifetime distributions, and it can be a very good alternative when an unbiased estimator cannot be obtained from the maximum likelihood or pivotal-based methods.

TABLES

### Table 1

MSEs (biases) for estimators of θ and λ

| θ | λ | n | m | Censoring scheme | $\hat{\lambda}$ | $\hat{\lambda}_U$ | $\hat{\lambda}_{wl2}$ | $\hat{\theta}$ | $\hat{\theta}_U$ |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.5 | 20 | 20 | (20*0) | 0.021 (0.056) | 0.015 (0.000) | 0.017 (0.002) | 0.028 (0.111) | 0.013 (0.000) |
| 1 | 0.5 | 20 | 10 | (2*0, 1, 0, 2, 0, 2, 2*0, 5) | 0.071 (0.125) | 0.036 (0.000) | 0.045 (0.027) | 0.028 (0.111) | 0.014 (0.000) |
| 1 | 0.5 | 30 | 30 | (30*0) | 0.012 (0.036) | 0.009 (0.000) | 0.011 (0.000) | 0.011 (0.071) | 0.005 (0.000) |
| 1 | 0.5 | 30 | 20 | (9*0, 10, 10*0) | 0.021 (0.056) | 0.015 (0.000) | 0.016 (−0.005) | 0.011 (0.071) | 0.005 (0.000) |
| 1 | 0.5 | 30 | 15 | (5, 6*0, 10, 7*0) | 0.034 (0.077) | 0.021 (0.000) | 0.023 (−0.006) | 0.011 (0.071) | 0.005 (0.000) |
| 1 | 0.5 | 40 | 40 | (40*0) | 0.008 (0.026) | 0.007 (0.000) | 0.008 (0.000) | 0.006 (0.053) | 0.003 (0.000) |
| 1 | 0.5 | 40 | 30 | (10*0, 5, 7*0, 3, 10*0, 2) | 0.012 (0.036) | 0.009 (0.000) | 0.011 (0.001) | 0.006 (0.053) | 0.003 (0.000) |
| 1 | 0.5 | 40 | 20 | (8*0, 2*10, 10*0) | 0.021 (0.056) | 0.015 (0.000) | 0.016 (−0.009) | 0.006 (0.053) | 0.003 (0.000) |
| 1 | 1.5 | 20 | 20 | (20*0) | 0.191 (0.167) | 0.132 (0.000) | 0.152 (0.007) | 0.002 (0.034) | 0.001 (0.000) |
| 1 | 1.5 | 20 | 10 | (2*0, 1, 0, 2, 0, 2, 2*0, 5) | 0.643 (0.375) | 0.321 (0.000) | 0.405 (0.081) | 0.002 (0.034) | 0.001 (0.000) |
| 1 | 1.5 | 30 | 30 | (30*0) | 0.107 (0.107) | 0.083 (0.000) | 0.095 (0.001) | 0.001 (0.023) | 0.001 (0.000) |
| 1 | 1.5 | 30 | 20 | (9*0, 10, 10*0) | 0.191 (0.167) | 0.132 (0.000) | 0.148 (−0.014) | 0.001 (0.023) | 0.001 (0.000) |
| 1 | 1.5 | 30 | 15 | (5, 6*0, 10, 7*0) | 0.303 (0.231) | 0.188 (0.000) | 0.204 (−0.017) | 0.001 (0.023) | 0.001 (0.000) |
| 1 | 1.5 | 40 | 40 | (40*0) | 0.074 (0.079) | 0.061 (0.000) | 0.072 (−0.001) | 0.001 (0.017) | 0.000 (0.000) |
| 1 | 1.5 | 40 | 30 | (10*0, 5, 7*0, 3, 10*0, 2) | 0.107 (0.107) | 0.083 (0.000) | 0.095 (0.002) | 0.001 (0.017) | 0.000 (0.000) |
| 1 | 1.5 | 40 | 20 | (8*0, 2*10, 10*0) | 0.191 (0.167) | 0.132 (0.000) | 0.147 (−0.028) | 0.001 (0.017) | 0.000 (0.000) |

MSE = mean squared error.

### Table 2

Estimates for real data

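| Data | $\hat{\lambda}$ | $\hat{\lambda}_U$ | $\hat{\lambda}_{wl2}$ | $\hat{\theta}$ | $\hat{\theta}_U$ |
|---|---|---|---|---|---|
| Device | 0.17350 | 0.13012 | 0.09001 | 0.00980 | 0.00657 |
| Business | 2.16083 | 1.72867 | 1.79410 | 1.01000 | 0.97538 |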

References
1. Balakrishnan, N, and Aggarwala, R (2000). Progressive Censoring: Theory, Methods, and Applications. Boston: Birkhäuser
2. Balakrishnan, N, and Cramer, E (2014). The Art of Progressive Censoring. New York: Springer
3. Balakrishnan, N, Kannan, N, Lin, CT, and Ng, HKT (2003). Point and interval estimation for Gaussian distribution, based on progressively Type-II censored samples. IEEE Transactions on Reliability. 52, 90-95.
4. Balakrishnan, N, and Sandhu, RA (1995). A simple simulational algorithm for generating progressive Type-II censored samples. The American Statistician. 49, 229-230.
5. Cohen, AC (1963). Progressively censored samples in life testing. Technometrics. 5, 327-339.
6. Fernandez, AJ (2014). Computing optimal confidence sets for Pareto models under progressive censoring. Journal of Computational and Applied Mathematics. 258, 168-180.
7. Karr, AF (1993). Probability. New York: Springer-Verlag
8. Kim, JHT, Ahn, S, and Ahn, S (2017). Parameter estimation of the Pareto distribution using a pivotal quantity. Journal of the Korean Statistical Society. 46, 438-450.
9. Lu, HL, and Tao, SH (2007). The estimation of Pareto distribution by a weighted least square method. Quality & Quantity. 41, 913-926.
10. Nigm, AM, and Hamdy, HI (1987). Bayesian prediction bounds for the Pareto lifetime model. Communications in Statistics - Theory and Methods. 16, 1761-1772.
11. Seo, JI, and Kang, SB (2015). Pivotal inference for the scaled half logistic distribution based on progressively Type-II censored samples. Statistics & Probability Letters. 104, 109-116.
12. Viveros, R, and Balakrishnan, N (1994). Interval estimation of parameters of life from progressively censored data. Technometrics. 36, 84-91.
13. Wang, B (2008). Goodness-of-fit test for the exponential distribution based on progressively Type-II censored sample. Journal of Statistical Computation and Simulation. 78, 125-132.
14. Wu, JW, Wu, SF, and Yu, CM (2007). One-sample Bayesian predictive interval of future ordered observations for the Pareto distribution. Quality & Quantity. 41, 251-263.