How to Improve Classical Estimators via Linear Bayes Method?

Lichun Wang

Department of Mathematics, Beijing Jiaotong University, China
Correspondence to: Lichun Wang
Department of Mathematics, Beijing Jiaotong University, Beijing 100044, China. E-mail: lchwang@bjtu.edu.cn
Received October 17, 2015; Revised October 26, 2015; Accepted October 26, 2015.
Abstract

In this survey, we use the normal linear model to demonstrate the use of the linear Bayes method. The superiority of the linear Bayes estimator (LBE) over the classical UMVUE and MLE is established in terms of the mean squared error matrix (MSEM) criterion. Compared with the usual Bayes estimator (obtained by the MCMC method), the proposed LBE is simple and easy to use, and numerical results are presented to illustrate its performance. We also examine the application of the linear Bayes method to some other distributions, including the two-parameter exponential family, the uniform distribution and the inverse Gaussian distribution, and finally make some remarks.

Keywords: linear Bayes method, MCMC method, MSEM criterion, normal linear model, two-parameter exponential family, uniform distribution, inverse Gaussian distribution
1. Introduction

The linear Bayes method was originally proposed by Hartigan (1969), who suggested that in Bayesian statistics one can replace a completely specified prior distribution by an assumption on just a few moments of the distribution. It was subsequently discussed by Rao (1973) from the viewpoint of linear optimization. Lamotte (1978) later developed a class of linear estimators, called Bayes linear estimators, by searching among all linear estimators for those with the least average total mean squared error. Goldstein (1983) considers the problem of modifying the linear Bayes estimator for the mean of a distribution of unknown form using a sample variance estimate. Heiligers (1993) studies the relationship between linear Bayes estimation and minimax estimation in linear models with partial parameter restrictions. Hoffmann (1996) proposes a well-described subclass of Bayes linear estimators for the unknown parameter vector in the linear regression model with ellipsoidal parameter constraints, and obtains a necessary and sufficient condition ensuring that the considered Bayes linear estimators improve the least squares estimator over the whole ellipsoid regardless of the selected generalized risk function. In the framework of empirical Bayes, Samaniego and Vestrup (1999) and Pensky and Ni (2000) respectively construct linear empirical Bayes estimators and establish their superiority over standard and traditional estimators. In application fields, Busby et al. (2005) propose the application of Bayes linear methodology to uncertainty evaluation in reservoir forecasting. Zhang and Wei (2005) derive the unique Bayes linear unbiased estimator of estimable functions for the singular linear model. Wei and Zhang (2007) employ the linear Bayes procedure to define Bayes linear minimum risk estimation in a linear model and discuss its superiority. Recently, Zhang et al. (2011, 2012) extend the research on the linear Bayes estimator to the partitioned linear model and multivariate linear models, respectively.

In this paper, along the same line as in Wang and Singh (2014), we use the normal linear model as an example to demonstrate how to apply a linear Bayes method to simultaneously estimate all the parameters involved in the model and elaborate advantages and potential disadvantages.

Let $W$ be a known $p$-dimensional subspace of $R^n$. Suppose that we observe the random vector

$Y \sim N_n(\mu, \sigma^2 I_n), \qquad \mu \in W, \ \sigma^2 > 0.$

This model is called the normal linear model as defined by Arnold (1980) and adopted by many other authors as well. Let $X$ be a basis matrix for $W$. Then $X$ is an $n \times p$ matrix of rank $p$, and there exists a unique $\beta \in R^p$ such that $\mu = X\beta$. Hence, an equivalent version of the normal linear model can be presented, in which we observe the random vector

$Y \sim N_n(X\beta, \sigma^2 I_n), \qquad \beta \in R^p, \ \sigma^2 > 0. \qquad (1.2)$

Define $\hat\beta = (X'X)^{-1}X'Y$ and $\hat\sigma^2 = (\|Y\|^2 - \|X\hat\beta\|^2)/(n - p)$, and note that $(\hat\beta, \hat\sigma^2)$ is a complete sufficient statistic for the above linear model. Hence, the classical estimators for the parameters $\beta$ and $\sigma^2$ are $\hat\beta$ and $\hat\sigma^2$, which are the uniformly minimum variance unbiased estimators (UMVUE); that is, they minimize the mean squared error among unbiased estimators.
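As a concrete illustration of these definitions, the following minimal Python sketch computes $\hat\beta$ and $\hat\sigma^2$ for the simple regression case $p = 2$ (an intercept and one covariate). The function name `classical_estimates` is hypothetical, and the closed-form expressions are the standard specialization of $(X'X)^{-1}X'Y$ to a design matrix with columns $(1, x)$; the numerator $\|Y\|^2 - \|X\hat\beta\|^2$ is computed as the residual sum of squares, which is the same quantity.

```python
def classical_estimates(x, y):
    """UMVUE (beta_hat, sigma2_hat) for y_i = b0 + b1 * x_i + error, p = 2."""
    n, p = len(y), 2
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                # slope component of (X'X)^{-1} X'Y
    b0 = y_bar - b1 * x_bar       # intercept component
    # ||Y||^2 - ||X beta_hat||^2 equals the residual sum of squares
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return (b0, b1), rss / (n - p)


beta_hat, sigma2_hat = classical_estimates([0, 1, 2, 3], [1, 3, 5, 7])
print(beta_hat, sigma2_hat)
```

The data in the usage line lie exactly on the line $y = 1 + 2x$, so the fitted coefficients recover $(1, 2)$ and the variance estimate is zero.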

From the Bayesian viewpoint, note that in most cases past experience about the parameters $\beta$ and $\sigma^2$ is available. Let $f_0(\beta, \sigma^2)$ be the joint prior of $\beta$ and $\sigma^2$ and let the loss function be

$L(\hat\theta, \theta) = (\hat\theta - \theta)' D (\hat\theta - \theta), \qquad (1.3)$

where $D$ is a positive definite matrix and $\hat\theta$ denotes the estimate of the vector $\theta = (\beta', \sigma^2)'$. Then, by virtue of Bayes' theorem, the usual Bayes estimators (UBE) for $\beta$ and $\sigma^2$, say $\hat\beta_{UB}$ and $\hat\sigma^2_{UB}$, can be calculated by

$\hat\beta_{UB} = \iint \beta \, g(\beta, \sigma^2 \mid y) \, d\beta \, d\sigma^2, \qquad \hat\sigma^2_{UB} = \iint \sigma^2 g(\beta, \sigma^2 \mid y) \, d\beta \, d\sigma^2,$

where $g(\beta, \sigma^2 \mid y)$ denotes the joint posterior density of $\beta$ and $\sigma^2$ given $Y = y$. However, these integrals are difficult to handle when they are complicated or non-standard. In such cases approximate Bayes estimators are usually suggested, such as Lindley's approximation and Tierney and Kadane's approximation; see Lindley (1980) and Tierney and Kadane (1986) for details. Simulation-based methods such as the Gibbs sampling procedure and the Metropolis method have also emerged in the past twenty years; see Martinez and Martinez (2007). The usual Bayes estimators (UBE) are therefore somewhat complicated and inconvenient to use in these situations.

In the following, inspired by Rao (1973), we employ the linear Bayes method to propose a linear Bayes estimator (LBE) for the parameters $\beta$ and $\sigma^2$ simultaneously and investigate its superiority. We also extend our discussion to the application of the linear Bayes method to some other useful distributions.

The survey is organized as follows. In Section 2 we define the LBE for the parameter vector $\theta = (\beta', \sigma^2)'$ and establish its superiority over the classical UMVUE and MLE. Numerical comparisons between the LBE and the usual Bayes estimator (UBE) are presented in Section 3. Extended discussions and remarks are made in Section 4.

Throughout this paper, for two nonnegative definite matrices $A_1$ and $A_2$ of the same size, we write $A_1 \geq A_2$ if and only if $A_1 - A_2$ is a nonnegative definite matrix.

2. Linear Bayes Estimator and Its Superiorities

### 2.1. The proposed LBE

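Denote $\theta = (\beta', \sigma^2)'$. In what follows we assume that the prior $G(\theta)$ belongs to the distribution family

$\mathcal{G} = \{G(\theta): E[\|\beta\|^2 + (\sigma^2)^2] < \infty\}. \qquad (2.1)$

Put $T = (\hat\beta', \hat\sigma^2)'$ and define the linear Bayes estimator (LBE) of $\theta$, say $\hat\theta_{LB}$, to be the estimator of the form $\tilde\theta = BT + b$ satisfying

$R(\hat\theta_{LB}, \theta) = \min_{B, b} E_{(Y,\theta)} L(\tilde\theta, \theta) \quad \text{and} \quad E_{(Y,\theta)}(\hat\theta_{LB} - \theta) = 0, \qquad (2.2)$

where $B$ and $b$ are undetermined matrices of sizes $(p+1) \times (p+1)$ and $(p+1) \times 1$ respectively, $E_{(Y,\theta)}$ denotes the expectation with respect to the joint distribution of $Y$ and $\theta$, and the loss function is given by (1.3).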

Thus, we have the following conclusion.

Theorem 1

Let $\hat\theta_{LB}$ be defined by (2.2). If $n \geq p + 1$, then

$\hat\theta_{LB} = T - W[W + Cov(\theta)]^{-1}(T - E\theta),$

where $W = E[Cov(T \mid \theta)] = diag((X'X)^{-1} E\sigma^2, \ 2E\sigma^4/(n - p))$.

Proof

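From the constraint $E_{(Y,\theta)}(\tilde\theta - \theta) = 0$, we know $b = E\theta - BE_{(Y,\theta)}(T)$. Note that

$E_{(Y,\theta)}(T) = E[E(T \mid \theta)] = E(\beta', \sigma^2)' = E\theta.$

Hence $b = E\theta - BE\theta$, and accordingly we have

$\begin{aligned} R(\tilde\theta, \theta) &= E_{(Y,\theta)} L(\tilde\theta, \theta) \\ &= E_{(Y,\theta)} [BT + E\theta - BE\theta - \theta]' D [BT + E\theta - BE\theta - \theta] \\ &= E_{(Y,\theta)} \{ tr( D [B(T - E\theta) - (\theta - E\theta)] [B(T - E\theta) - (\theta - E\theta)]' ) \} \\ &= tr( D E_{(Y,\theta)} [B(T - E\theta) - (\theta - E\theta)] [B(T - E\theta) - (\theta - E\theta)]' ) \\ &= tr( D B E_{(Y,\theta)} [(T - E\theta)(T - E\theta)'] B' ) - tr( D Cov(\theta) B' ) - tr( D B Cov(\theta) ) + tr( D Cov(\theta) ). \end{aligned} \qquad (2.4)$

For given $\theta$, using the independence between $\hat\beta$ and $\hat\sigma^2$, we have

$E_{(Y,\theta)} [(T - E\theta)(T - E\theta)'] = E[Cov(T \mid \theta)] + Cov(E(T \mid \theta)) = W + Cov(\theta), \qquad (2.5)$

where $W = diag((X'X)^{-1} E\sigma^2, \ 2E\sigma^4/(n - p))$.

Substituting (2.5) into (2.4) and setting $\partial R(\tilde\theta, \theta)/\partial B$ to zero, we have

$DB[W + Cov(\theta)] - D Cov(\theta) = 0,$

which yields

$B = I_{p+1} - W[W + Cov(\theta)]^{-1}.$

Together with $b = E\theta - BE\theta$ we come to the conclusion of Theorem 1.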
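The formula of Theorem 1 is a shrinkage of $T$ toward the prior mean $E\theta$, and it can be evaluated with one linear solve. The sketch below is a minimal illustration under stated assumptions: the function names `solve` and `linear_bayes` are hypothetical, matrices are plain nested lists, and the caller supplies $W$, $Cov(\theta)$ and $E\theta$ computed from the prior moments.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def linear_bayes(t_stat, w, cov_theta, mean_theta):
    """Evaluate T - W [W + Cov(theta)]^{-1} (T - E theta)."""
    k = len(t_stat)
    total = [[w[i][j] + cov_theta[i][j] for j in range(k)] for i in range(k)]
    diff = [t_stat[i] - mean_theta[i] for i in range(k)]
    v = solve(total, diff)  # [W + Cov(theta)]^{-1} (T - E theta)
    return [t_stat[i] - sum(w[i][j] * v[j] for j in range(k)) for i in range(k)]


# When W equals Cov(theta), the estimator is the midpoint of T and E(theta).
w = [[2.0, 1.0], [1.0, 2.0]]
print(linear_bayes([4.0, 0.0], w, w, [0.0, 2.0]))
```

Two limiting cases make the behaviour easy to read: if $W = 0$ (no sampling noise) the estimator returns $T$ unchanged, and if $W = Cov(\theta)$ it returns the average of $T$ and $E\theta$, as in the usage line.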

Remark 1

In the definition (2.2) of $\hat\theta_{LB}$, if we discard the so-called unbiasedness constraint $E_{(Y,\theta)}(\hat\theta_{LB} - \theta) = 0$, then by directly computing $R(\tilde\theta, \theta)$ and setting $\partial R(\tilde\theta, \theta)/\partial B = 0$ and $\partial R(\tilde\theta, \theta)/\partial b = 0$, we obtain the same expression for the LBE $\hat\theta_{LB}$. This means that $\hat\theta_{LB}$ satisfies the unbiasedness condition and also performs best among linear estimators in the sense of minimizing $E_{(Y,\theta)} L(BT + b, \theta)$.

### 2.2. The superiorities of LBE

Note that

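$\hat\theta_U = (\hat\beta', \hat\sigma^2)' = \begin{pmatrix} I_p & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} \hat\beta \\ \hat\sigma^2 \end{pmatrix} = T, \qquad (2.7)$

where we use $\hat\theta_U$ to denote the UMVUE of $\theta = (\beta', \sigma^2)'$.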

Theorem 2

Let $\hat\theta_{LB}$ and $\hat\theta_U$ be given by (2.2) and (2.7) respectively. If $n \geq p + 1$, then $\hat\theta_{LB}$ is superior to $\hat\theta_U$ in terms of the MSEM criterion, i.e. $MSEM(\hat\theta_{LB}) \leq MSEM(\hat\theta_U)$.

Proof

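Since $E_{(Y,\theta)}(\hat\theta_{LB} - \theta) = 0$, we have

$MSEM(\hat\theta_{LB}) = E_{(Y,\theta)} [(\hat\theta_{LB} - \theta)(\hat\theta_{LB} - \theta)'] = E[Cov(\hat\theta_{LB} - \theta \mid \theta)] + Cov(E[\hat\theta_{LB} - \theta \mid \theta]).$

Denote $M = [W + Cov(\theta)]^{-1}$. Then by Theorem 1 we know

$\begin{aligned} MSEM(\hat\theta_{LB}) &= (I - WM) W (I - WM)' + WM \, Cov(\theta) \, (WM)' \\ &= (I - WM) W (I - MW) + WM \, Cov(\theta) \, MW \\ &= W - 2WMW + WM[W + Cov(\theta)]MW \\ &= W - WMW. \end{aligned} \qquad (2.9)$

However,

$MSEM(\hat\theta_U) = E_{(Y,\theta)} [(\hat\theta_U - \theta)(\hat\theta_U - \theta)'] = E\{E[(\hat\theta_U - \theta)(\hat\theta_U - \theta)' \mid \theta]\} = E\{E[(T - \theta)(T - \theta)' \mid \theta]\} = W. \qquad (2.10)$

Comparing (2.9) with (2.10), and noting that $WMW$ is nonnegative definite, we have

$MSEM(\hat\theta_{LB}) \leq MSEM(\hat\theta_U).$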

The proof of Theorem 2 is completed.

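Moreover, note that the MLE of $\theta$, denoted by $\hat\theta_{ML}$, equals $B_0 T$ with

$B_0 = \begin{pmatrix} I_p & 0 \\ 0 & \frac{n-p}{n} \end{pmatrix}. \qquad (2.12)$

Thus,

$MSEM(\hat\theta_{ML}) = B_0 W B_0' + (B_0 - I_{p+1}) E(\theta\theta') (B_0 - I_{p+1})' = \begin{pmatrix} \frac{X'X}{E\sigma^2} & 0 \\ 0 & \frac{n^2}{(2n + p^2 - 2p) E\sigma^4} \end{pmatrix}^{-1}.$

Theorem 3

Let $\hat\theta_{LB}$ and $\hat\theta_{ML}$ be given by (2.2) and (2.12) respectively. If $n \geq p + 1$, then $\hat\theta_{LB}$ is superior to $\hat\theta_{ML}$ in terms of the MSEM criterion, i.e. $MSEM(\hat\theta_{LB}) \leq MSEM(\hat\theta_{ML})$.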

Proof

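We rewrite

$MSEM(\hat\theta_{LB}) = W - WMW = [W^{-1} + Cov^{-1}(\theta)]^{-1} = \left[ \begin{pmatrix} \frac{X'X}{E\sigma^2} & 0 \\ 0 & \frac{n^2}{(2n + p^2 - 2p) E\sigma^4} \end{pmatrix} + \begin{pmatrix} 0 & 0 \\ 0 & c_0 \end{pmatrix} + Cov^{-1}(\theta) \right]^{-1},$

where $c_0 = (np^2 - 4np - p^3 + 2p^2)/\{2(2n + p^2 - 2p) E\sigma^4\}$.

Hence, in order to establish the MSEM superiority of $\hat\theta_{LB}$ over $\hat\theta_{ML}$, it suffices to show that

$\begin{pmatrix} 0 & 0 \\ 0 & c_0 \end{pmatrix} + Cov^{-1}(\theta) \geq 0, \qquad (2.15)$

for $n \geq p + 1$.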

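Denote $Cov^{-1}(\theta) = S$, where $S = (S_{ij})$ is a $2 \times 2$ partitioned matrix and

$\begin{aligned} S_{11} &= Cov^{-1}(\beta) + Cov^{-1}(\beta) E[(\sigma^2 - E\sigma^2)(\beta - E\beta)] \, S_{22} \, E[(\sigma^2 - E\sigma^2)(\beta - E\beta)'] Cov^{-1}(\beta), \\ S_{12} &= -Cov^{-1}(\beta) E[(\sigma^2 - E\sigma^2)(\beta - E\beta)] \, S_{22}, \\ S_{21} &= -S_{22} \, E[(\sigma^2 - E\sigma^2)(\beta - E\beta)'] Cov^{-1}(\beta), \\ S_{22} &= \left[ Var(\sigma^2) - E[(\sigma^2 - E\sigma^2)(\beta - E\beta)'] Cov^{-1}(\beta) E[(\sigma^2 - E\sigma^2)(\beta - E\beta)] \right]^{-1}. \end{aligned}$

Thus, to prove (2.15), it is adequate to show that

$|S_{11}| \cdot |c_0 + S_{22} - S_{21}(S_{11})^{-1} S_{12}| \geq 0,$

or equivalently to show that

$c_0 + S_{22} - S_{21}(S_{11})^{-1} S_{12} \geq 0,$

for $n \geq p + 1$, where we use the fact that $S_{11} \geq 0$ and accordingly the determinant $|S_{11}|$ is nonnegative.

Set $\Delta = Var(\sigma^2) - E[(\sigma^2 - E\sigma^2)(\beta - E\beta)'] Cov^{-1}(\beta) E[(\sigma^2 - E\sigma^2)(\beta - E\beta)]$ and note that $S_{11} = Cov^{-1}(\beta) \left[ Cov(\beta) + E[(\sigma^2 - E\sigma^2)(\beta - E\beta)] (1/\Delta) E[(\sigma^2 - E\sigma^2)(\beta - E\beta)'] \right] Cov^{-1}(\beta)$. Hence we have

$\begin{aligned} c_0 + S_{22} - S_{21}(S_{11})^{-1} S_{12} = c_0 + \frac{1}{\Delta} &- \frac{1}{\Delta} E[(\sigma^2 - E\sigma^2)(\beta - E\beta)'] \\ &\times \left[ Cov(\beta) + E[(\sigma^2 - E\sigma^2)(\beta - E\beta)] \frac{1}{\Delta} E[(\sigma^2 - E\sigma^2)(\beta - E\beta)'] \right]^{-1} \\ &\times E[(\sigma^2 - E\sigma^2)(\beta - E\beta)] \frac{1}{\Delta}. \end{aligned}$

Further, using $[\Sigma + A \Sigma_1 A']^{-1} = \Sigma^{-1} - \Sigma^{-1} A [A' \Sigma^{-1} A + \Sigma_1^{-1}]^{-1} A' \Sigma^{-1}$, we have

$c_0 + S_{22} - S_{21}(S_{11})^{-1} S_{12} = c_0 + \frac{1}{\Delta} - \frac{a}{\Delta^2} + \frac{a^2}{(\Delta + a)\Delta^2},$

where $a = E[(\sigma^2 - E\sigma^2)(\beta - E\beta)'] Cov^{-1}(\beta) E[(\sigma^2 - E\sigma^2)(\beta - E\beta)]$.

Note that $\Delta = Var(\sigma^2) - a$, hence

$\frac{1}{\Delta} - \frac{a}{\Delta^2} + \frac{a^2}{(\Delta + a)\Delta^2} = \frac{1}{\Delta + a} = \frac{1}{Var(\sigma^2)},$

and accordingly

$\begin{aligned} c_0 + S_{22} - S_{21}(S_{11})^{-1} S_{12} &= c_0 + \frac{1}{Var(\sigma^2)} \\ &= \frac{[(n+4)p^2 - (4n+4)p - p^3 + 4n] E\sigma^4 + [4np - (n+2)p^2 + p^3] (E\sigma^2)^2}{2(2n + p^2 - 2p) Var(\sigma^2) E\sigma^4} \\ &\geq \frac{[2p^2 + 4n - 4p] (E\sigma^2)^2}{2(2n + p^2 - 2p) Var(\sigma^2) E\sigma^4} > 0, \qquad \text{for } n \geq p + 1, \end{aligned}$

where we use the facts that $E\sigma^4 \geq (E\sigma^2)^2$ and $(n+4)p^2 - (4n+4)p - p^3 + 4n \geq 0$ for $n \geq p + 1$.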

Hence, Theorem 3 has been proved.
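The second inequality used at the end of the proof, on the cubic polynomial in $p$, can be checked by a direct factorization:

$(n+4)p^2 - (4n+4)p - p^3 + 4n = n(p^2 - 4p + 4) - p(p^2 - 4p + 4) = (n - p)(p - 2)^2 \geq 0 \quad \text{for } n \geq p,$

so it holds in particular for $n \geq p + 1$, with equality exactly when $p = 2$.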

Remark 2

For the two-parameter exponential family given by

$f(x; \mu, \lambda) = \lambda^{-1} \exp\left( -\frac{x - \mu}{\lambda} \right),$

where $x > \mu$, we assume that $X_{(1)} \leq X_{(2)} \leq \cdots \leq X_{(r)}$ $(2 \leq r \leq n)$ denote the type II censored sample. Define $Q_i = [n - (i - 1)](X_{(i)} - X_{(i-1)})$, where $X_{(0)} = 0$. Then $Q_1$ and $P = \sum_{i=2}^{r} Q_i$ are mutually independent, and $(Q_1, P)$ is sufficient for the parameter vector $(\mu, \lambda)$. Setting $T = (Q_1, P)'$, the classical UMVUE and MLE for $\theta = (\mu, \lambda)'$ can be written as

$\hat\theta_U = \begin{pmatrix} \frac{1}{n} & -\frac{1}{n(r-1)} \\ 0 & \frac{1}{r-1} \end{pmatrix} T, \qquad \hat\theta_{ML} = \begin{pmatrix} \frac{1}{n} & 0 \\ 0 & \frac{1}{r} \end{pmatrix} T.$

Under the assumption that the prior $G(\theta)$ satisfies the condition $E\|\theta\|^2 < \infty$, we can obtain the expression of the LBE $\hat\theta_{LB}$ for the parameter vector $\theta = (\mu, \lambda)'$ in this case and establish its superiority over $\hat\theta_U$ and $\hat\theta_{ML}$ under the MSEM criterion in a similar way. Interested readers are referred to Wang and Singh (2014) for more details.
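The statistics $Q_1$ and $P$ and the two classical estimators in Remark 2 are simple to compute from the ordered sample. The following sketch is an illustration only; the function name `censored_exponential_estimates` and the small data set are hypothetical.

```python
def censored_exponential_estimates(order_stats, n):
    """Return (UMVUE, MLE) of (mu, lambda) from the r smallest order statistics."""
    r = len(order_stats)
    q = []
    prev = 0.0  # X_(0) = 0 by convention
    for i, x in enumerate(order_stats, start=1):
        q.append((n - (i - 1)) * (x - prev))  # Q_i = [n - (i - 1)] (X_(i) - X_(i-1))
        prev = x
    q1, p_stat = q[0], sum(q[1:])
    umvue = (q1 / n - p_stat / (n * (r - 1)), p_stat / (r - 1))
    mle = (q1 / n, p_stat / r)
    return umvue, mle


# n = 5 units on test, observation stopped after the r = 3 smallest failure times
print(censored_exponential_estimates([1.0, 2.0, 4.0], n=5))
```

For this toy input, $Q_1 = 5$ and $P = 4 + 6 = 10$, so the MLE is $(1, 10/3)$ and the UMVUE is $(0, 5)$.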

Remark 3

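Let $X_1, X_2, \ldots, X_n$ be independently drawn from the uniform distribution $U(\theta_1, \theta_2)$ with density $f(x; \theta_1, \theta_2) = (\theta_2 - \theta_1)^{-1}$, where $\theta_1 < x < \theta_2$. Note that $X_{(1)} = \min_{1 \leq i \leq n} X_i$ and $X_{(n)} = \max_{1 \leq i \leq n} X_i$ are sufficient and complete statistics. Hence, setting $T = (X_{(1)}, X_{(n)})'$, we obtain the classical UMVUE and MLE for $\theta = (\theta_1, \theta_2)'$ in this case:

$\hat\theta_U = \begin{pmatrix} \frac{n}{n-1} & -\frac{1}{n-1} \\ -\frac{1}{n-1} & \frac{n}{n-1} \end{pmatrix} T, \qquad \hat\theta_{ML} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} T.$

Similarly, under the assumption that the prior $G(\theta)$ satisfies the condition $E\|\theta\|^2 < \infty$, the expression of the LBE $\hat\theta_{LB}$ for the parameter vector $\theta = (\theta_1, \theta_2)'$ can be easily obtained, and its MSEM superiority over $\hat\theta_U$ and $\hat\theta_{ML}$ can also be proved.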

Remark 4

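Let $X_1, X_2, \ldots, X_n$ be a random sample from the two-parameter inverse Gaussian distribution $IG(\alpha_1, \alpha_2)$ with pdf

$f(x; \alpha_1, \alpha_2) = \left( \frac{\alpha_2}{2\pi x^3} \right)^{1/2} \exp\left( -\frac{\alpha_2 (x - \alpha_1)^2}{2\alpha_1^2 x} \right),$

where $x > 0$. It is easily shown that the statistics $\bar{X} = (1/n) \sum_{i=1}^{n} X_i$ and $\bar{S} = n / \{ \sum_{i=1}^{n} (1/X_i - 1/\bar{X}) \}$ are sufficient and complete. Tweedie (1957) shows that $\bar{X}$ and $\bar{S}$ are independent, with $\bar{X}$ having an inverse Gaussian distribution with parameters $\alpha_1$ and $n\alpha_2$, and $n\alpha_2/\bar{S}$ having a $\chi^2_{n-1}$ distribution. Schwarz and Samanta (1991) give a proof of these facts using an inductive argument. Hence we obtain the classical UMVUE and MLE for the parameter $\theta = (\alpha_1, \alpha_2)'$ as follows:

$\hat\theta_U = \begin{pmatrix} 1 & 0 \\ 0 & \frac{n-3}{n} \end{pmatrix} T, \qquad \hat\theta_{ML} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} T,$

where $T = (\bar{X}, \bar{S})'$. Assuming that the prior $G(\theta)$ belongs to the prior family $\mathcal{G} = \{G(\theta): E[\alpha_1^2 + \alpha_2^2] < \infty, \ E[\alpha_1^3 \alpha_2^{-1}] < \infty\}$, we can obtain the expression of the LBE $\hat\theta_{LB}$ for the parameter vector $\theta = (\alpha_1, \alpha_2)'$ and prove that it prevails over the classical UMVUE and MLE under the MSEM criterion.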

### 2.3. An illustration example

To illustrate Theorems 2 and 3 we investigate the case of the two-dimensional normal linear model, i.e.

$Y \sim N(\beta_0 + \beta_1 x, \sigma^2 I_n),$

where we assume that $(\beta_0, \beta_1)' \sim N((1, 2)', Cov(\beta_0, \beta_1))$ with $Cov(\beta_0, \beta_1)$ taking three alternative values and $\sigma^2 \sim U(a, b)$ with three different pairs $a$ and $b$. We also assume that $(\beta_0, \beta_1)$ and $\sigma^2$ are uncorrelated, i.e. $Cov(\beta_0, \beta_1, \sigma^2) = diag(Cov(\beta_0, \beta_1), Var(\sigma^2))$, and $x = (-4, -3, -2, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)$.

Define the percentages of improvement of $\hat\theta_{LB}$ over $\hat\theta_U$ and $\hat\theta_{ML}$, respectively, by

$IP(\hat\theta_U) = \frac{tr(MSEM(\hat\theta_U) - MSEM(\hat\theta_{LB}))}{tr(MSEM(\hat\theta_U))} \quad \text{and} \quad IP(\hat\theta_{ML}) = \frac{tr(MSEM(\hat\theta_{ML}) - MSEM(\hat\theta_{LB}))}{tr(MSEM(\hat\theta_{ML}))}.$

For different sample sizes $n$ (= 5, 10, 20), the corresponding results for $IP(\hat\theta_U)$ and $IP(\hat\theta_{ML})$ under the three different priors are presented in Table 1, where $tr(Cov(\beta_0, \beta_1, \sigma^2))$ is used as an index of the variation of the prior information.

As stated in Theorems 2 and 3, since the above priors belong to the family (2.1), both $MSEM(\hat\theta_U) - MSEM(\hat\theta_{LB})$ and $MSEM(\hat\theta_{ML}) - MSEM(\hat\theta_{LB})$ are always nonnegative definite. From Table 1 we can see the following. First, when the sample size $n$ is fixed, both $IP(\hat\theta_U)$ and $IP(\hat\theta_{ML})$ decrease, as expected, as the variation of the prior increases (i.e., as $tr(Cov(\beta_0, \beta_1, \sigma^2))$ becomes larger). Second, for the same prior information, both $IP(\hat\theta_U)$ and $IP(\hat\theta_{ML})$ decrease as the sample size $n$ grows, which is natural since more sample information is then available. Finally, $IP(\hat\theta_U)$ is larger than $IP(\hat\theta_{ML})$ throughout, which may be because $MSEM(\hat\theta_U) \geq MSEM(\hat\theta_{ML})$ in our case.
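The improvement percentages can be evaluated in closed form from the MSEM expressions of Section 2. The sketch below is a minimal illustration with several assumptions: the function names are hypothetical, the sample of size $n$ is taken to be the first $n$ entries of $x$, and the block-diagonal structure of $W$ and $Cov(\theta)$ in this example is used so that the $\beta$ block of $MSEM(\hat\theta_{LB})$ is $[X'X/E\sigma^2 + Cov^{-1}(\beta)]^{-1}$ and the $\sigma^2$ block is $wv/(w + v)$ with $w = 2E\sigma^4/(n - p)$ and $v = Var(\sigma^2)$.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def improvement(x, cov_beta, a, b):
    """Return (IP(theta_U), IP(theta_ML)) for sigma^2 ~ U(a, b) and p = 2."""
    n, p = len(x), 2
    e_s2 = (a + b) / 2.0              # E sigma^2
    var_s2 = (b - a) ** 2 / 12.0      # Var(sigma^2)
    e_s4 = var_s2 + e_s2 ** 2         # E sigma^4
    sx, sxx = sum(x), sum(v * v for v in x)
    xtx = [[n, sx], [sx, sxx]]        # X'X for the design with columns (1, x)
    xtx_inv = inv2(xtx)
    tr_w_beta = e_s2 * (xtx_inv[0][0] + xtx_inv[1][1])
    w_s2 = 2.0 * e_s4 / (n - p)
    c_inv = inv2(cov_beta)
    prec = [[xtx[i][j] / e_s2 + c_inv[i][j] for j in range(2)] for i in range(2)]
    lb_beta = inv2(prec)              # beta block of W - WMW
    tr_lb = lb_beta[0][0] + lb_beta[1][1] + w_s2 * var_s2 / (w_s2 + var_s2)
    tr_u = tr_w_beta + w_s2           # tr(W)
    tr_ml = tr_w_beta + (2 * n + p * p - 2 * p) * e_s4 / n ** 2
    return 1.0 - tr_lb / tr_u, 1.0 - tr_lb / tr_ml


x_full = [-4, -3, -2, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
prior_cov = [[1.0, 1.0 / 3.0], [1.0 / 3.0, 1.0]]
print(improvement(x_full[:5], prior_cov, 7.0, 9.0))
```

Under the first-$n$-entries assumption, the usage line corresponds to the first prior of Table 1 with $n = 5$.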

3. Numerical Comparisons between LBE and UBE

For the model (1.2), note that under the loss $L(\hat\theta, \theta)$ and the prior $G(\theta)$, the usual Bayes estimator (UBE) of $\theta$, say $\hat\theta_{UB}$, equals $E(\theta \mid Y)$. In this section, for given priors $G(\theta)$, we present some numerical comparisons between the LBE $\hat\theta_{LB}$ and the UBE $\hat\theta_{UB}$, where the latter is calculated by an MCMC sampling method.

Suppose $p = 2$ and let us consider the normal linear model

$Y \sim N(\gamma_0 + \gamma_1 x, \sigma^2 I_n).$

Denote $\beta = (\gamma_0, \gamma_1)'$. We assume that $\sigma^2$ follows an inverse-Gamma distribution with density

$\pi(\sigma^2; \lambda_0, t) = \frac{t^{\lambda_0 - 1}}{\Gamma(\lambda_0 - 1)} \left( \frac{1}{\sigma^2} \right)^{\lambda_0} \exp\left( -\frac{t}{\sigma^2} \right)$

and that, given $\sigma^2$, the conditional distribution of $\beta$ is $N_2(\tilde\beta_0, \sigma^2 \Sigma_0)$.

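Note that the posterior density of $(\beta, \sigma^2)$ given $Y$ is

$f(\beta, \sigma^2 \mid y) \propto (\sigma^{-2})^{\lambda_0 + 1 + n/2} \exp\left\{ -\frac{1}{2\sigma^2} \left[ 2t + (n - 2)\hat\sigma^2 + (\beta - \tilde\beta_0)' \Sigma_0^{-1} (\beta - \tilde\beta_0) + (\beta - \hat\beta)' X'X (\beta - \hat\beta) \right] \right\},$

where $X' = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{pmatrix}$ is the transpose of the design matrix $X$.

However, it is almost impossible to calculate $\hat\theta_{UB} = (\hat\beta_{UB}', \hat\sigma^2_{UB})'$ analytically, so we have to obtain it numerically.

Note that the posterior conditional densities of $\beta$ given $\sigma^2$ and of $\sigma^2$ given $\beta$ are respectively proportional to

$f_1(\beta \mid \sigma^2, y) \propto \exp\left\{ -\frac{1}{2} (\beta - \tilde\beta)' \tilde\Sigma^{-1} (\beta - \tilde\beta) \right\},$

$f_2(\sigma^2 \mid \beta, y) \propto \sigma^{-2(\lambda_0 + 1 + n/2)} \exp\left( -\frac{c_1}{2\sigma^2} \right),$

where $\tilde\beta = (\Sigma_0^{-1} + X'X)^{-1} (\Sigma_0^{-1} \tilde\beta_0 + X'X \hat\beta)$, $\tilde\Sigma = \sigma^2 (\Sigma_0^{-1} + X'X)^{-1}$ and $c_1 = 2t + (n - 2)\hat\sigma^2 + (\beta - \tilde\beta_0)' \Sigma_0^{-1} (\beta - \tilde\beta_0) + (\beta - \hat\beta)' X'X (\beta - \hat\beta)$. That is, $f_1$ is a bivariate normal density and $f_2$ is an inverse-Gamma density with shape $\lambda_0 + n/2$ and scale $c_1/2$.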

The Gibbs sampler was originally developed by Geman and Geman (1984) as applied to image processing and the analysis of Gibbs distributions on a lattice. It is brought into mainstream statistics through the articles of Gelfand and Smith (1990) and Gelfand et al. (1990). The Gibbs sampler can also be shown to be a special case of the Metropolis-Hastings algorithm, see Gilks et al. (1996) and Robert and Casella (1999). In describing the Gibbs sampler, we follow the treatment in Casella and George (1992).

• Step 1. Choose the initial values of $\beta$ and $\sigma^2$ and denote the values of $\beta$ and $\sigma^2$ at the $j$th step by $\beta_j$ and $\sigma_j^2$, respectively.

• Step 2. Generate $\beta_{j+1}$ and $\sigma_{j+1}^2$ from $f_1(\beta \mid \sigma_j^2, y)$ and $f_2(\sigma^2 \mid \beta_j, y)$, respectively.

• Step 3. Repeat Step 2 $N$ times.

• Step 4. Calculate the Bayes estimator of $l(\beta, \sigma^2)$ by $\frac{1}{N - m_0} \sum_{j=m_0+1}^{N} l(\beta_j, \sigma_j^2)$, where $l(\beta, \sigma^2)$ denotes any function of $\beta$ and $\sigma^2$ and $m_0$ is the burn-in period.
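The four steps above can be sketched in Python using only the standard library. This is an illustrative sketch under stated assumptions, not the implementation used for the reported results: the function names `inv2` and `gibbs_ube` are hypothetical; the constant term in the scale of the $\sigma^2$ draw is taken as $2t$, which is what the prior factor $\exp(-t/\sigma^2)$ implies; and $\sigma^2$ is drawn conditional on the freshly generated $\beta$, the usual systematic-scan convention, whereas Step 2 as written conditions on $\beta_j$. The identity $(n - 2)\hat\sigma^2 + (\beta - \hat\beta)'X'X(\beta - \hat\beta) = \|y - X\beta\|^2$ is used so that the residual sum of squares at the current $\beta$ can replace both terms.

```python
import math
import random


def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def gibbs_ube(x, y, lam0, t, beta0, sigma0, n_iter=6000, burn_in=1000, seed=0):
    """Approximate E(beta | y) and E(sigma^2 | y) by Gibbs sampling (p = 2)."""
    rng = random.Random(seed)
    n = len(y)
    s0_inv = inv2(sigma0)
    sx, sxx = sum(x), sum(v * v for v in x)
    xtx = [[n, sx], [sx, sxx]]
    xty = [sum(y), sum(xi * yi for xi, yi in zip(x, y))]
    # Conditional mean (Sigma0^{-1} + X'X)^{-1} (Sigma0^{-1} beta0 + X'y);
    # X'X beta_hat = X'y, so beta_hat itself is not needed.
    a_inv = inv2([[s0_inv[i][j] + xtx[i][j] for j in range(2)] for i in range(2)])
    rhs = [s0_inv[i][0] * beta0[0] + s0_inv[i][1] * beta0[1] + xty[i] for i in range(2)]
    mean = [a_inv[i][0] * rhs[0] + a_inv[i][1] * rhs[1] for i in range(2)]
    # Cholesky factor of (Sigma0^{-1} + X'X)^{-1}; scaled by sigma at each draw.
    l11 = math.sqrt(a_inv[0][0])
    l21 = a_inv[0][1] / l11
    l22 = math.sqrt(a_inv[1][1] - l21 * l21)
    sigma2 = 1.0  # Step 1: arbitrary starting value
    totals = [0.0, 0.0, 0.0]
    for j in range(n_iter):  # Steps 2 and 3
        s = math.sqrt(sigma2)
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        beta = [mean[0] + s * l11 * z1, mean[1] + s * (l21 * z1 + l22 * z2)]
        d = [beta[0] - beta0[0], beta[1] - beta0[1]]
        quad = sum(d[i] * s0_inv[i][k] * d[k] for i in range(2) for k in range(2))
        rss = sum((yi - beta[0] - beta[1] * xi) ** 2 for xi, yi in zip(x, y))
        c1 = 2.0 * t + quad + rss
        # sigma^2 | beta, y is inverse gamma with shape lam0 + n/2 and scale c1/2.
        sigma2 = 1.0 / rng.gammavariate(lam0 + n / 2.0, 2.0 / c1)
        if j >= burn_in:  # Step 4: average the draws after burn-in
            totals[0] += beta[0]
            totals[1] += beta[1]
            totals[2] += sigma2
    kept = n_iter - burn_in
    return [v / kept for v in totals]
```

A call such as `gibbs_ube(x, y, lam0=4.0, t=3.0, beta0=[0.0, 0.0], sigma0=[[100.0, 0.0], [0.0, 100.0]])` returns the approximate posterior means of $\gamma_0$, $\gamma_1$ and $\sigma^2$; the prior values in this call are arbitrary illustrations, not those of Table 2.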

Note that under the above priors, it is readily seen that $E\theta = (E\beta', E\sigma^2)' = (\tilde\beta_0', t/(\lambda_0 - 2))'$ and $Cov(\theta) = diag(t/(\lambda_0 - 2) \cdot \Sigma_0, \ t^2/\{(\lambda_0 - 2)^2 (\lambda_0 - 3)\})$, since $Cov(\beta) = E \, Cov(\beta \mid \sigma^2) = \Sigma_0 E\sigma^2$ and $Corr(\beta, \sigma^2) = 0$.

In Table 2 we first calculate the values of $\hat\theta_{LB}$ and the corresponding numerical results of $\hat\theta_{UB}$ for different prior parameters and then present $\|\hat\theta_{LB} - \hat\theta_{UB}\| = \sqrt{\|\hat\beta_{LB} - \hat\beta_{UB}\|^2 + (\hat\sigma^2_{LB} - \hat\sigma^2_{UB})^2}$, which is used as an index of the degree of approximation between $\hat\theta_{LB}$ and $\hat\theta_{UB}$.

The numerical comparisons indicate two trends. One is that, for the same prior, $\|\hat\theta_{LB} - \hat\theta_{UB}\|$ tends to be smaller as the sample size gets larger; the other is that, for a given sample size, $\|\hat\theta_{LB} - \hat\theta_{UB}\|$ increases as the prior variance becomes larger. In the process of simulation, we find that the value of $\|\hat\theta_{LB} - \hat\theta_{UB}\|$ is dominated by the value of $(\hat\sigma^2_{LB} - \hat\sigma^2_{UB})^2$, whereas the value of $\|\hat\beta_{LB} - \hat\beta_{UB}\|^2$ is always small. This means that the LBE $\hat\beta_{LB}$ is rather close to the UBE $\hat\beta_{UB}$, while there can be a certain difference between the LBE $\hat\sigma^2_{LB}$ and the UBE $\hat\sigma^2_{UB}$ in our cases.

### Remark 5

Two cases are considered for the two-parameter exponential family. In case (I) we assume that the parameters $\mu$ and $\lambda$ have independent prior distributions, where $\mu$ follows an exponential distribution and $\lambda$ has an inverted Gamma prior. In case (II), we suppose that, given $\lambda$, the conditional prior of $\mu$ is an inverted Gamma density and $\lambda$ follows an inverted Gamma prior. For both cases, numerical simulations show that the values of $\|\hat\theta_{LB} - \hat\theta_{UB}\|$ are small, which means that $\hat\theta_{LB}$ works well as a linear approximation of $\hat\theta_{UB}$.

### Remark 6

In the case of the uniform distribution $U(\theta_1, \theta_2)$, numerical computations show that $\hat\theta_{LB}$ works very well for both independent and non-independent priors. For example, for the single-parameter uniform distribution $U(0, \theta_2)$, we assume that the prior $\pi(\theta_2)$ has a finite second-order moment and mimic the above discussion. Then the LBE for the parameter $\theta_2$ is $\hat\theta_{2,LB} = a_0 X_{(n)} + b_0$ with

$a_0 = \frac{(n+1)(n+2) Var(\theta_2)}{(n+1)^2 E\theta_2^2 - n(n+2)(E\theta_2)^2} \quad \text{and} \quad b_0 = \frac{[(1 - a_0)n + 1] E\theta_2}{n+1}.$

Specifically, let $\pi(\theta_2) = t_2^{t_1} \theta_2^{-t_1 - 1} \exp(-t_2/\theta_2)/\Gamma(t_1)$. Together with $f(x_{(n)} \mid \theta_2)$ and the squared loss, we know that the UBE $\hat\theta_{2,UB}$ is

$E(\theta_2 \mid x_{(n)}) = \frac{\int_{x_{(n)}}^{\infty} \theta_2^{-t_1 - n} \exp(-t_2/\theta_2) \, d\theta_2}{\int_{x_{(n)}}^{\infty} \theta_2^{-t_1 - n - 1} \exp(-t_2/\theta_2) \, d\theta_2} = \frac{t_2}{t_1 + n - 1} \cdot \frac{P(\chi^2(2(t_1 + n - 1)) \leq 2t_2/x_{(n)})}{P(\chi^2(2(t_1 + n)) \leq 2t_2/x_{(n)})},$

where we utilize the relationship between the inverse Gamma and the $\chi^2$ distribution (Mao and Tang, 2012). For instance, let $n = 5$, $x_{(n)} = 2$, $t_1 = 3$ and $t_2 = 8$. Simple computations show that $a_0 = 1.1351$, $b_0 = 0.2163$, $P(\chi^2(14) \leq 8) = 0.1107$ and $P(\chi^2(16) \leq 8) = 0.0511$. Hence, we have $\hat\theta_{2,LB} = 2.4865$ and $\hat\theta_{2,UB} = 2.4758$, which shows that the LBE is very close to the UBE.
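The numerical example in Remark 6 can be retraced with a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, $t_1$ is taken to be an integer greater than 2 so that the prior variance exists and both $\chi^2$ distributions have even degrees of freedom, and the $\chi^2$ probabilities are evaluated with the standard identity $P(\chi^2(2k) \leq x) = 1 - e^{-x/2} \sum_{j=0}^{k-1} (x/2)^j / j!$ rather than with a statistics library.

```python
import math


def chi2_cdf_even(x, df):
    """CDF of a chi-square distribution with even degrees of freedom df = 2k."""
    k = df // 2
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j      # (x/2)^j / j!
        total += term
    return 1.0 - math.exp(-half) * total


def uniform_lbe_ube(n, x_max, t1, t2):
    """LBE and UBE of theta_2 for U(0, theta_2) with an inverse-Gamma(t1, t2) prior."""
    mean = t2 / (t1 - 1)                            # E theta_2
    var = t2 ** 2 / ((t1 - 1) ** 2 * (t1 - 2))      # Var(theta_2)
    second = var + mean ** 2                        # E theta_2^2
    a0 = (n + 1) * (n + 2) * var / ((n + 1) ** 2 * second - n * (n + 2) * mean ** 2)
    b0 = ((1 - a0) * n + 1) * mean / (n + 1)
    lbe = a0 * x_max + b0
    k = t1 + n
    arg = 2.0 * t2 / x_max
    ube = t2 / (k - 1) * chi2_cdf_even(arg, 2 * (k - 1)) / chi2_cdf_even(arg, 2 * k)
    return a0, b0, lbe, ube


print(uniform_lbe_ube(n=5, x_max=2.0, t1=3, t2=8.0))
```

With the inputs of the example, the coefficients and the LBE agree with the quoted values to about three decimal places. The UBE evaluates to roughly 2.47; a small difference from the quoted figure in the third decimal place can arise because the ratio of the two small $\chi^2$ probabilities is sensitive to rounding.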

### Remark 7

For the two-parameter inverse Gaussian distribution IG(α1, α2), similarly, numerical studies show that LBE is adequate.

### Remark 8

In the above simulations, it should be noted that deciding when to stop the chain is an important issue and a topic of current research in MCMC methods. If the resulting sequence has not converged to the target distribution, then the estimators and inferences obtained from it are suspect. Let $\gamma$ represent the characteristic of the target distribution (mean, moments, quantiles, etc.) in which we are interested. An obvious method to monitor convergence to the target distribution is to run multiple sequences of the chain and plot $\gamma$ versus the iteration number.

4. Conclusions and Remarks

This paper uses the normal linear model $Y \sim N(X\beta, \sigma^2 I_n)$ as an example to investigate the application of the linear Bayes method, which we employ to simultaneously estimate the regression parameter $\beta$ and the variance parameter $\sigma^2$ by proposing a linear Bayes estimator for the parameter vector $\theta = (\beta', \sigma^2)'$. The proposed linear Bayes estimator is shown to be superior to the classical UMVUE and MLE in terms of the mean squared error matrix criterion. Numerical simulations are presented to verify the validity of the linear Bayes estimator. The procedure used in this paper includes the normal distribution as a special case and can be extended easily to other useful distributions (such as the log-normal distribution, the inverse Gaussian distribution and the two-parameter exponential family), which are frequently used parametric lifetime models in survival analysis and reliability theory. We also discuss the application of the linear Bayes method to the two-parameter exponential family, the uniform distribution and the inverse Gaussian distribution. Compared with the usual Bayes estimator, we find that

• (1) The proposed linear Bayes estimator is simple and easy to calculate, and it is a good approximation in many situations; the linear Bayes method works especially well for the uniform distribution.

• (2) We can always define a linear Bayes estimator if there exists a sufficient statistic for the parametric model; subsequently, the conclusions of Theorems 2 and 3 always hold.

However, an advantage of the usual Bayes estimator over the linear Bayes estimator is that the former allows for noninformative (improper) priors. Note also that the linear Bayes estimator may be an inadequate approximation in some situations, even for proper priors. Hence there is still scope for the linear Bayes method to be improved. For instance, in some cases a quadratic Bayes estimator would be a better alternative. For the normal linear model, one can also consider adding further statistics to the definition of $T$; for example, we can replace $T = (\hat\beta', \hat\sigma^2)'$ by $T_1 = (\hat\beta', \|\hat\beta\|^2, \hat\sigma^2)'$ or $T_2 = (\hat\beta', \hat\sigma^2, \hat\sigma^4)'$ to define a new linear Bayes estimator. We note that the loss function often plays an important role in Bayesian analysis; consequently, some interesting loss functions such as the balanced loss and the linex loss can be integrated with the linear Bayes method in future studies.

TABLES

### Table 1

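$IP(\hat\theta_U)$ and $IP(\hat\theta_{ML})$ under different priors and sample sizes

| Priors | $n$ | $IP(\hat\theta_U)$ | $IP(\hat\theta_{ML})$ | $tr(Cov(\beta_0, \beta_1, \sigma^2))$ |
|---|---|---|---|---|
| $(\beta_0, \beta_1)' \sim N\left( \binom{1}{2}, \begin{pmatrix} 1 & 1/3 \\ 1/3 & 1 \end{pmatrix} \right)$, $\sigma^2 \sim U(7, 9)$ | 5 | 0.9712 | 0.9542 | 7/3 |
| | 10 | 0.9516 | 0.9404 | |
| | 20 | 0.9066 | 0.8974 | |
| $(\beta_0, \beta_1)' \sim N\left( \binom{1}{2}, \begin{pmatrix} 6 & 2 \\ 2 & 6 \end{pmatrix} \right)$, $\sigma^2 \sim U(6, 10)$ | 5 | 0.9234 | 0.8779 | 40/3 |
| | 10 | 0.8803 | 0.8524 | |
| | 20 | 0.7706 | 0.7480 | |
| $(\beta_0, \beta_1)' \sim N\left( \binom{1}{2}, \begin{pmatrix} 18 & 6 \\ 6 & 18 \end{pmatrix} \right)$, $\sigma^2 \sim U(4, 12)$ | 5 | 0.8458 | 0.7537 | 124/3 |
| | 10 | 0.7265 | 0.6627 | |
| | 20 | 0.5369 | 0.4961 | |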

### Table 2

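$\lVert \hat\theta_{LB} - \hat\theta_{UB} \rVert$ under different prior parameters and sample sizes

| $n$ | The prior parameters | $\lVert \hat\theta_{LB} - \hat\theta_{UB} \rVert$ |
|---|---|---|
| 20 | $\lambda_0 = 2$, $t = 3$, $\tilde\beta_0 = \binom{-2}{3}$, $\Sigma_0 = \begin{pmatrix} 1 & 0.7 \\ 0.7 & 1 \end{pmatrix}$ | 1.3369 |
| | $\lambda_0 = 2$, $t = 4$, $\tilde\beta_0 = \binom{-2}{3}$, $\Sigma_0 = \begin{pmatrix} 5 & 3.5 \\ 3.5 & 5 \end{pmatrix}$ | 1.4472 |
| | $\lambda_0 = 2$, $t = 8$, $\tilde\beta_0 = \binom{-2}{3}$, $\Sigma_0 = \begin{pmatrix} 20 & 14 \\ 14 & 20 \end{pmatrix}$ | 1.8390 |
| 50 | $\lambda_0 = 2$, $t = 3$, $\tilde\beta_0 = \binom{-2}{3}$, $\Sigma_0 = \begin{pmatrix} 1 & 0.7 \\ 0.7 & 1 \end{pmatrix}$ | 1.0589 |
| | $\lambda_0 = 2$, $t = 4$, $\tilde\beta_0 = \binom{-2}{3}$, $\Sigma_0 = \begin{pmatrix} 5 & 3.5 \\ 3.5 & 5 \end{pmatrix}$ | 1.1407 |
| | $\lambda_0 = 2$, $t = 8$, $\tilde\beta_0 = \binom{-2}{3}$, $\Sigma_0 = \begin{pmatrix} 20 & 14 \\ 14 & 20 \end{pmatrix}$ | 1.4855 |

Here $(x_1, x_2, \ldots, x_{20})$ is a subset of $(x_1, x_2, \ldots, x_{50})$ = (−4, 3.4, 2.4, 0, 1, 2, 3, 4, 3.5, 0.6, 7, 8, 9, 10, 11, 12, −1.9, 14, 15, 16, 17, 18, 19, 21, 22, 23, 34, 35, −13, 17, 18.5, 19.9, 24, 28, 32, 33, 37, 39, −12, −16, −19, 44, 45, 23.4, 31.7, 33.5, 45.2, 60.7, −14.3, −17).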

References
1. Arnold, SF (1980). The Theory of Linear Models and Multivariate Analysis. New York: John Wiley & Sons.
2. Busby, D, Farmer, CL, and Iske, A (2005). Uncertainty evaluation in reservoir forecasting by Bayes linear methodology. Algorithms for Approximation, Proceedings of the 5th International Conference, Chester, pp. 187-196.
3. Casella, G, and George, EI (1992). An introduction to Gibbs Sampling. The American Statistician. 46, 167-174.
4. Gelfand, AE, and Smith, AFM (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association. 85, 398-409.
5. Gelfand, AE, Hills, SE, Racine-Poon, A, and Smith, AFM (1990). Illustration of Bayesian inference in normal data models using Gibbs sampling. Journal of the American Statistical Association. 85, 972-985.
6. Geman, S, and Geman, D (1984). Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Transactions PAMI. 6, 721-741.
7. Gilks, WR, Richardson, S, and Spiegelhalter, DJ (1996). Markov Chain Monte Carlo in Practice. London: Chapman and Hall.
8. Goldstein, M (1983). General variance modifications for linear Bayes estimators. Journal of the American Statistical Association. 78, 616-618.
9. Hartigan, JA (1969). Linear Bayesian methods. Journal of the Royal Statistical Society, Series B. 31, 440-454.
10. Heiligers, B (1993). Linear Bayes and minimax estimation in linear models with partially restricted parameter space. Journal of Statistical Planning and Inference. 36, 175-183.
11. Hoffmann, K (1996). A subclass of Bayes linear estimators that are minimax. Acta Applicandae Mathematicae. 43, 87-95.
12. Lamotte, LR (1978). Bayes linear estimators. Technometrics. 3, 281-290.
13. Lindley, DV (1980). Approximate Bayesian methods. Trabajos de Estadistica. 21, 223-237.
14. Mao, SS, and Tang, YC (2012). Bayesian Statistics. Beijing: China Statistics Press.
15. Martinez, WL, and Martinez, AR (2007). Computational Statistics Handbook with MATLAB. New York: Chapman & Hall/CRC.
16. Pensky, M, and Ni, P (2000). Extended linear empirical Bayes estimation. Communications in Statistics - Theory and Methods. 29, 579-592.
17. Rao, CR (1973). Linear Statistical Inference and Its Applications. New York: John Wiley & Sons.
18. Robert, CP, and Casella, G (1999). Monte Carlo Statistical Methods. New York: Springer-Verlag.
19. Samaniego, FJ, and Vestrup, E (1999). On improving standard estimators via linear empirical Bayes methods. Statistics & Probability Letters. 44, 309-318.
20. Schwarz, CJ, and Samanta, M (1991). An inductive proof of the sampling distributions for the MLEs of the parameters in an inverse Gaussian distributions. The American Statistician. 45, 223-235.
21. Tierney, L, and Kadane, JB (1986). Accurate approximations for posterior moments and marginal densities. Journal of the American Statistical Association. 81, 82-86.
22. Tweedie, MCK (1957). Statistical properties of inverse Gaussian distributions. The Annals of Mathematical Statistics. 28, 362-377.
23. Wang, LC, and Singh, RS (2014). Linear Bayes estimator for the two-parameter exponential family under type II censoring. Computational Statistics and Data Analysis. 71, 633-642.
24. Wei, LS, and Zhang, WP (2007). The superiorities of Bayes linear minimum risk estimation in linear model. Communications in Statistics-Theory and Methods. 36, 917-926.
25. Zhang, WP, and Wei, LS (2005). On Bayes linear unbiased estimation of estimable functions for the singular linear model. Science in China Series A Mathematics. 7, 898-903.
26. Zhang, WP, Wei, LS, and Chen, Y (2011). The superiorities of Bayes linear unbiased estimation in partitioned linear model. Journal of System Science and Complexity. 5, 945-954.
27. Zhang, WP, Wei, LS, and Chen, Y (2012). The superiorities of Bayes linear unbiased estimation in multivariate linear models. Acta Mathematicae Applicatae Sinica. , 383-394.