In this survey, we use the normal linear model to demonstrate the use of the linear Bayes method. The superiorities of linear Bayes estimator (LBE) over the classical UMVUE and MLE are established in terms of the mean squared error matrix (MSEM) criterion. Compared with the usual Bayes estimator (obtained by the MCMC method) the proposed LBE is simple and easy to use with numerical results presented to illustrate its performance. We also examine the applications of linear Bayes method to some other distributions including two-parameter exponential family, uniform distribution and inverse Gaussian distribution, and finally make some remarks.
The linear Bayes method was originally proposed by Hartigan (1969), which suggests that in Bayesian statistics one can replace a completely specified prior distribution by an assumption with just a few moments of the distribution. It has been subsequently discussed by Rao (1973) from linear optimization viewpoint. Lamotte (1978) later develops a class of linear estimators, called Bayes linear estimators, by searching, among all linear estimators that have least average total mean squared error. Goldstein (1983) considers the problem of modifying the linear Bayes estimator for the mean of a distribution of unknown form using a sample variance estimate. Heiligers (1993) studies the relationship between linear Bayes estimation and minimax estimation in linear models with partial parameter restrictions. Hoffmann (1996) proposes a well-described subclass of Bayes linear estimators for the unknown parameter vector in the linear regression model with ellipsoidal parameter constraints and obtains a necessary and sufficient condition to ensure that the considered Bayes linear estimators improves the least squared estimator over the whole ellipsoid regardless of the selected generalized risk function. In the framework of empirical Bayes, Samaniego and Vestrup (1999) and Pensky and Ni (2000) respectively construct linear empirical Bayes estimators and establish their superiorities over standard and traditional estimators. In application fields, Busby et al. (2005) proposes the application of Bayes linear methodology to uncertainty evaluation in reservoir forecasting. Zhang and Wei (2005) also drive the unique Bayes linear unbiased estimator of estimable functions for the singular linear model. Wei and Zhang (2007) employs linear Bayes procedure to define Bayes linear minimum risk estimation in a linear model and discusses its superiorities. Recently, Zhang et al. (2011, 2012) extend the research on a linear Bayes estimator to the partitioned linear model and multivariate linear models, respectively.
In this paper, along the same line as in Wang and Singh (2014), we use the normal linear model as an example to demonstrate how to apply a linear Bayes method to simultaneously estimate all the parameters involved in the model and elaborate advantages and potential disadvantages.
Let W be a known p-dimension subspace of R^{n}. Suppose that we observe the random vector
This model is called the normal linear model as defined by Arnold (1980) and adopted by many other authors as well. Let X be a basis matrix for W. Then X is an n × p matrix of rank p, and there exists a unique β ∈ R^{p} such that μ = Xβ. Hence, an equivalent version of the normal linear model can be presented, and we observe the random vector
Define β? = (X′X)^{?1}X′Y and σ?^{2} = (||Y||^{2} ? ||Xβ? ||^{2})/(n ? p) and note that the fact that (β?,σ?^{2}) is a complete sufficient statistic for the above linear model. Hence, the classical estimators for the parameters β and σ^{2} are β? and σ?^{2}, which are the uniformly minimum variance unbiased estimator (UMVUE) in the sense of minimizing mean squared error.
From the Bayesian viewpoint, note that in most cases past experience about the parameters β and σ^{2} are often available. Let f_{0}(β,σ^{2}) be the joint prior of β and σ^{2} and the loss function be
where D is a positive definite matrix and θ? denotes the estimate of the vector θ = (β′, σ^{2})′. Then, by virtute of the Bayes theorem, the usual Bayes estimators (UBE) for β and σ^{2}, say β?_{UB} and
where g(β,σ^{2}|y) denotes the conditional joint posterior density of β and σ^{2} given Y. However, it is difficult to handle complicated or non-standard integrations. Normally, in these cases approximate Bayes estimators are suggested such as Lindley’s approximation and Tierney and Kadane’s approximation, see Lindley (1980) and Tierney and Kadane (1986) for details. Simulation-based methods such as the Gibbs sampling procedure and Metropolis method also have emerged in the past twenty years, see Martinez and Martinez (2007). Traditional Bayes estimators (UBE) are somewhat complicated and inconvenient to use in these situations.
In the following, enlightened by Rao (1973), we employ the linear Bayes method to propose a linear Bayes estimator (LBE) for the parameters β and σ^{2} simultaneously as well as investigate superiorities. We also extend our discussions on the application of linear Bayes method to some other useful distributions.
The survey is organized as follows: In Section 2 we define the LBE for the parameter vector θ = (β′, σ^{2})′ and establish it superiorities over the classic UMVUE and MLE. Numerical comparisons between the LBE and the usual Bayes estimator (UBE) are presented in Section 3. Extended discussions and remarks are made in Section 4.
Throughout this paper, for two nonnegative definite matrices A_{1} and A_{2} of the same size, we say A_{1} ≥ A_{2} if and only if A_{1} ? A_{2} is a nonnegative definite matrix.
Denote θ = (β′, σ^{2})′. In what follows we assume that the prior G(θ) belongs to the distribution family:
Put T = (β?′, σ?^{2})′ and define the linear Bayes estimator (LBE) of θ, say θ?_{LB}, be of the form θ? = BT + b satisfying
where align and b are (p+1)×(p+1) and (p+1)×1 undetermined matrices respectively, E_{(}_{Y}_{,}_{θ}_{)} denotes the expectation with respect to the joint distribution of Y and θ and the loss function is given by (
Thus, we have the following conclusion.
Let θ?_{LB}be defined by (
where W = E[Cov(T|θ)] = diag((X′X)^{?1}Eσ^{2}, 2Eσ^{4}/(n ? p)).
From the constraint E_{(}_{Y}_{,}_{θ}_{)}(θ? ? θ) = 0, we know b = Eθ ? BE_{(}_{Y}_{,}_{θ}_{)}(T). Note that
Hence b = Eθ ? BEθ, and accordingly we have
For given θ, using the independence between β? and σ?^{2}, we have
where W = diag((X′X)^{?1}Eσ^{2}, 2Eσ^{4}/(n ? p)).
Substituting (
which yields
Together with b = Eθ ? BEθ we come to the conclusion of Theorem 1.
In the definition of θ?_{LB} (
Note that
where we use θ?_{U} to denote the UMVUE of θ = (β′, σ^{2})′.
Theorem 2Let θ?_{LB}and θ?_{U}be given by (
Since E_{(}_{Y}_{,}_{θ}_{)}(θ?_{LB} ? θ) = 0, we have
Denote M = [W + Cov(θ)]^{?1}. Then by Theorem 1 we know
However,
Comparing (
The proof of Theorem 2 is completed.
Moreover, note that the MLE of θ, denoted by θ?_{ML}, equals to B_{0}T with
Thus,
Let θ?_{LB}and θ?_{ML}be given by (
We rewrite
where c_{0} = (np^{2} ? 4np ? p^{3} + 2p^{2})/{2(2n + p^{2} ? 2p)Eσ^{4}}.
Hence, in order to establish the MSEM superiority of θ?_{LB} over θ?_{ML}, it suffices to show that
for n ≥ p + 1.
Denote Cov^{?1}(θ) = S, where S = (S^{i j}) is a 2 × 2 partition matrix and
Thus, to prove (
or equivalently to show that
for n ≥ p + 1, where we use the fact of S^{11} ≥ 0 and accordingly the value of the determinant |S^{11}| is nonnegative.
Set Δ = Var(σ^{2}) ? E(σ^{2} ? Eσ^{2})(β ? Eβ)′Cov^{?1}(β)E(σ^{2} ? Eσ^{2})(β ? Eβ) and note that S^{11} = Cov^{?1}(β)[Cov(β) + E(σ^{2} ? Eσ^{2})(β ? Eβ)(1/Δ)E(σ^{2} ? Eσ^{2})(β ? Eβ)′]Cov^{?1}(β), hence we have
Further, using
where a = E(σ^{2} ? Eσ^{2})(β ? Eβ)′Cov^{?1}(β)E(σ^{2} ? Eσ^{2})(β ? Eβ).
Note that Δ = Var(σ^{2}) ? a, hence
and accordingly
where we use the facts that Eσ^{4} ≥ (Eσ^{2})^{2} and (n + 4)p^{2} ? (4n + 4)p ? p^{3} + 4n ≥ 0 for n ≥ p + 1.
Hence, Theorem 3 has been proved.
For the two-parameter exponential family given by
where x > μ, we assume that X_{(1)} ≤ X_{(2)} ≤ ··· ≤ X_{(}_{r}_{)}(2 ≤ r ≤ n) denote the type II censored samples. Define Q_{i} = [n ? (i ? 1)](X_{(}_{i}_{)} ? X_{(}_{i}_{?1)}), where X_{(0)} = 0, then Q_{1} and
Under the assumption that the prior G(θ) satisfies the condition E||θ||^{2} < ∞, we can obtain the expression of LBE θ?_{LB} for the parameter vector θ = (μ, λ)′ in this case and establish its superiorities over the θ?_{U} and θ?_{ML} by virtue of MSEM criterion similarly. The interested readers are referred to Wang and Singh (2014) for more details.
Let X_{1}, X_{2}, . . . , X_{n} be independently drawn from the uniform distribution U(θ_{1}, θ_{2}) with density f (x; θ_{1}, θ_{2}) = (θ_{2} ? θ_{1})^{?1}, where θ_{1} < x < θ_{2}. Note that X_{(1)} = min_{1≤}_{i}_{≤}_{n}X_{i} and X_{(}_{n}_{)} = max_{1≤}_{i}_{≤}_{n}X_{i} are sufficient and complete statistics, hence, set T = (X_{(1)}, X_{(}_{n}_{)})′, we obtain the classic UMVUE and MLE for θ = (θ_{1}, θ_{2})′ in this case:
Similarly, using the assumption that the prior G(θ) satisfies the condition E||θ||^{2} < ∞, the expression of LBE θ?_{LB} for the parameter vector θ = (θ_{1}, θ_{2})′ can be easily obtained and its MSEM superiorities over the θ?_{U} and θ?_{ML} can also be proved.
Let X_{1}, X_{2}, . . . , X_{n} be a random sample from the two-parameter inverse Gaussian distribution IG(α_{1}, α_{2}) with pdf
where x > 0. It is easily shown that the statistics
where T = (X?, S?)′. Assume that the prior G(θ) belongs to the prior family
To illustrate Theorem 2 and Theorem 3 we investigate the case of two-dimensional normal linear model, i.e.
where we assume that (β_{0}, β_{1})′ ~ N((1, 2)′, Cov(β_{0}, β_{1})) with Cov(β_{0}, β_{1}) having three alternative values and σ^{2} ~ U(a, b) with three different pairs a and b. We also assume that (β_{0}, β_{1}) and σ^{2} are uncorrelated, i.e. Cov(β_{0}, β_{1}, σ^{2}) = diag(Cov(β_{0}, β_{1}), Var(σ^{2})), and x = (?4, ?3, ?2, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16).
Define the percentages of improvement of θ?_{LB} over θ?_{U} and θ?_{ML}, respectively, by
For different sample size n (= 5, 10, 20), the corresponding computation results for IP(θ?_{U}) and IP(θ?_{ML}) under the three different priors are presented in Table 1, where tr(Cov(β_{0}, β_{1}, σ^{2})) is used as an index of the variation of the prior information.
As stated in Theorems 2 and 3, since the above priors belong to the family (
For the model (
Suppose p = 2 and let us consider the normal linear model
Denote β = (γ_{0}, γ_{1})′. We assume that σ^{2} follows an inverse-Gamma distribution with density
and given σ^{2} the conditional distribution of β is N_{2}(β?_{0}, σ^{2}∑_{0}).
Note that the posterior density of (β,σ^{2}) given Y is
where
However, it is almost impossible to calculate
Note that the posterior conditional densities of β given σ^{2} and σ^{2} given β are respectively proportional to
where
The Gibbs sampler was originally developed by Geman and Geman (1984) as applied to image processing and the analysis of Gibbs distributions on a lattice. It is brought into mainstream statistics through the articles of Gelfand and Smith (1990) and Gelfand et al. (1990). The Gibbs sampler can also be shown to be a special case of the Metropolis-Hastings algorithm, see Gilks et al. (1996) and Robert and Casella (1999). In describing the Gibbs sampler, we follow the treatment in Casella and George (1992).
Step 1. Choose the initial values of β and σ^{2} and denote the values of β and σ^{2} at the j^{th} step by β_{j} and
Step 2. Generate β_{j}_{+1} and
Step 3. Repeat Step 2 for N times.
Step 4. Calculate the Bayes estimator of l(β,σ^{2}) by
Note that under the above priors, it is readily seen that
In the following table we first calculate the values of θ?_{LB} and the corresponding numerical results of θ?_{UB} for different prior parameters and then present
The above numerical comparisons indicate two trends, one is that for the same prior, ||θ?_{LB} ? θ?_{UB}|| tends to be smaller as sample size gets larger, the other is that given sample size, ||θ?_{LB} ?θ?_{UB}|| increases as the prior variance becomes larger. In the process of simulation, we find that the value of ||θ?_{LB}?θ?_{UB}|| is affected by the value of
Two cases are considered for the two-parameter exponential family. In case (I) we assume that the parameters μ and λ have independent prior distributions, where μ follows an exponential distribution and λ has an inverted Gamma prior. In case (II), we suppose that, given λ, the conditional prior of μ is an inverted Gamma density and λ follows an inverted Gamma prior. For the above two cases, numerical simulations show that ||θ?_{LB} ?θ?_{UB}||s are small, which means that as a linear approximation of θ?_{UB}, θ?_{LB} works better.
In the case of the uniform distribution U(θ_{1}, θ_{2}), numerical computations show that θ?_{LB} works very good for both independent prior and non-independent prior. For example, for the single parameter uniform distribution U(0, θ_{2}), we assume the prior π(θ_{2}) has finite second-order moment and mimic the above discussions, then the LBE for the parameter θ_{2} is θ?_{2,}_{LB} = a_{0}X_{(}_{n}_{)} + b_{0} with
Specifically, let
where we utilize the relationship between the inverse Gamma and the χ^{2} distribution (Mao and Tang, 2012). Say, let n = 5, x_{(}_{n}_{)} = 2 and t_{1} = 3 and t_{2} = 8, simple computations show that a_{0} = 1.1351, b_{0} = 0.2163 and P(χ^{2}(14) ≤ 8) = 0.1107 and P(χ^{2}(16) ≤ 8) = 0.0511. Hence, we have θ?_{2,}_{LB} = 2.4865 and θ?_{2,}_{UB} = 2.4758, which show that the LBE is very close to the UBE.
For the two-parameter inverse Gaussian distribution IG(α_{1}, α_{2}), similarly, numerical studies show that LBE is adequate.
In above simulation, it should be noted that the problem of deciding when to stop the chain is an important issue and is the topic of current research in MCMC methods. If the resulting sequence has not converged to the target distribution, then the estimators and inferences we get from it are suspect. Let γ represent the characteristic of the target distribution (mean, moments, quantiles, etc.) in which we are interested. An obvious method to monitor convergence to target distribution is to run multiple sequences of the chain and plot γ versus the iteration number.
This paper uses the normal linear model Y ~ N(Xβ,σ^{2}I_{n}) as an example to investigate the application of linear Bayes method, where we employ the linear Bayes method to simultaneously estimate regression parameter β and the variance parameter σ^{2} as well as propose a linear Bayes estimator for the parameter vector θ = (β′, σ^{2})′. The proposed linear Bayes estimator is shown superior to the classical estimators UMVUE and MLE, respectively, in terms of the mean squared error matrix criterion. Numerical simulations are presented to verify the validity of the linear Bayes estimator. The procedure used in this paper includes normal distribution as its special case and can be extended easily to other useful distributions (such as log-normal, inverse Gaussian distribution and two-parameter exponential family), which are frequently used parametric lifetime models in survival analysis and reliability theory. We also discuss and remark the applications of linear Bayes method to the two-parameter exponential family, uniform distribution and the inverse Gaussian distribution. Compared with the usual Bayes estimator, we find that
(1) The proposed linear Bayes estimator is simple and easy to calculate as well as a good approximation in many situations; the linear Bayes method works especially well for the case of uniform distribution.
(2) We can always define a linear Bayes estimator if there exists sufficient statistic for the parametric model; subsequently, the conclusions of Theorems 2 and 3 always hold.
However, an advantage of the usual Bayes estimator over the linear Bayes estimator is that the former allows for noninformative (improper) priors. Of note is that the linear Bayes estimator may be an inadequate approximation in some situations even for the cases of proper priors. Hence there is still scope for the linear Bayes method to be improved. For instance, for some cases a quadratic Bayes estimator would be a better alternative. However, for the case of normal linear model, one can consider to add more other statistics into the definition of T, for example, we can replace T = (β?′,σ?^{2})′ by T_{1} = (β?′, ||β?||^{2}, σ?^{2})′ or T_{2} = (β?′, σ?^{2}, σ?^{4})′ to redefine a new linear Bayes estimator. We note that the loss function often plays an important role in Bayesian analysis; consequently, some interesting loss functions such as the balanced loss and the linex loss can be integrated with the linear Bayes method in future studies.
IP(θ?_{U}) and IP(θ?_{ML}) under different priors and sample sizes
Priors | n | IP(θ?_{U}) | IP (θ?_{ML}) | tr (Cov (β_{0}, β_{1}, σ^{2})) |
---|---|---|---|---|
5 | 0.9712 | 0.9542 | ||
10 | 0.9516 | 0.9404 | 7/3 | |
σ^{2} ~ U(7, 9) | 20 | 0.9066 | 0.8974 | |
5 | 0.9234 | 0.8779 | ||
10 | 0.8803 | 0.8524 | 40/3 | |
σ^{2} ~ U(6, 10) | 20 | 0.7706 | 0.7480 | |
5 | 0.8458 | 0.7537 | ||
10 | 0.7265 | 0.6627 | 124/3 | |
σ^{2} ~ U(4, 12) | 20 | 0.5369 | 0.4961 |
||θ?_{LB} ? θ?_{UB}|| under different prior parameters and sample sizes
n | The prior parameters | ||θ?_{LB} ? θ?_{UB}|| |
---|---|---|
λ_{0} = 2, t = 3, | 1.3369 | |
20 | λ_{0} = 2, t = 4, | 1.4472 |
λ_{0} = 2, t = 8, | 1.8390 | |
λ_{0} = 2, t = 3, | 1.0589 | |
50 | λ_{0} = 2, t = 4, | 1.1407 |
λ_{0} = 2, t = 8, | 1.4855 |
Where (x_{1}, x_{2}, . . . , x_{20}) is a subset of (x_{1}, x_{2}, . . . , x_{50}) = (?4 3.4 2.4 0 1 2 3 4 3.5 0.6 7 8 9 10 11 12 ?1.9 14 15 16 17 18 19 21 22 23 34 35 ?13 17 18.5 19.9 24 28 32 33 37 39 ?12 ?16 ?19 44 45 23.4 31.7 33.5 45.2 60.7 ?14.3 ?17).