TEXT SIZE

• •   CrossRef (0) Bayesian inference for an ordered multiple linear regression with skew normal errors  Jeongmun Jeonga, Younshik Chung1,a

aDepartment of Statistics, Pusan National University, Korea
Correspondence to: 1Department of Statistics, Pusan National University, 2, Busandaehak-ro 63beon-gil, Geumjeonggu, Busan 46241, Korea. Email: yschung@pusan.ac.kr
Received July 13, 2019; Revised November 26, 2019; Accepted December 18, 2019.
Abstract
This paper studies a Bayesian ordered multiple linear regression model with skew normal error. It is reasonable that the kind of inherent information available in an applied regression requires some constraints on the coefficients to be estimated. In addition, the assumption of normality of the errors is sometimes not appropriate in the real data. Therefore, to explain such situations more flexibly, we use the skew-normal distribution given by Sahu et al. (The Canadian Journal of Statistics, 31, 129–150, 2003) for error-terms including normal distribution. For Bayesian methodology, the Markov chain Monte Carlo method is employed to resolve complicated integration problems. Also, under the improper priors, the propriety of the associated posterior density is shown. Our Bayesian proposed model is applied to NZAPBŌĆÖs apple data. For model comparison between the skew normal error model and the normal error model, we use the Bayes factor and deviance information criterion given by Spiegelhalter et al. (Journal of the Royal Statistical Society Series B (Statistical Methodology), 64, 583–639, 2002). We also consider the problem of detecting an influential point concerning skewness using Bayes factors. Finally, concluding remarks are discussed.
Keywords : Bayes factor, deviance information criterion, influential point, Markov chain Monte Carlo, ordered linear regression, Skew normal distribution
1. Introduction

There are many applications with the statistical structures for which a constrained linear regression model is proper. It is often reasonable that the incomplete information available in an applied regression require some different types of constraints on their parameters to be estimated. For example, the amount of apples naturally increases as an apple tree grows older. However, if there is not enough data, the analysis results may differ from common sense. In order to prevent this, it is recommended to place an ordinal constraint. For these points, Chen and Deely (1996) considered one such application and illustrated why the Bayesian model is both attractive and appropriate. They then used the normal distribution for error terms in a constrained linear multiple regression model. But, the normality assumption of the errors is not appropriate in real data because some data sometimes have heavy tails or are asymmetric in various fields. To overcome this, as expected, the assumption of the error terms would be more flexible. Therefore, we use the skew normal distribution for error terms containing the normal distribution for the Bayesian ordered multiple linear regression model as a flexible distribution.

The skew normal distribution was first introduced by O’Hagan and Leonard (1976). Azzalini (1985) conducted study on the construction of the family of univariate skew-normal distributions. Azzalini and Dalla-Valle (1996) extended the univariate skew-normal distribution to the multivariate case. The skew-normal distribution is a family of distributions with an additional parameter of bias. Some applications of the skew-normal regression are presented in Azzalini and Capitanio (1999), under a classical method. Sahu et al. (2003) proposed a new class of multivariate skew distributions to Bayesian regression models. Jang et al. (2009) also applied the skew elliptical distribution to Bayesian meta analysis. Jung et al. (2018) recently studied the Bayesian change-point problem with hierarchical prior distribution using skew distribution.

Our goal is to propose the Bayesian inference for an ordered multiple regression model with skewnormal error terms. We then show that Bayesian methodology is particularly suited to the constrained problem compared to frequentist methodology. Here, we use the Markov chain Monte Carlo (MCMC) method to resolve complicated integration problems related to Bayesian inference. We also perform simulation studies and apply our proposed model to New Zealand apple data that was already used by Chen and Deely (1996). We verify the convergence of MCMC for all parameters based on the Gelman-Rubin shrinkage factor. For model comparison between normal error model and skew normal error model, we use the Bayes factor (BF) and deviance information criterion (DIC) given by Spiegelhalter et al. (2002). We also consider the problem of detecting an influential point concerning skewness using Bayes factors.

The paper is organized as follows. In Chapter 2, we review the skew normal distribution and an ordered multiple linear regression. In Chapter 3, we discuss a Bayesian ordered multiple regression model with skew normal errors and its Bayesian computations using the MCMC method. The propriety of the associated posterior density is also shown under the given improper priors. We present the model comparison using BF and DIC. The measure Kd of the effect of observations on BF is employed for detecting influential data. In Chapter 4, we conduct simulation studies to check the performance of the skew-normal error model and apply our model to NZAPB’s apple data. We then compare our results with Chen and Deely (1996)’s results. Concluding remarks are given in Chapter 5.

2. Ordered multiple linear regression with skew normal distribution

### 2.1. Skew normal distribution

According to Azzalini (1985), a random variable Z has a standard skew-normal distribution with probability density function given by $f Z ( z | λ ) = 2 φ ( z ) Φ ( δ z ) ,$where z ∈ (−∞,∞), ŽĢ, and Φ denote the probability density function and the distribution function of a standard normal distribution, respectively and δ is a real-valued parameter controlling the skewness of the distribution. In particular, one revisits the normal case if δ = 0. If a location parameter μ and a scale parameter σ exist, Sahu et al. (2003) proposed the skew normal distribution SN(μ,σ2, δ) as follows: $f ( ŽĄ | μ , σ 2 , δ ) = 2 σ 2 + δ 2 φ [ ŽĄ - μ σ 2 + δ 2 ] Φ [ δ σ ŽĄ - μ σ 2 + δ 2 ] ,$where ε ∈ (−∞,∞). Similarly, if δ = 0, the skew normal distribution in (2.2) is also the same as the N(μ,σ2).

### 2.2. Ordered multiple linear regression

First, we introduce the general expression of linear regression model as: $Y = E ( Y | X ) + ŽĄ$and $E ( Y | X ) = X β ,$where Y = (y1, . . . , yn)′ is response variable, X = (x1, . . . , xn)′ is a n × p design matrix, xi = (xi1, . . . , xip)′ for i = 1, . . . , n, β = (β1, . . . , βp), p is the number of parameters and ε = (ε1, . . . , εn)′ denotes the error term which is usually distributed to the multivariate normal distribution. Denote the constrained set of parameter β by $S = { ( β 1 , β 2 , … , β p ) ′ : β ∈ Q ∈ R p } ,$where Q is a subspace of S under some type of constraints and Rp is p-dimensional Euclidean space. Denote the constrained set of parameter β = (β1, . . . , βp)′ by $β ∈ Q = { β ∈ R p ; 0 ≤ β 1 ≤ β 2 ≤ Ōŗ» ≤ β p }$which is considered one of the constrained forms in the Bayesian main structure in Section 3 according to the real data in Section 4.

3. Bayesian Inference

Our model to be considered as: $Y = X β + ŽĄ , ŌĆŖŌĆŖ ŌĆŖŌĆŖ ŌĆŖŌĆŖ 0 ≤ β 1 ≤ β 2 ≤ Ōŗ» ≤ β p ,$where Y = (y1, . . . , yn)′ is response variable, X = (x1, . . . , xn)′ is a n × p design matrix, xi = (xi1, . . . , xip), i = 1, . . . , n, β = (β1, . . . , βp), p is the dimension of parameters and ε = (ε1, . . . , εn)′ is distributed to SN(μ,σ2, δ) in (2.2).

### 3.1. Bayesian model

According to Sahu et al.’s expression, it follows from (2.3) and (3.1) that the likelihood function of β, σ2, δ based on the data y, X is $L ( β , σ 2 , δ | y , X ) = ∏ i = 1 n [ 2 σ 2 + δ 2 φ ( y i - ∑ j = 1 p x i j β j σ 2 + δ 2 ) Φ ( δ σ y i - ∑ j = 1 p x i j β j σ 2 + δ 2 ) ] = ∏ i = 1 n [ 2 σ 2 + δ 2 1 2 π exp ( - 1 2 ( σ 2 + δ 2 ) ( y i - ∑ j = 1 p x i j β j ) 2 ) ] ŌĆŖŌĆŖ ŌĆŖŌĆŖ ŌĆŖŌĆŖ × ∏ i = 1 n [ ∫ - ∞ ∞ I ( w < δ σ y i - ∑ j = 1 p x i j β j σ 2 + δ 2 ) 1 2 π e - w 2 2 d w ] .$

We consider a noninformative priors on β based on Chen and Deely (1996). Then, the prior on β is taken as $π 1 ( β ) ∝ 1 Q ( β ) ,$where 1Q(β) = 1 if βQ, 0 otherwise. Finally, the priors of σ2 and δ are given by $π 2 ( σ 2 ) ∝ 1 ( σ 2 ) k + 1 e - k σ 2$and $π 3 ( δ ) = N ( μ δ , σ δ 2 ) ,$respectively. Here, as k goes to zero, π2(σ2) approached to 12 which is considered as the improper prior of σ2. Since the prior distributions of each parameter are assumed to be independent, the joint prior density of (β, σ2, δ) is given by $π ( β , σ 2 , δ ) ∝ π 1 ( β ) π 2 ( σ 2 ) π 3 ( δ ) .$

### Theorem 1

Suppose that the priors β, σ2, and δ are given in (3.3)(3.6). Then, the posterior density of β, σ2, and δ based on the likelihood in (3.2) is proper if n > p.

We provide a proof of Theorem 1 in the ref-type="app" rid="app1">Appendix.

### 3.2. Bayesian computations

Here, it is hard to handle the integration part to calculate full conditional density of each parameter for Bayesian computation. Using the data augmentation suggested by Tanner and Wong (1987), we consider z = (z1, z2, . . . , zn) as unobserved data in (3.2). Then the complete likelihood function of β, σ2, δ, and z based on the data y and X is given by $L c ( β , σ 2 , δ , z | y , X ) = ∏ i = 1 n [ I ( z i < δ σ y i - ∑ j = 1 p x i j β j σ 2 + δ 2 ) ] ( 1 σ 2 + δ 2 ) n × exp ( - 1 2 1 σ 2 + δ 2 ∑ i = 1 n ( y i - ∑ j = 1 p x i j β j ) 2 + ∑ i = 1 n z i 2 ) ,$where z ∈ (−∞,∞). From (3.6) and (3.7), the complete posterior density function of (β,σ2, δ, z) is given by $π ( β , δ , σ 2 , z | y , X ) ∝ ŌĆŖŌĆŖ L c ( β , σ 2 , z | y , X ) π ( β , δ , σ 2 )$subject to 0 ≤ β1β2 ≤ · · · ≤ βp < βp+1 = ∞.

To sample from the complete posterior distribution π(β, δ, σ2, z|y, X) given in (3.7), we can apply the Gibbs sampler by taking the full conditional densities as: For i = 1, . . . , n, let z(−i) = (z1, . . . , zi−1, zi+1, . . . , zn). Then, the full conditional density of zi is given by $p ( z i | β , σ 2 , δ , z ( - i ) , y , X ) ∝ I ( z i < δ σ y i - ∑ x i j β j σ 2 + δ 2 ) exp ( - 1 2 z i 2 )$which is the truncated normal distribution TN(0, 1) on $- ∞ ≤ z i ≤ ( δ / σ ) { ( y i - ∑ x i j β j ) / σ 2 + δ 2 }$.

Define β(−j) = (β1, . . . , βj−1, βj+1, . . . , βp). Then, for j = 1, . . . , p, the full conditional density of βj is given by $p ( β j | β ( - j ) , σ 2 , δ , z , y , X ) = T N ( μ β j , σ β j 2 )$subject to βj−1βjβj+1 with β0 = 0, where $μ β j = ( ∑ i = 1 n x i j 2 ) - 1 ∑ i = 1 n ( y i - ∑ l ≠ j x i l β l ) x i j$ and $σ β j 2 = ( σ 2 + δ 2 ) / ∑ i = 1 n x i j 2$.

Next, the full conditional density of σ2 is given by $p ( σ 2 | β , δ , z , y , X ) ∝ ( σ 2 + δ 2 ) - n 2 ( σ 2 ) - ( k + 1 ) exp ( - k σ 2 ) exp ( - 1 2 ( σ 2 + δ 2 ) ∑ i = 1 n ( y i - ∑ j = 1 p x i j β j ) 2 ) .$

Finally, the full conditional density of δ is given by $p ( σ | β , σ 2 , z , y , X ) ∝ ( σ 2 + δ 2 ) - n 2 exp ( - 1 2 ( σ 2 + δ 2 ) ∑ i = 1 n ( y i - ∑ j = 1 p x i j β j ) 2 - 1 2 σ δ 2 ( δ - μ δ ) 2 ) .$

It can be seen that the full conditional distributions in (3.8) and (3.9) are the truncated normal distibution. Therefore, zi’s and βi’s can be easily generated by Robert’s (1995) method. However, both full conditional distributions of σ2 and δ in (3.10) and (3.11) cannot be reduced analytically to well-known distributions. Therefore, it is impossible to sample directly by the standard methods. As suggested by Chib and Greenberg (1995), a hybrid MCMC algorithm is used by combined a Metropolis-Hastings sampling and Gibbs sampler using the suitable proposed distributions. The proposal distribution of σ2 is inverse gamma distribution with shape parameter n/2 and scale parameter $∑ i = 1 n ( y i - x i ′ β ) 2$ which is the same as full conditional density of σ2 under normal error model, where xi = (xi1, . . . , xip). Finally because the support of δ is the full set of real number, we set the proposal distribution of δ as normal distribution with mean δ(k–1) and variance $σ δ 2$. The hybrid MCMC algorithm is working as:

• Step 1: Start with initial point (z(0), β(0), $σ 2 ( 0 )$, δ(0)).

• Step 2: Set k = 1.

• Step 3: for i = 1, . . . , n, generate $z i ( k )$ from TN(0, 1) on $- ∞ ≤ z i ≤ ( δ / σ ) { ( y i - ∑ x i j β j ) / σ 2 + δ 2 }$ in (3.8).

• Step 4: for j = 1, . . . , p, generate $β j ( k )$ from $T N ( μ β j , σ β j 2 )$ subject to βj−1βjβj+1 in (3.9) where $μ β j = ( ∑ i = 1 n x i j 2 ) - 1 ∑ i = 1 n ( y i - ∑ l ≠ j x i l β l ) x i j$ and $σ β j 2 = ( σ 2 + δ 2 ) / ∑ i = 1 n x i j 2 .$.

• Step 5: Using Metropolis-Hastings algorithm, generate σ2(k) from $I G ( n / 2 , ∑ i = 1 n ( y i - x i ′ β ) 2 )$ as proposal distribution in (3.10).

• Step 6: Using Metropolis-Hastings algorithm, generate δ(k) from normal distribution $N ( δ ( k - 1 ) , σ δ 2 )$ as proposal distribution in (3.11).

• Step 7: k = k + 1.

• Step 8: Repeat Steps 3–7, T times.

Here, IG(a, b) denotes the inverse gamma distribution with shape parameter a and scale parameter b. Then, for example, the posterior mean of βj in (3.9) is approximated by $β ^ j = 1 K ( T - S ) ∑ k = 1 K ∑ i = S + 1 T β j k ( i ) ,$where S is the burn-in period and K is the number of parallel chains. Here, we decide the convergence to have been reached after S iterations of and MCMC algorithm have been performed on K multiple chains depending on Gelman-Rubin (1992)’s shrinkage factor. Then for each chain k, the observations $β j k ( 1 ) , β j k ( 2 ) , … , β j k ( S )$ are discarded and $β j k ( i )$, S + 1 ≤ iT are considered as an independent sample from the posterior distribution of the Markov chain, which is typically the posterior distribution.

### 3.3. Model comparisons

Generally, for testing M0: skew normal error model versus, M1: normal error model, Bayes factor (BF01) in favor of the model M0 (or against the other model M1) is used as: $BF 01 = f ( X | M 0 ) f ( X | M 1 ) ,$where f (X|Mi) = ∫ f (X|β, σ2, δ) f (β, σ2, δ|Mi)dβ2 is the marginal likelihood of the model Mi. Then, if BF01 > 1, then accept the model M0 based on the given data. Newton and Raftery (1994) suggested that the marginal likelihood may be estimated by the harmonic mean of the likelihoods of a sample from the posterior distribution using MCMC. For notational conveniences, let θ = (β, σ2, δ). Then, the Bayes factor can be estimated as: $BF ^ 01 = f ^ ( X | M 0 ) f ^ ( X | M 1 ) ,$where $f ^ ( X | M j ) = [ ( 1 / m ) ∑ i = 1 m f ( X | θ j ( i ) ) ( - 1 ) ] - 1$ and $θ j ( i )$ is the ith parameter samples from MCMC in model Mj, j = 0, 1.

It is natural to use criterion based on trade-off between the fit of the data to the model and the corresponding complexity of the model. Therefore, we compare the models based on DIC given by Spiegelhalter et al. (2002) for Bayesian model comparison: $DIC = D ¯ ( θ ) + p D ,$where D╠ä(θ) = E[D(θ)|y, X], pD = D╠ä(θ) – D(θ╠ä;), D(θ) = −2 log L(θ|y, X), and θ╠ä is a posterior mean. Here, pD is measure of complexity and the effective number of parameters. So, DIC is ‘goodness-of-fit’ plus ‘the effective number of parameters’. Therefore model with smaller DIC is better than the other model with larger DIC based on the data.

### 3.4. Influential observation

To see the effect of single observation d on the Bayes factor, Pettit and Young (1990) suggested Kd as: $K d = log BF 01 - log BF 01 ( d ) ,$where BF01 is defined in (3.12), $BF 01 ( d ) = f ( X ( d ) | M 0 ) / f ( X ( d ) | M 1 )$ is the Bayes factor without a single observation d and X(d) is a full data except a dth single observation. The larger values of |Kd| indicates that observation d has a large influence on the Bayes factor. This Kd is expressed as the difference in the logarithms of the conditional predictive ordinates (CPO) for the two models. That is, $K d = log f ( X | M 0 ) f ( X | M 1 ) - log f ( X ( d ) | M 0 ) f ( X ( d ) | M 1 ) = log f ( X | M 0 ) f ( X ( d ) | M 0 ) - log f ( X | M 1 ) f ( X ( d ) | M 1 ) .$

Pettit and Young (1990) suggested that an observation with |Kd| > 1/2 might be influential based on Jeffrey’s scale of evidence for assessing Bayes factors. If Kd in (3.15) is positive, it means that $log f ( X | M 0 ) f ( X ( d ) | M 0 ) > log f ( X | M 1 ) f ( X ( d ) | M 1 ) .$

Therefore, the difference between including and not including dth observation in the M0 is larger than that of M1. So, the observation with Kd > 1/2 supports M0 and the observation with Kd < −1/2 supports M1.

4. Real data analysis

The New Zealand Apple and Pear Marketing Board (NZAPB) is a statutory authority that trades and manages every contract in exporting New Zealand apples. Therefore, the more than 1,500 apple growers in New Zealand engage in the global community as one grower making international trade contracts. It is significant to realize that the data recorded consist of total harvest for each grower and for each apple variety. However, the NZAPB has also recorded the number of trees at each age, for each grower and for each variety. For the purpose of NZAPB, ages of trees vary from 2 to 11, where a tree of age “11” means 11 or older and is considered to be a full-grown tree. The year 1 is not included in the analysis because its production is close to zero. This is a set of 207 records, each record consisting of the total amount of fruit produced and the number of trees of each age. See Chen and Deely (1996) for more details of NZAPB’s apple data. It is reasonable that a linear regression model be used, where the quantity of fruit produced is regressed on the tree numbers at each age, beginning at physical age 2, for 10 years up to a full-grown tree. For notational convenience, we consider ages varying from 1 to 10, matching the association that age 1 is the physical age 2 and so on. Thus, we let, for i = 1, . . . , n = 207, $Y i = ∑ j = 1 10 β j x i j + ŽĄ i ,$where ŽĄi ~ SN(0, σ2, δ) in (2.2), xi j is number of trees at age j for grower i, βj equals average number of cartons produced by trees at age j for j = 1, . . . , 9, and β10 is average number of cartons produced by all full-grown trees. Furthermore, we note that βi represents the average over all trees of age i. Chen and Deely (1996) mentioned that since growers do not usually allow poor trees to persist, they proposed the constrained multiple linear regression with normal error as: $0 ≤ β 1 ≤ β 2 ≤ Ōŗ» ≤ β 10 ,$which are called monotone ordered constraints.

We investigated the robustness of the ordered multiple linear regression with skew normal model according to the different values of μδ and $σ δ 2$ in (3.5) based on the values of DICs in (3.14). Table 1 shows that our proposed model is robust based on the DIC values.

To compare the skew normal model to the original normal model informally, compute BF01 in (3.13), the effective number of parameters pD and the DIC in (3.14). Since $BF ^ 01 = 10.73$, we accept the skew normal error model. Table 2 shows that the skewed model improved the original normal model based on the values of DICs. In particular, for the normal model, the effective number of parameters pD is almost likely to the real number of parameters in the regression model.

Table 3 shows the Bayesian estimates of parameters of regressions, variation and skewness from rival models. All ten regression parameters βi are significant in all models since the corresponding highest posterior density (HPD) intervals do not include the value zero. The Bayesian estimate σ2 for the skew model is smaller than the normal model. This is expected since the variability and skewness are interchangeable to a certain extent. This means that the normal error model fails to control the skewness having larger variability. Table 2 shows that the result of Table 3 comes from the significance of the skew parameter. The skewness parameter δ is estimated to be positive in the skew normal model: it denotes that the given data fit to the right skewed normal mode mentioned previously. Moreover, δ is significant under the skew-normal model since the 95% HPD interval is (1.8743, 2.354). Thus, we can conclude that significant skewness is required to model the data.

In Figure 1, the plot Kd for normal error model and skew normal error model are displayed. There are 21 observations with |Kd| > 1/2. 14 observations support normal error model and 7 observations do skew normal error model. So, there are lots of the observations supporting normal error model.

5. Conclusions

In this paper, we presented a Bayesian ordered multiple linear regression with skew normal errors. For the case for hard to compute full conditional density, we use data augmentation method and expand the observed likelihood. Thus, we easily get estimates of parameters by MCMC. We show that the associated posterior density based on our Bayesian skew normal model is proper under the improper priors. That is, the marginal posterior density is finite.

We can detect which observation support normal error or skew normal error using the Kd measure. In the next study, we may consider the extension of the Kd measure in excluding two or more influence observations to check which group of observations are supported in some model.

Finally, we apply our model to data that has various types of $σ δ 2$ constraint and an other flexible error-terms model.

Appendix: Proof of Theorem 1

We first consider the integral of the likelihood in (3.2) times the prior in (3.5). Let $A = ∫ Ōŗ» ∫ L ( β , σ 2 , δ | y , X ) π 1 ( β ) π 2 ( σ 2 ) π 3 ( δ ) d β d σ 2 d δ .$

It is sufficient to show that A is finite. In the following derivation, the value of the constant C may not be the same from line to line : $A = C ∫ Ōŗ» ∫ [ ∏ i = 1 n 2 σ 2 + δ 2 φ ( y i - ∑ j = 1 p x i j β j σ 2 + δ 2 ) Φ ( δ σ y i - ∑ j = 1 p x i j β j σ 2 + δ 2 ) ] × π 1 ( β ) π 2 ( σ 2 ) π 3 ( δ ) d β d σ 2 d δ ≤ C ∫ ∫ ( σ 2 + δ 2 ) - n 2 [ ∏ i = 1 p ∫ β i - 1 β i + 1 exp ( - 1 2 ( σ 2 + δ 2 ) ( y i - ∑ j = 1 p x i j β j ) 2 ) d β i ] × π 2 ( σ 2 ) π 3 ( δ ) d σ 2 d δ ≤ C ∫ ∫ ( σ 2 + δ 2 ) - n - p 2 π 2 ( σ 2 ) π 3 ( δ ) d σ 2 d δ ≤ C ∫ ∫ ( σ 2 ) - n - p 2 π 2 ( σ 2 ) π 3 ( δ ) d σ 2 d δ ≤ C ∫ [ ∫ ( σ 2 ) - n - p 2 + k + 1 exp ( - k σ 2 ) d σ 2 ] π 3 ( δ ) d δ$

Then, the inner integral in the bracket of the equation (A.2) with respect σ2 is finite if n > p. Therefore A is obviously finite since the prior π3(δ) of δ is assumed to be normal density.

Figures Fig. 1.

Kd plot.

TABLES

### Table 1

DICs for values of μδ and $σ δ 2$

μδ $σ δ 2$
1 10 100
−1.0 2832.941 2832.997 2833.964
−0.5 2832.924 2832.906 2833.867
0.0 2832.792 2832.865 2833.693
0.5 2832.713 2832.799 2833.547
1.0 2832.670 2832.683 2833.444

DIC = deviance information criterion.

### Table 2

Model comparison for apple data

Model D╠ä(θ) pD DIC
Normal error 2831.803 11.1669 2822.9699
Skew normal error 2731.566 51.1039 2782.6699

DIC = deviance information criterion.

### Table 3

Normal error and skew normal error estimation

Parameter Normal error Skew normal error

Estimate Standard error Estimate Standard error
β1 0.01374 0.00806 0.01723 0.00019
β2 0.02496 0.00820 0.01723 0.00019
β3 0.17749 0.01113 0.18122 0.00016
β4 0.31129 0.04111 0.29833 0.00158
β5 0.55423 0.07618 0.55195 0.00601
β6 0.78064 0.03463 0.81104 0.00126
β7 0.81682 0.04071 0.81131 0.00129
β8 0.93711 0.09977 0.86252 0.04474
β9 1.04100 0.11945 0.89909 0.05384
β10 1.22821 0.17487 1.29236 0.28248

σ2 53469.3 5406.4 51267.8 5002.1

δ · · 2.03320 0.08099

References
1. Azzalini A (1985). A class of distributions which includes the normal ones, Scandinavian Journal of Statistics, 12, 171-178.
2. Azzalini A and Capitanio A (1999). Statistical applications of the multivariate skew-normal distributions, Journal of the Royal Statistical Society Series B (Statistical Methodology), 61, 579-602.
3. Azzalini A and Dalla-Valle A (1996). The multivariate skew-normal distribution, Biometrika, 83, 715-726.
4. Chen MH and Deely JJ (1996). Bayesian analysis for a constrained linear multiple regression problem for predicting the new crop of apples, Journal of Agricultural, Biological, and Environmental Statistics, 1, 467-489.
5. Chib B and Greenberg E (1995). Understanding the metropolis-hastings algorithm, The American Statistician, 49, 327-335.
6. Gelman A and Rubin DB (1992). Inference from iterative simulation using multiple sequences, Statistical Science, 7, 457-472.
7. Jang J, Chung Y, Kim C, and Song S (2009). Bayesian meta-analysis using skewed elliptical distributions, Journal of Statistical Computation and Simulation, 79, 691-704.
8. Jung M, Song S, and Chung Y (2018). Bayesian change-point problem using Bayes factor with hierarchical prior distribution, Communications in Statistics - Theory and Methods, 46, 1352-1366.
9. Newton MA and Raftery AE (1994). Approximate Bayesian inference with the weighted likelihood bootstrap (with discussion), Journal of the Royal Statistical Society. Series B (Methodology), 56, 3-48.
10. O’Hagan A and Leonard T (1976). Bayes estimation subject to uncertainty about parameter constraints, Biometrika, 63, 201-203.
11. Pettit LI and Young KDS (1990). Measuring the effect of observations on Bayes factors, Biometrika, 77, 455-466.
12. Robert CP (1995). Simulation of truncated normal variables, Statistics and Computing, 5, 121-125.
13. Sahu SK, Dey DK, and Branco MD (2003). A new class of multivariate skew distributions with applications to Bayesian regression models, The Canadian Journal of Statistics, 31, 129-150.
14. Spiegelhalter DJ, Best NG, Carlin BP, and Van der Linde A (2002). Bayesian measures of model complexity and fit, Journal of the Royal Statistical Society Series B (Statistical Methodology), 64, 583-639.
15. Tanner MA and Wong WH (1987). The calculation of posterior distributions by data augmentation, Journal of the American Statistical Association, 82, 528-540.