TEXT SIZE

CrossRef (0)
A Kullback-Leibler divergence based comparison of approximate Bayesian estimations of ARMA models

Ayman A Amin1,a

aDepartment of Statistics, Mathematics, and Insurance, Faculty of Commerce, Menoufia University, Egypt
Correspondence to: 1 Department of Statistics, Mathematics, and Insurance, Faculty of Commerce, Menoufia University, Shibin El Kom, Al Minufya, Al Minufya, Egypt. E-mail: ayman.aamin@gmail.com
Received February 4, 2022; Revised April 1, 2022; Accepted April 6, 2022.
Abstract
Autoregressive moving average (ARMA) models involve nonlinearity in the model coecients because of unobserved lagged errors, which complicates the likelihood function and makes the posterior density analytically intractable. In order to overcome this problem of posterior analysis, some approximation methods have been proposed in literature. In this paper we first review the main analytic approximations proposed to approximate the posterior density of ARMA models to be analytically tractable, which include Newbold, Zellner-Reynolds, and Broemeling-Shaarawy approximations. We then use the Kullback-Leibler divergence to study the relation between these three analytic approximations and to measure the distance between their derived approximate posteriors for ARMA models. In addition, we evaluate the impact of the approximate posteriors distance in Bayesian estimates of mean and precision of the model coecients by generating a large number of Monte Carlo simulations from the approximate posteriors. Simulation study results show that the approximate posteriors of Newbold and Zellner-Reynolds are very close to each other, and their estimates have higher precision compared to those of Broemeling-Shaarawy approximation. Same results are obtained from the application to real-world time series datasets.
Keywords : approximate posteriors distance, Kullback-Leibler calibration, multivariate t distribution, Jeffreys’ prior, natural conjugate prior
1. Introduction

Autoregressive moving average (ARMA) modeling of time series has been successfully applied in different fields such as economics, finance, and engineering. Bayesian analysis of ARMA models is difficult due to the nonlinearity in the coefficients of moving average (MA) part, which complicates the likelihood function and makes the posterior density analytically intractable (Amin, 2019a). Accordingly, the exact posterior analysis of ARMA models requires the use of numerical integration, which is computationally expensive. In order to overcome this problem, three well-known analytic approximations were proposed by Newbold (1973), Zellner and Reynolds (1978), and Broemeling and Shaarawy (1984, 1988) to approximate the posterior density to be an analytically tractable closed-form distribution.

Newbold (1973) proposed the first analytic approximation by expanding unobserved errors as a linear function in the model coefficients using the first order Taylor’s expansion. In addition, Zellner and Reynolds (1978) proposed the second analytic approximation, denoted by Zellner-Reynolds, by expanding the errors sum of squares, rather than the errors, of ARMA model as a quadratic function in the model cofficients using the second order Taylor’s expansion. Moreover, Broemeling and Shaarawy (1984, 1988) proposed the third analytic approximation, denoted by Broemeling-Shaarawy, by simply replacing unobserved lagged errors with their estimates to linearize the errors as a function in the model coefficients.

Because of the simplicity of Broemeling-Shaarawy analytic approximation compared to other proposed approximations, it has been widely extended to the Bayesian analysis of many time series models such as seasonal ARMA models (Amin, 2009; Ismail and Amin, 2014), multivariate MA models (Shaarawy and Ali, 2012), double seasonal MA models (Amin, 2017b), and double seasonal ARMA models (Amin, 2017a, 2018). However, few work have been introduced to evaluate and compare the accuracy of these three analytic approximations. Ismail (1994) used a small scale of simulation study to investigate the accuracy of the three approximations in the case of MA model of order one, and Ali (1998) and Soliman (1999) extended his work to the MA model of order two and to ARMA and seasonal ARMA of order one, respectively. Therefore, there is a need in the Bayesian time series analysis to study how these analytic approximations are mathematically related and to comprehensively evaluate their accuracy, aiming to help researchers to make a trade-off between the simplicity and accuracy of these approximations. As a motivation to this work, the outcome of these analytic approximations for complicated time series models can be used as inputs to advanced methods such as Markov chain Monte Carlo (MCMC) methods (Amin and Ismail, 2015; Amin, 2020, 2022). For example, in the case of seasonal ARMA models where the posterior density is nonlinear function in the model coefficients and thus analytically intractable. So, these analytic approximations can be used to approximate the posterior density, which in turn is used to derive the full conditional posteriors as a main requirement to apply MCMC methods to introduce the Bayesian estimation and prediction for the underlying seasonal ARMA models.

In order to fill this gap, we first review these three analytic approximations and then we use the Kullback-Leibler (KL) divergence (Kullback and Leibler, 1951) to study mathematically the relation between them and to measure the distance between their derived approximate posteriors for ARMA models. In addition, we evaluate the impact of the approximate posteriors distance in the Bayesian estimates of the model coefficients and their precision by generating a large number of Monte Carlo simulations from the approximate posteriors. Therefore, we can summarize our work in this paper as follows. First, we derive the KL divergence between Newbold, Zellner-Reynolds, and Broemeling-Shaarawy approximate posteriors and compute its calibration to study mathematically the relation between them and to measure their distances. Second, we use a large number of Monte Carlo simulations from the approximate posteriors to evaluate the impact of the posteriors distance in the Bayesian estimates of mean and precision of the model coefficients. Finally, we use real-world time series datasets to illustrate the use of the KL divergence to measure the distance between the approximate posteriors and show the impact of this distance in the Bayesian estimates.

The remainder of this paper is organized as follows. In Section 2 we present the background of the ARMA models and related Bayesian concepts, and in Section 3 we summarize the analytic approximations. In Section 4 we derive the KL divergence between the approximate posteriors and its calibration. In Section 5 we present the simulation study and real-world time series datasets to measure the distance between the approximate posteriors and evaluate the impact of this distance in the Bayesian estimates. Finally, we give the conclusions in Section 6.

2. ARMA models and related Bayesian concepts

Time series {yt} can be modeled by an autoregressive moving average (ARMA) model of order (p, q), simply denoted by ARMA(p, q), and written as (Box et al., 2015):

φp(B)yt=θq(B)ɛt,

where φp(B) = (1 – φ1Bφ2B2 – · · · – φpBp) and θq(B) = (1 + θ1B + θ2B2 + · · · + θqBq) are the autoregressive and moving average polynomials with orders p and q respectively, and B is a backshift operator defined as Bkzt = ztk. {ɛt} is a sequence of independent normal errors with zero mean and variance σ2 < ∞. The ARMA model (2.1) is stationary if all the roots of φp(B) = 0 lie outside the unit circle, and if all the roots of θq(B) = 0 lie outside the unit circle this model is invertibile. For more details about the stationarity and invertibility properties of time series models see (Box et al., 2015). The model (2.1) can be simplified as:

yt=i=1pφiyt-i+i=1qθiɛt-i+ɛt=Xtβ+ɛt,

where Xt = (yt–1, . . ., ytp, ɛt–1, . . ., ɛtq) is the tth row of the design matrix X of order n × m, i.e. m = p + q, and β = (φ1, . . ., φp, θ1, . . ., θq)T is the model coefficients.

For ARMA model with normally distributed errors, the likelihood function is given by

L(β,σ2y)(σ2)-n2exp {-12σ2(y-Xβ)T(y-Xβ)}(σ2)-n2exp {-Qn2σ2},

where Qn = ɛTɛ is the errors sum of squares, and the natural conjugate prior is normal-gamma distribution. So, suppose β ~ Nm(μβ, σ2β) and σ2 ~ IG(ν/2, λ/2), the joint natural conjugate prior distribution of β and σ2 can be written as:

ζn(β,σ2)(σ2)-(ν+m2+1)exp {-12σ2[λ+(β-μβ)TΣβ-1(β-μβ)]},

where μβ, ∑β, ν and λ are hyperparameters need to be estimated.

In case of little or no information is available about the model parameters, Jeffreys’ prior can be employed, and it is given as

ζj(β,σ2)(σ2)-1,         σ2>0.

Multiplying the likelihood function (2.3) by each one of these two joint prior distributions results in the following joint posteriors. For the natural conjugate prior, the joint posterior of β and σ2 is:

ζn(β,σ2y)(σ2)-(n+ν+m2+1)exp {-12σ2[λ+(β-μβ)TΣβ-1(β-μβ)+Qn]}.

For Jeffreys’ prior, the joint posterior of β and σ2 is:

ζj(β,σ2y)(σ2)-(n2+1)exp {-Qn2σ2}.

It is worth observing that the unknown lagged errors are part of the design matrices X which complicates the likelihood function and makes the joint posteriors of β and σ2 analytically intractable. As a result, analytic approximations were introduced to approximate these joint posteriors as we review in the following section.

3. Analytic approximations

### 3.1. Newbold approximation

Newbold (1973) expanded the errors as a linear function in the model coefficients β around their non-linear least squares estimates (NLSE) by using the first order Taylor’s expansion. So, the approximate errors for Newbold (ɛtN) can be calculated using the following formula:

ɛtN=ɛ^t+Ut(β-β^),

where ε̂t are the errors estimated recursively using NLSE β̂ of the model coefficients as:

ɛ^t=yt-X^tβ^

and t is the vector of the explanatory variables (yt–1, yt–2, . . ., ytp, ε̂t–1, ε̂t–2, . . ., ε̂tq) after replacing the error terms by their NLSE’s. Ut is a derivative vector that can be defined as:

Ut=(U1t,U2t,,Umt)=(ɛtβ1ɛtβ2,,ɛtβm)|β=β^.

Accordingly, the Newbold approximate errors sum of squares QnN can be obtained as:

QnN=Q^n+(β-β^)TUTU(β-β^),

where n = ε̂Tε̂ and ε̂t is computed as in Equation (3.2), and U is a matrix with the tth row is Ut defined in Equation (3.3). Therefore, the Newbold approximate likelihood function is given by

Ln(β,σ2y)(σ2)-n2exp {-12σ2[Q^n+(β-β^)TUTU(β-β^)]}

and the Newbold approximate joint posterior of β and σ2, in case of natural conjugate prior, can be obtained as:

ζn(β,σ2y)(σ2)-(n+ν+m2+1)exp {-12σ2[λ+(β-μβ)TΣβ-1(β-μβ)+Q^n+(β-β^)TUTU(β-β^)]}.

Using some mathematical manipulations, we can prove that the marginal approximate posterior of the model coefficients β is a multivariate t distribution with degrees of freedom vn = (n + ν) and location vector and dispersion matrix are respectively:

μnN=An-1Bn         and         VnN=1vn-1An-1[β^TUTUβ^+Q^n+λ+μβTΣβ-1μβ-BnTAn-1Bn],

where An-1=(UTU+Σβ-1)-1 and Bn=(UTUβ^+Σβ-1μβ). By setting λ = 0, Σβ-1=0, and ν = −m, we can obtain the Newbold approximate marginal posterior of β in the case of Jeffreys’ prior as a multivariate t distribution with degrees of freedom vj = (nm) and location vector and dispersion matrix are respectively:

μjN=β^         and         VjN=Q^nvj-2(UTU)-1.

### 3.2. Zellner-Reynolds approximation

Zellner and Reynolds (1978) expanded the errors sum of squares Qn, instead of the errors, as a quadratic function in the model coefficients using the second order of Taylor’s expansion. The Zellner- Reynolds approximation of the errors sum of squares can be calculated as:

QnZR=Q^n+12(β-β^)TR(β-β^),

where R is an m × m matrix of the second derivatives that can be defined as:

R=[2Qnβ12Qn2β1β2Qn2β1βmQn2β2β12Qnβ22Qn2β2βmQn2βmβ12Qnβmβ2Qn2βm2].

Accordingly, the Zellner-Reynolds approximate likelihood function is given by

LZK(β,σ2y)(σ2)-n2exp {-12σ2[Q^n+12(β-β^)TR(β-β^)]}

and by employing the natural conjugate prior we can obtain the approximate joint posterior of β and σ2 as:

ζnN(β,σ2y)(σ2)-(n+ν+m2+1)exp {-12σ2[λ+(β-μβ)TΣβ-1(β-μβ)+Q^n+12(β-β^)TR(β-β^)]}.

With some manipulations we can simply derive the marginal approximate posterior of the model coefficients β as a multivariate t distribution with degrees of freedom vn = (n + ν) and location vector and dispersion matrix are respectively:

μnZR=Az-1Bz         and         VnZR=1vn-2Az-1[12β^TRβ^+Q^n+λ+μβTΣβ-1μβ-BzTAz-1Bz],

where Az-1=((1/2)R+Σβ-1)-1 and Bz=((1/2)Rβ^+Σβ-1μβ). By setting λ = 0, Σβ-1=0, and ν = −m, we can obtain the approximate marginal posterior of β in the case of Jeffreys’ prior as a multivariate t distribution with degrees of freedom vj = (nm) and location vector and dispersion matrix are respectively:

μjZR=β^         and         VjZR=Q^nvj-2(12R)-1.

### 3.3. Broemeling-Shaarawy approximation

Broemeling and Shaarawy (1984, 1988) approximated the unobserved errors by simply replacing the unobserved lagged errors with their NLSE’s to be a linear function in the model coefficients as follows:

ɛtBS=yt-X^tβ=ɛ^t-X^t(β-β^),

where t is tth row of the design matrix after replacing the unobserved lagged error terms by their NLSE’s. Therefore, the Broemeling-Shaarawy approximate errors sum of squares QnBS can be obtained as:

QnBS=Q^n+(β-β^)TX^TX^(β-β^)-2(β-β^)TX^Tɛ^

and the approximate likelihood function is given by

LBS(β,σ2y)(σ2)-n2exp {-12σ2[Q^n+(β-β^)TX^TX^(β-β^)-2(β-β^)TX^Tɛ^]}

and the Broemeling-Shaarawy approximate joint posterior of β and σ2, in case of natural conjugate prior, can be obtained as:

ζnBS(β,σ2y)(σ2)-(n+ν+m2+1)exp {-12σ2[λ+(β-μβ)TΣβ-1(β-μβ)+Q^n+(β-β)TX^TX^(β-β^)-2(β-β^)TX^Tɛ^]}.

Using some manipulations, we can derive the marginal approximate posterior of the model coefficients β to be a multivariate t distribution with degrees of freedom vn = (n + ν) and location vector and dispersion matrix are respectively:

μnBS=Ab-1Bb         and         VnBS=1vn-2Ab-1[yTy+λ+μBTΣβ-1μβ-BbTAb-1Bb],

where Ab-1=(X^TX^+Σβ-1)-1 and Bb=(X^Ty+Σβ-1μβ). By setting λ = 0, Σβ-1=0, and ν = −m, the approximate marginal posterior of β in the case of Jeffreys’ prior can be obtained as a multivariate t distribution with degrees of freedom vj = (nm) and location vector and dispersion matrix are respectively:

μjBS=(X^TX^)-1X^Ty         and         VjBS=Q^nvj-2(X^TX^)-1.
4. Kullback-Leibler divergence of approximate posteriors

The Kullback-Leibler (KL) divergence (Kullback and Leibler, 1951) can be used to measure the distance between any two of the approximate posteriors of the ARMA model coefficients introduced in Section 3. To explain the idea, suppose we have two multivariate t posteriors for β: the first is ζ1(β | y) with parameters mean μ1, dispersion matrix V1, and degrees of freedom v1, and the second is ζ2(β | y) with parameters mean μ2, dispersion matrix V2, and degrees of freedom v2. The KL divergence between these two posteriors can be written as:

KL [ζ1(βy),ζ2(βy)]=ln (ζ1(βy)ζ2(βy))ζ1(βy)dβ.

This KL divergence is not symmetric and always non-negative, and it equals to zero when the two posteriors are the same. This implies that the smaller its value, the closer are the two posteriors. We can simplify the KL divergence (4.1) as:

KL [ζ1(βy),ζ2(βy)]=[-ln (ζ2(βy))]ζ1(βy)dβ-[-ln (ζ1(βy))]ζ1(βy)dβ=CH [ζ1(βy),ζ2(βy)]-H[ζ1(βy)],

where CH[ζ1(β | y), (ζ2(β | y)] and H [ζ1(β | y)] are known as the cross-entropy and Shannon entropy respectively (Amin, 2017c). In our previous work (Amin, 2019b), we derived the approximate KL divergence between the two multivariate t posteriors ζ1(β | y) and ζ2(β | y) as:

KL [ζ1(βy),ζ2(βy)]12log (V2V1)+12(v2+mv2)[(v1v1-2)tr(V2-1V1)+(μ1-μ2)TV2-1(μ1-μ2)]-m2.

This derived approximate KL divergence is not symmetric; however, we can compute the symmetric distance KL* [ζ1(β | y), ζ2(β | y)] as the average of two KL divergences as follows:

KL*[ζ1(βy),ζ2(βy)]=12{KL [ζ1(βy),ζ2(βy)]+KL [ζ2(βy),ζ1(βy)]}14{tr(V1*-1V2+V2*-1V1)+(μ1-μ2)TV-1(μ1-μ2)-2m},

where V1*=((1-2/v2)/(1+m/v1))V1,V2*=((1-2/v1)/(1+m/v2))V2, and V-1=(1+m/v1)V1-1+(1+m/v2)V2-1.

Now, we can use this derived approximate KL divergence in Equation (4.4) to study the relation between the analytic approximations summarized in Section 3 and to measure the distance between their approximate posteriors. To do that we have to observe two things. First, the derived approximate KL divergence depends on the difference between the approximate posteriors means and the ratio of the posteriors dispersions. Second, all the marginal approximate posteriors of β derived in Section 3 are multivariate t distribution with the same degrees of freedom but with different location vector and dispersion matrix. Aiming to simplify the comparison, we compute the KL divergence between the marginal approximate posteriors of β in the case of employong Jeffreys’ prior.

Using the location vectors and dispersion matrices given in Equations (3.8) and (3.14), we can simplify the KL divergence between the Newbold and Zellner-Reynolds approximate posteriors of β as:

KL*[ζjN(βy),ζjZR(βy)]14{nn-m-2tr(UTU(12R)-1+12R(UTU)-1)-2m},

which indicates that the distance between the Newbold and Zellner-Reynolds approximate posteriors depends mainly on the relation between the two matrices UTU and (1/2)R. To investigate this relation, we can redefine the R matrix in Equation (3.10) by letting its i jth element to be defined as:

Rij=d2Qdβidβj=2{t=1n(ɛt2ɛtβiβj)+t=1n(ɛtβi×ɛtβj)}=2{Dij+(UTU)ij},

which means (1/2)R = D + UTU and implies that UTU is a part of (1/2)R. Therefore, the distance between the Newbold and Zellner-Reynolds approximate posteriors is expected to be very small, i.e. they are very close to each other, especially when the matrix D closes to 0, which is evaluated by the simulation study in the next section. Similarly, we use the location vectors and dispersion matrices given in Equations (3.8) and (3.20) to simplify the KL divergence between the Newbold and Broemeling-Shaarawy approximate posteriors of β as:

KL*[ζjN(βy),ζjBS(βy)]14{nn-m-2tr(X^TX^(UTU)-1+UTU(X^TX^)-1)+n(n-m-2)(n-m)Q^n(β^-μjBS)T[X^TX^+UTU](β^-μjBS)-2m},

which depends on the difference between the two approximate posteriors means and the ratio of the two matrices T and UTU. Since the matrix U contains more information about the unobserved errors compared to the matrix T, it is expected that the distance between the Newbold and Broemeling- Shaarawy approximate posteriors is very large, i.e. they are strongly different, which we evaluate by the simulation study in the next section. We can obtain similar conclusion about the distance between the Zellner-Reynolds and Broemeling-Shaarawy approximate posteriors, since their KL divergence can be simplified as:

KL*[ζjZR(βy),ζjBS(βy)]14{nn-m-2tr(X^TX^(12R)-1+12R(X^TX^)-1)+n(n-m-2)(n-m)Q^n(β^-μjBS)T[X^TX^+12R](β^-μjBS)-2m}.

It has to be noted that possible values of the approximate KL divergence are non-negative with no maximum limit, however, we can calibrate the values to be in the interval (0.5,1.0) to be able to decide about the distance between the approximate posteriors. When the calibration value closes to 0.5, it implies the two approximate posteriors are almost the same; and when its value closes to 1.0, the two approximate posteriors are strongly different. Suppose KL[ζ1(β | y), ζ2(β | y)] = k, using the idea of calibration proposed by McCulloch (1989) we can compute the calibration of KL divergence (KLC) between the two approximate posteriors as:

KLC [ζ1(βy),ζ2(βy)]=12{1-exp(-2KL [ζ1(βy),ζ2(βy)])}.

All the derived approximate KL divergences of the approximate posteriors and their calibration will be used in the next section to measure the distance between these posteriors and the impact of this distance in the Bayesian estimates using simulated and real-world time series datasets.

5. Simulation study and real application

We have two parts of work in this section. First, we present a simulation study to evaluate the KL divergence (and its calibration) between the considered approximate posteriors of the ARMA model coefficients and evaluate the impact of posteriors divergence in the Bayesian estimates. Second, we present two applications of our work to real world time series datasets.

### 5.1. Simulation study

In this simulation study, we have two objectives: (1) evaluating the KL divergence (and its calibration) between the considered approximate posteriors of the ARMA model coefficients, and (2) evaluating the impact of this posteriors divergence in the Bayesian estimates for several simulated time series data with different sample sizes, different ARMA model orders, and different values of the ARMA model coefficients. In the following we discuss the simulation study design and results for each objective.

5.1.1. Evaluating the Kullback-Leibler divergence and its calibration

In order to evaluate the KL divergence (and its calibration) between the considered three approximate posteriors of the ARMA model coefficients, we generate 1,000 time series of size n (from 50 to 300 with an increment value of 50) from different models: MA(1) with θ = 0.4 and 0.7, MA(2) with (θ1, θ2) = (0.4, 0.3) and (0.3, 0.6), ARMA(1, 1) with (φ, θ) = (0.5, 0.3) and (0.4, 0.6), ARMA(1, 2) with (φ, θ1, θ2) = (0.4, 0.5, 0.4) and (0.7, 0.3, 0.4), and ARMA(2, 2) with (φ1, φ2, θ1, θ2) = (0.4, –0.3, 0.5, 0.4) and (1.3, –0.6, 0.3, 0.6); and in all these models we set the model variance σ2 = 1. For each time series, we compute the location vectors and dispersion matrices given in Equations (3.8), (3.14) and (3.20) for the Newbold, Zellner-Reynolds and Broemeling-Shaarawy approximate posteriors of the ARMA model coefficients β. We then use these location vectors and dispersion matrices to compute the KL divergence and its calibration between the corresponding approximate posteriors. The average of the KL divergence and its calibration are computed and presented in

From these simulation results, we can observe some important remarks.

• First, the KL divergence between the Newbold and Zellner-Reynolds approximate posteriors is very small especially for simple models and large sample size, for example their KL calibration values are between 0.57 and 0.71 in most of the cases for n = 300. This confirms that the Newbold and Zellner-Reynolds approximate posteriors are very close to each other. On contrary, the KL divergence between the Newbold and Broemeling-Shaarawy approximate posteriors is very large, since the KL calibration values are between 0.77 and 0.97 in most of the cases for n = 300, which confirms that these approximate posteriors are strongly different. Similar conclusion can be drawn about Zellner-Reynolds and Broemeling-Shaarawy approximate posteriors.

• Second, when the sample size increases the approximate posteriors divergences decreases, since more information are provided by the observed time series enabling the analytic approximations to reduce the uncertainty about the model parameters. For example, in case of ARMA(1, 1) with (φ, θ) = (0.5, 0.3), the KLC between the Newbold and Zellner-Reynolds posteriors is about 0.74 for n = 50 and it becomes about 0.60 for n = 300; and the KLC between the Newbold and Broemeling- Shaarawy posteriors is about 0.81 for n = 50 and it is reduced to be about 0.72 for n = 500.

• Third, the more complicated ARMA model the larger KL divergence between the approximate posteriors. For example, in case of n = 100, the KLC between the Newbold and Zellner-Reynolds and between the Newbold and Broemeling-Shaarawy posteriors are about 0.61 and 0.70 respectively for MA(1) with θ = 0.4 and they are increased to be about 0.82 and 0.93 respectively for ARMA(2,2) with (φ1, φ2, θ1, θ2) = (0.4, –0.3, 0.5, 0.4).

5.1.2. Evaluating the impact of posteriors distance in Bayesian estimates

In order to evaluate the distance impact of the Newbold, Zellner-Reynolds and Broemeling-Shaarawy approximate posteriors in the Bayesian estimates of the ARMA model coefficients β, we use the Monte Carlo simulations to obtain samples of β from these approximate posteriors as follows.

• First, we generate 1,000 time series of size n (from 50 to 300 with an increment value of 50) from different ARMA models: MA(1) with θ = 0.4 and 0.7, MA(2) with (θ1, θ2) = (0.3, 0.6), and ARMA(1, 1) with (φ, θ) = (0.3, –0.7); and in all these models we set the model variance σ2 = 1.

• Second, for each time series, we generate 1,000 values of β from the Newbold, Zellner-Reynolds and Broemeling-Shaarawy marginal approximate posteriors of β.

• Third, for each Monte Carlo simulation chain of β, we compute the Bayesian estimates of β which include mean, standard deviation, and quantiles 2.5% and 97.5% as an 95% credible interval.

• Fourth, we compute the mean absolute percentage errors (MAPE) to compare the Bayesian estimates of β, obtained by simulations from different approximate posteriors, to the true value used to generate the underlying time series.

• Fifth, we evaluate the variability of β obtained from each of the approximate posteriors by calculating the ratio of standard deviations of β from different posteriors, for each generated chain; and then we report the mean and quantiles 2.5% and 97.5% of these ratios. Results of these simulations are presented in Tables 611 and discussed in the following.

Results of the Bayesian estimates of the MA(1) coefficients are presented in Table 6 and their comparison criteria are presented in Table 7. From these results we can observe that the Bayesian estimates of the coefficients mean obtained from all the approximate posteriors are almost the same, however, the standard deviation estimates obtained from the Newbold and Zellner-Reynolds approximate posteriors are strongly different from those obtained form the Broemeling-Shaarawy approximate posterior. For example, when n = 50 and θ = 0.7, all the approximate posteriors give a mean estimate of about 0.72, and the standard deviation estimates are 0.010, 0.011, and 0.020 obtained from the Newbold, Zellner-Reynolds and Broemeling-Shaarawy approximate posteriors, respectively.

The results of the comparison criteria show that the Newbold and Zellner-Reynolds approximate posteriors have almost the same MAPE values that are relatively less than those of the Broemeling- Shaarawy posterior. In addition, the ratios of standard deviations show that the standard deviation estimates obtained from the Broemeling-Shaarawy approximate posterior are relatively very large compared to those obtained from the Newbold and Zellner-Reynolds posteriors, which are very close. We get the same conclusions from the results of MA(2) model presented in Tables 8 and 9 and ARMA(1, 1) model presented in Tables 10 and 11 with some additional general remarks. First, when the sample size is getting larger, the Bayesian estimates become more accurate, and hence the MAPE values and the ratios of standard deviation estimates are highly reduced. Second, the larger coefficients values used to generate the time series the more accurate estimates obtained from the approximate posteriors. In general, the simulation results confirm that the Newbold and Zellner-Reynolds approximate posteriors are strongly different from the Broemeling-Shaarawy approximate posterior, and the impact of that can be observed in the Bayesian estimates of the coefficients standard deviation. Thus, the Newbold and Zellner-Reynolds approximations reduce the posterior estimate of the coefficients standard deviation by at least 17% in the case of simple models to 80% in the case of complicated models compared to the Broemeling-Shaarawy approximation, which is strongly reflected in the 95% credible intervals of the coefficients.

### 5.2. Application to real-world time series

We introduce two real-world time series to demonstrate how the KL divergence can be used in reality to measure the distance between the derived approximate posteriors and to show the impact of this distance in the Bayesian estimates of the model coefficients. These two real-world time series datasets represent viscosity and concentration outputs from two different chemical processes (Box et al., 2015). These datasets were collected on full-scale processes to show the effect on these outputs of uncontrolled disturbances such as variations in feedstock. These two time series datasets are presented in

It can observe from Figure 1 that these two time series are not stationary, so we use the first difference to stationarize them; and we then identify the best suitable order of the ARMA model for the differenced time series datasets by using the corrected Akaike’s information criterion (AICc) with assuming the maximum order is five for both autoregressive and moving average parts. We compute the AICc for all the combinations of ARMA models and select the best model with smallest AICc value, and accordingly we identify the best model for both the differenced datasets is ARMA(1, 1). Using the identified models for these two time series datasets and employing the Jeffreys’ priors, we first obtain the Newbold, Zellner-Reynolds and Broemeling-Shaarawy approximate posteriors of the model coefficients. Second, we compute the KL divergence and its calibration between these approximate posteriors and present their results in Table 12. Finally, we compute the Bayesian estimates for the coefficients of these models and present results in Table 13. It is worth noting that these results are consistent with our conclusions in the simulation study in previous subsection.

6. Conclusions

In this paper we first reviewed the Newbold, Zellner-Reynolds, and Broemeling-Shaarawy analytic approximations proposed in the literature to approximate the posterior density of ARMA models to be analytically tractable. We then derived the KL divergence between these approximate posteriors and compute its calibration to measure the distance between these posteriors. We used a large number of Monte Carlo simulations from the approximate posteriors to evaluate the impact of the posteriors distance in the Bayesian estimates of mean and precision of the model coefficients. Simulation results confirmed that the Newbold and Zellner-Reynolds approximate posteriors are very close to each other, and they are strongly different from the Broemeling-Shaarawy approximate posterior. In particular, the results demonstrated that the Newbold and Zellner-Reynolds approximations show higher precision of the model coefficients compared to the Broemeling-Shaarawy approximation, which is reflected in the coefficients credible intervals. We applied our work to real-world time series datasets and their results are consistent with those of the simulation study.

Figures
Fig. 1. Time-plot of the real-world time series datasets.
TABLES

### Table 1

Results of KL divergence and its calibration (KLC) for MA(1)

θSample size (n)

50100150200250300
0.4KL[N,ZR]*0.0670.0300.0190.0140.0110.009
KLC[N,ZR]0.6610.6140.5930.5800.5710.565

KL[N,BS]0.3730.1360.1230.1110.1050.102
KLC[N,BS]0.7250.7030.6950.6890.6850.683

KL[ZR,BS]0.4650.1390.1270.1160.1080.105
KLC[ZR,BS]0.7380.7080.6990.6920.6860.684

0.7KL[N,ZR]0.0980.1510.0270.0210.0160.013
KLC[N,ZR]0.6790.6280.6040.5920.5820.574

KL[N,BS]1.2440.7030.5460.5540.5450.538
KLC[N,BS]0.8640.8560.8520.8530.8510.850

KL[ZR,BS]1.4060.6010.5450.5450.5320.531
KLC[ZR,BS]0.8600.8520.8480.8500.8480.848

*N, ZR and BS refer to Newbold, Zellner-Reynolds and Broemeling-Shaarawy approximate posteriors, respectively.

### Table 2

Results of KL divergence and its calibration (KLC) for MA(2)

(θ1, θ2)Sample size (n)

50100150200250300
(0.4, 0.3)KL[N,ZR]0.1880.0830.0530.0390.0310.025
KLC[N,ZR]0.7630.6880.6530.6320.6180.607

KL[N,BS]0.6180.2740.2360.2170.2110.205
KLC[N,BS]0.8330.7980.7840.7750.7720.769

KL[ZR,BS]0.7260.3010.2530.2290.2210.213
KLC[ZR,BS]0.8510.8080.7900.7800.7760.773

(0.3, 0.6)KL[N,ZR]0.4820.1160.0680.0490.0390.032
KLC[N,ZR]0.7890.7080.6690.6450.6310.619

KL[N,BS]2.1760.8050.6930.6410.6250.610
KLC[N,BS]0.9100.8950.8910.8860.8830.882

KL[ZR,BS]2.2300.8220.6980.6450.6280.611
KLC[ZR,BS]0.9180.8980.8930.8870.8840.883

### Table 3

Results of KL divergence and its calibration (KLC) for ARMA(1,1)

(φ, θ)Sample size (n)

50100150200250300
(0.5, 0.3)KL[N,ZR]0.1450.0730.0440.0330.0260.022
KLC[N,ZR]0.7380.6730.6400.6230.6100.600

KL[N,BS]0.6250.2040.1640.1430.1310.128
KLC[N,BS]0.8070.7610.7420.7320.7260.723

KL[ZR,BS]0.7950.2310.1800.1560.1420.137
KLC[ZR,BS]0.8250.7750.7520.7400.7320.729

(0.4, 0.6)KL[N,ZR]0.1840.0740.0460.0350.0270.022
KLC[N,ZR]0.7430.6760.6420.6240.6110.601

KL[N,BS]1.5260.7480.5760.5500.5440.540
KLC[N,BS]0.9060.8930.8850.8820.8810.882

KL[ZR,BS]1.7190.7300.5980.5610.5500.547
KLC[ZR,BS]0.9110.8940.8850.8820.8810.882

### Table 4

Results of KL divergence and its calibration (KLC) for ARMA(1,2)

(φ, θ1, θ2)Sample size (n)

50100150200250300
(0.4, 0.5, 0.4)KL[N,ZR]0.4400.1980.1070.0800.0620.048
KLC[N,ZR]0.8410.7550.7100.6830.6620.647

KL[N,BS]1.9910.8730.7220.6660.6490.628
KLC[N,BS]0.9370.9230.9170.9120.9100.909

KL[ZR,BS]2.2360.9300.7580.6950.6720.648
KLC[ZR,BS]0.9480.9270.9210.9150.9130.911

(0.7, 0.3, 0.4)KL[N,ZR]0.3590.1450.0910.0660.0500.041
KLC[N,ZR]0.8280.7420.6980.6720.6520.638

KL[N,BS]1.3340.6610.5650.5240.5020.484
KLC[N,BS]0.9270.9020.8940.8860.8830.879

KL[ZR,BS]1.4670.7010.5910.5450.5170.498
KLC[ZR,BS]0.9370.9090.8980.8900.8860.882

### Table 5

Results of KL divergence and its calibration (KLC) for ARMA(2,2)

(φ1, φ2, θ1, θ2)Sample size (n)

50100150200250300
(0.4, 0.3, 0.5, 0.4)KL[N,ZR]2.5370.3700.2390.2070.1360.109
KLC[N,ZR]0.9020.8170.7760.7490.7240.706

KL[N,BS]4.0381.5321.1480.9130.8640.825
KLC[N,BS]0.9570.9280.9250.9210.9200.919

KL[ZR,BS]6.6791.6961.2701.0300.9150.876
KLC[ZR,BS]0.9700.9430.9360.9340.9280.926

(1.3, 0.6, 0.3, 0.6)KL[N,ZR]0.5950.2570.1520.1110.0850.070
KLC[N,ZR]0.8870.7980.7470.7150.6920.676

KL[N,BS]4.3101.6911.4961.4251.3341.297
KLC[N,BS]0.9770.9750.9730.9720.9700.969

KL[ZR,BS]4.6521.7011.5271.4571.3591.320
KLC[ZR,BS]0.9810.9770.9750.9730.9710.969

### Table 6

Bayesian estimates of the coefficients of MA(1)

θnζjN(βy)ζjZR(βy)ζjBS(βy)

μ̂σ̂Q2.5%Q97.5%μ̂σ̂Q2.5%Q97.5%μ̂σ̂Q2.5%Q97.5%
0.4500.4130.0160.1080.7150.4130.0180.1080.7150.4100.0200.1070.708
1000.4060.0080.2100.6070.4060.0090.2100.6070.4020.0100.2010.583
1500.4050.0060.2420.5620.4050.0060.2420.5620.4030.0070.2360.567
2000.4030.0040.2660.5330.4030.0040.2660.5330.4020.0050.2540.540
2500.4020.0030.2830.5200.4020.0030.2830.5200.4010.0040.2720.530
3000.4030.0030.3010.5060.4030.0030.3010.5060.4020.0030.2910.517

0.7500.7230.0100.4831.0000.7230.0110.4831.0000.7150.0200.4051.007
1000.7100.0050.5460.8670.7100.0060.5460.8670.7040.0100.5030.883
1500.7070.0030.5760.8310.7070.0040.5760.8310.7040.0070.5360.862
2000.7050.0030.5960.8090.7050.0030.5960.8090.7030.0050.5520.840
2500.7040.0020.6140.7950.7040.0020.6140.7950.7020.0040.5740.828
3000.7040.0020.6280.7860.7040.0020.6280.7860.7020.0030.5910.814

### Table 7

Comparison criteria between estimates from different posteriors for MA(1).

θnMAPEσ^jZR/σ^jNσ^jBS/σ^jN

μ^jNμ^jZRμ^jBSμ̂Q2.5%Q97.5%μ̂Q2.5%Q97.5%
0.45058.33158.33160.6451.1320.6302.3481.3991.0132.153
10039.39139.39140.6141.0660.7081.7351.2231.0421.570
15031.48131.48133.1271.0400.7431.5231.2121.0611.465
20027.03327.03329.3721.0300.7721.4211.2051.0741.395
25023.99923.99926.2941.0250.7981.3491.2021.0871.368
30021.74721.74723.7911.0220.8091.3221.2001.0991.342

0.75028.38828.38834.8651.2760.5493.1993.5201.26119.866
10018.29918.29923.2461.1200.6512.1162.2341.4124.095
15014.49614.49618.9431.0660.6861.7522.1011.4853.187
20012.21712.21716.7571.0550.7231.6532.0421.5412.824
25010.53210.53214.9891.0470.7421.5912.0251.6012.702
3009.5419.54113.5261.0350.7541.4892.0141.6452.598

### Table 8

Bayesian estimates of the coefficients of MA(2)

nζjN(βy)ζjZR(βy)ζjBS(βy)

μ̂σ̂Q2.5%Q97.5%μ̂σ̂Q2.5%Q97.5%μ̂σ̂Q2.5%Q97.5%
θ1 = 0.3500.3100.0120.0490.5770.3100.0170.0490.5770.3100.0210.0020.622
1000.3030.0060.1390.4710.3030.0070.1390.4710.3010.0100.0990.486
1500.3030.0040.1630.4380.3030.0050.1630.4380.3020.0070.1390.459
2000.3010.0030.1840.4190.3010.0030.1840.4190.3020.0050.1540.438
2500.3010.0030.1930.4040.3010.0030.1930.4040.3010.0040.1730.427
3000.3010.0020.2070.3950.3010.0020.2070.3950.3020.0030.1890.417

θ2 = 0.6500.6410.0120.3301.0000.6410.0150.3301.0000.6210.0210.2980.956
1000.6110.0060.4190.8080.6110.0080.4190.8080.6030.0100.3870.823
1500.6060.0040.4710.7450.6060.0050.4710.7450.6010.0070.4290.772
2000.6030.0030.4810.7250.6030.0030.4810.7250.6000.0050.4590.744
2500.6020.0030.5000.7030.6020.0030.5000.7030.5990.0040.4800.721
3000.6010.0020.5140.6900.6010.0020.5140.6900.5980.0030.4910.708

### Table 9

Comparison criteria between estimates from different posteriors for MA(2)

nMAPEσ^jZR/σ^jNσ^jBS/σ^jN

μ^jNμ^jZRμ^jBSμ̂Q2.5%Q97.5%μ̂Q2.5%Q97.5%
θ1 = 0.35071.96271.96280.5001.7300.2843.2582.8201.08417.998
10045.56545.56553.8601.0940.7991.6621.7331.1802.824
15036.24936.24943.7151.0590.8281.4081.6141.2702.226
20032.12532.12538.9351.0430.8401.3281.5931.2942.106
25028.71828.71834.8311.0370.8571.3001.5841.3271.978
30025.83725.83731.5181.0300.8751.2551.5751.3491.890

θ2 = 0.65044.27644.27644.9431.2230.4894.4702.9201.09720.293
10025.73025.73029.3301.1790.7042.3011.7541.1982.874
15019.21819.21823.5051.0980.7331.7701.6261.2732.276
20016.03016.03019.6741.0660.7631.5661.6021.3002.116
25014.04514.04517.2801.0490.7831.4831.5911.3321.998
30012.43112.43115.6951.0390.7921.3951.5811.3511.904

### Table 10

Bayesian estimates of the coefficients of ARMA(1, 1)

nζjN(βy)ζjZR(βy)ζjBS(βy)

μ̂σ̂Q2.5%Q97.5%μ̂σ̂Q2.5%Q97.5%μ̂σ̂Q2.5%Q97.5%
φ = 0.3500.1920.1130.7330.6450.1920.0880.7330.6450.2490.1960.8580.957
1000.2740.0350.2050.6150.2740.0360.2050.6150.3110.0580.2260.762
1500.2910.0230.0710.5840.2910.0320.0710.5840.3160.0390.1160.711
2000.2940.0180.0080.5410.2940.0190.0080.5410.3170.0290.0610.672
2500.2970.0140.0330.5210.2970.0140.0330.5210.3140.0230.0050.622
3000.2990.0120.0590.5130.2990.0120.0590.5130.3170.0190.0460.601

θ = 0.7500.6280.0911.0000.5400.6280.0731.0000.5400.6740.2181.2090.676
1000.6980.0221.0000.1950.6980.0251.0000.1950.7290.0681.0620.207
1500.7050.0140.9240.3700.7050.0200.9240.3700.7250.0461.0110.338
2000.7030.0100.8870.4580.7030.0110.8870.4580.7230.0340.9820.417
2500.7050.0080.8670.4720.7050.0080.8670.4720.7200.0270.9440.463
3000.7040.0070.8640.5090.7040.0070.8640.5090.7200.0230.9310.480

### Table 11

Comparison criteria between estimates from different posterior s for ARMA(1, 1)

nMAPEσ^jZR/σ^jNσ^jBS/σ^jN

μ^jNμ^jZRμ^jBSμ̂Q2.5%Q97.5%μ̂Q2.5%Q97.5%
φ = 0.35084.46284.462100.9950.7120.3903.2962.4920.8898.979
10055.17455.17467.8190.9680.5952.9422.0810.9315.954
15042.14942.14952.4452.2650.6652.5921.9010.9614.208
20036.60336.60345.4291.0830.6782.0811.7771.0183.595
25031.57631.57639.4501.0370.7001.7741.7481.0503.272
30029.33829.33836.6621.0280.7241.7101.7241.0973.065

θ = 0.75034.12134.12141.0790.8090.3555.57510.0551.04872.894
10019.42919.42924.5560.9110.4454.6149.7831.20070.867
15014.69414.69419.3140.9780.5183.4857.2351.36921.411
20011.75511.75515.6671.1410.5582.4434.3851.61113.234
2509.9759.97513.4481.0560.5962.0864.1891.67010.889
3009.2749.27412.5411.0550.6141.9604.1221.77910.158

### Table 12

Results of KL divergence and its calibration for real-world time series datasets

DatasetKL[N, ZR]KLC[N, ZR]KL[N, BS]KLC[N, BS]KL[ZR, BS]KLC[ZR, BS]
Series A0.02480.61002.19500.99691.92650.9947
Series B0.02730.61530.11450.72620.10880.7211

### Table 13

Bayesian estimates of the coefficients for real-world time series datasets

DatasetζjN(βy)ζjZR(βy)ζjBS(βy)σ^jZR/σ^jNσ^jBS/σ^jN

μ̂σ̂μ̂σ̂μ̂σ̂
Series Aφ0.21550.00920.21550.00920.11420.01381.00861.5027
θ0.81930.00320.81930.00350.72670.01901.11365.9836

Series Bφ0.46130.17230.46130.19530.40820.23751.13381.3789
θ0.32460.19560.32460.22280.26660.24311.13901.2430

References
1. Amin AA (2009). Bayesian inference for seasonal ARMA models: A Gibbs sampling approach (Masters thesis) , Cairo University, Egypt.
2. Amin AA (2017a). Gibbs sampling for double seasonal ARMA models. Proceedings of the 29th Annual International Conference on Statistics and Modeling in Human and Social Sciences Cairo, Egypt. .
3. Amin AA (2017b). Bayesian inference for double seasonal moving average models: A Gibbs sampling approach. Pakistan Journal of Statistics and Operation Research, 13, 483-499.
4. Amin AA (2017c). Sensitivity to prior specification in Bayesian identification of autoregressive time series models. Pakistan Journal of Statistics and Operation Research, 14, 699-713.
5. Amin AA (2018). Bayesian inference for double SARMA models. Communications in Statistics-Theory and Methods, 47, 5333-5345.
6. Amin AA (2019a). Gibbs sampling for Bayesian prediction of SARMA processes. Pakistan Journal of Statistics and Operation Research, 15, 397-418.
7. Amin AA (2019b). Kullback-Leibler divergence to evaluate posterior sensitivity to different priors for autoregressive time series models. Communications in Statistics-Simulation and Computation, 48, 1277-1291.
8. Amin AA (2020). Bayesian analysis of double seasonal autoregressive models. Sankhya B, 82, 328- 352.
9. Amin AA (2022). Gibbs sampling for Bayesian estimation of triple seasonal autoregressive models. Communications in Statistics-Theory and Methods.
10. Amin AA and Ismail MA (2015). Gibbs sampling for double seasonal autoregressive models. Communications for Statistical Applications and Methods, 22, 557-573.
11. Ali SS (1998). Approximations for posterior densities of moving average models: A comparative study (Masters thesis) , Cairo University, Egypt.
12. Broemeling LD and Shaarawy S (1984). Bayesian inferences and forecasts with moving average processes. Communications in Statistics, 13, 1871-1888.
13. Broemeling LD and Shaarawy S (1988). Time series: A Bayesian analysis in the time domain, Spell J (Ed). Bayesian Analysis of Time Series and Dynamic Models, (pp. 1-21), New York, Marcel Dekker.
14. Box GEP, Jenkins GM, Reinsel GC, and Ljung GM (2015). Time Series Analysis: Forecasting and Control (5th ed), New York, John Wiley & Sons.
15. Ismail MA (1994). Bayesian forecasting for nonlinear time series (Ph.D. thesis) , University ofWales, UK.
16. Ismail MA and Amin AA (2014). Gibbs sampling For SARMA models. Pakistan Journal of Statistics, 30, 153-168.
17. Kullback S and Leibler RA (1951). On information and sufficiency. The Annals of Mathematical Statistics, 22, 79-86.
18. McCulloch RE (1989). Local model influence. Journal of the American Statistical Association, 84, 473-478.
19. Newbold P (1973). Bayesian estimation of box Jenkins transfer function noise models. Journal of the Royal Statistical Society: Series B, 35, 323-336.
20. Shaarawy SM and Ali SS (2012). Bayesian model order selection of vector moving average processes. Communications in Statistics - Theory and Methods, 41, 684-698.
21. Soliman E (1999). On Bayesian time series analysis (Ph.D. thesis) , Cairo University, Egypt.
22. Zellner A and Reynolds R (1978). Bayesian analysis of ARMA models. Proceedings of the Sixteenth Seminar on Bayesian Inference in Econometrics. .