TEXT SIZE

CrossRef (0)
Bayesian analysis of financial volatilities addressing long-memory, conditional heteroscedasticity and skewed error distribution

Rosy Oha, Dong Wan Shina, and Man-Suk Oh1,a

aDepartment of Statistics, Ewha Womans University, Korea
Correspondence to: 1Corresponding author: Department of Statistics, Ewha Womans University, 52 Ewhayeodae-gil, Seodaemun-gu, Seoul 03760, Korea. E-mail: msoh@ewha.ac.kr
Received July 6, 2017; Revised August 16, 2017; Accepted August 17, 2017.
Abstract

Volatility plays a crucial role in theory and applications of asset pricing, optimal portfolio allocation, and risk management. This paper proposes a combined model of autoregressive moving average (ARFIMA), generalized autoregressive conditional heteroscedasticity (GRACH), and skewed-t error distribution to accommodate important features of volatility data; long memory, heteroscedasticity, and asymmetric error distribution. A fully Bayesian approach is proposed to estimate the parameters of the model simultaneously, which yields parameter estimates satisfying necessary constraints in the model. The approach can be easily implemented using a free and user-friendly software JAGS to generate Markov chain Monte Carlo samples from the joint posterior distribution of the parameters. The method is illustrated by using a daily volatility index from Chicago Board Options Exchange (CBOE). JAGS codes for model specification is provided in the Appendix.

Keywords : ARFIMA, GARCH, skewed-t, JAGS, Bayesian, Markov chain Monte Carlo
1. Introduction

Volatility plays a crucial role in the theory and application of asset pricing, optimal portfolio allocation, and risk management. It is therefore important to introduce a model addressing the key features of volatility data sets and develop a valid estimation method. The most dominant features of the usual volatilities, the historic volatilities, the implied volatilities, and the realized volatilities are long memories, conditional heteroscedasticity and skewed error distribution.

The fractionally integrated autoregressive moving average (ARFIMA) model (Granger and Joyeux, 1980; Hosking, 1981) and the generalized autoregressive conditional heteroscedasticity (GARCH) model (Bollerslev, 1986) have been widely used to explain the long memory and the time-varying conditional variance features of volatilities, respectively. Park (2016) used a skewed student t distribution to incorporate asymmetry as well as fat tails of error distribution (Corsi et al., 2008; Grassi and Magistris, 2015; Graves et al., 2015; Maasoumi and McAleer, 2008; Wang et al., 2013).

However, none of the currently available methods propose a model incorporating all the above three features simultaneously. For example, Baillie et al. (1996) proposed the ARFIMA + GARCH model but did not incorporate the asymmetry of the error distribution. Park (2016) proposed a heterogeneous autoregressive (HAR) model to incorporate the long memory and then employed a GARCH + skewed-t distribution model to the estimated residuals obtained from fitting the HAR model to volatility data sets. Consequently, they incorporated the three features of volatilities in a stepwise fashion rather than simultaneously.

In this paper, we propose a model which combines ARFIMA, GARCH, and skewed-t distribution, namely ARFIMA + GARCH + skewed-t model, to simultaneously account for the long memory, time-varying variance, and asymmetric error distribution features of volatilities.

A Bayesian approach is used to estimate the parameters of the model. We employ Just Another Gibbs Sampler (JAGS, http://mcmc-jags.sourceforge.net/) to generate Markov chain Monte Carlo (MCMC) samples from the joint posterior distributions of the parameters. A key advantage of JAGS is that the software runs MCMC automatically once a user specifies a model (the distribution of data and the prior distributions of parameters); therefore, practitioners who are not experts in MCMC may also easily implement Bayesian inference using JAGS.

Given the posterior samples, several features of interest, such as the estimated marginal posterior densities, posterior moments, quantiles, scatter plots exhibiting interesting relations between parameters, may be derived in a straightforward manner. In addition, any restrictions on parameters, such as inequality restrictions on coefficients in the GARCH model to ensure stationarity, can be implemented in a straightforward way. Finally, useful prior information, which may be available from the data context or previous data analysis, can be used in Bayesian approaches (Czado and Min, 2011).

This paper is organized as follows. Section 2 presents the ARFIMA + GARCH + skewed-t model and the Bayesian estimation method. Section 3 presents analysis of a real data set. The final section draws conclusions.

2. Method

In this section, we describe the proposed model, the priors for the parameters, and a Markov chain Monte Carlo method for Bayesian inference.

2.1. Model

We consider a model that captures the features of conditional heteroscedasticity, long-memory, asymmetry, and fat tails via a combination of ARFIMA, GARCH, and skewed-t distribution. Long memories of volatility process yt is modeled by the ARFIMA(p, d, q) process

$Φ(B)(1-B)dyt=θ0+Θ(B)et,$

where et is a zero mean uncorrelated error sequence, d ∈ (−0.5, 1) is the fractional integration parameter, B is the backward-shift operator, and Φ(B) = 1−φ1B−· · ·−φpBp and Θ(B) = 1−θ1B−· · ·−θqBq are assumed to have roots outside the unit circle and have no common roots. The fractional differencing operator, (1−B)d, is defined as $(1-B)d=∑k=0∞πkBk$, where π0 = 1, and πk = πk−1(k − 1 − d)/k for k > 0. If d ∈ (−0.5, 0.5), yt is stationary; if d ∈ [0.5, 1), yt is nonstationary. For nonstationary series, Bayesian analysis can be made on the differenced series wt = (1− B)yt, which is always stationary for d ∈ (−0.5, 1]. The difference series wt is an ARFIMA(p, δ, q) given by Φ(B)(1− B)δ(wtμ) = Θ(B)et, where δ = d − 1 and μ = E(wt).

From Bollerslev et al., (1992), in general a GARCH(1, 1) model is sufficient to capture volatility clustering in the data. Therefore, in order to address the time-varying variance of yt, the error et in (2.1) can be assumed to follow a GARCH(1, 1) model given by

$et=σtɛt, σt2=α0+α1et-12+β1σt-12,$

where ɛt is a sequence of i.i.d. errors having zero mean and unit variance. Positivity of $σt2$ and stationarity of et are ensured via the standard restrictions: α0 > 0, α1 ≥ 0, β1 ≥ 0, and α1 + β1 < 1.

Asymmetry and fat tails are featured for the zero-mean-unit-variance error ɛt to have a standardized skewed-t distribution with skewness parameter θ and degrees of freedom ν (Lambert and Laurent, 2001). It has a probability density function (pdf)

$f(ɛt∣θ,ν)={2sθ+θ-1tν(θ(sɛt+m)),if ɛt<-ms,2sθ+θ-1tν(sɛt+mθ),if ɛt≥-ms,$

where m and s2 are the mean and the variance of the nonstandardized skewed-t distribution; in addition, tν(·) denotes the standard Student t density with ν degrees of freedom. The expected value and the variance of ɛt are given as E(ɛt) = 0, V(ɛt) = 1. The parameter θ indicates the direction and the degree of skewness. If θ = 1, the skewed-t distribution reduces to the Student t distribution.

Now we consider a model consisting of ARFIMA(1, d, 0), GARCH(1, 1), and skewed-t distribution, for which we have

$(1-φB)(1-B)δ(wt-μ)=et,et=σtɛt,σt2=α0+α1et-12+β1σt-12,ɛt∼iidStd.skt(θ,ν),$

with constraints α0 > 0, α1 ≥ 0, β1 ≥ 0, α1 + β1 < 1. We will refer this model as ARFIMA(1, d, 0) + GARCH(1, 1) + skewed-t model. The combined model takes into account all the previously mentioned important features of volatility data sets.

2.2. Priors

We assume prior independence among μ, φ, δ, (α0, α1, β1), θ, ν. All parameters except for μ should satisfy some constraints. We incorporate these constraints in prior distributions of the parameters so that posterior samples generated from MCMC as well as estimates of the parameters automatically satisfy the constraints.

A diffuse normal prior is used for μ, i.e., $μ~N(μμ,σμ2)$, with σμ large. We assume $φ~N(μφ,σφ2)I(-1,1)$ and $δ~N(μδ,σδ2)I(-1,0)$. For the prior of (α0, α1, β1), we assume π(α0, α1, β1) = π(α0, α1)π(β1)I(α0 > 0, α1 ≥ 0, β1 ≥ 0, α1 + β1 < 1), where I(·) is the indicator function. We choose π(β1) as $N(μβ1,σβ12)$. For π(α0, α1), we choose a hierarchical prior suggested by Ardia and Hoogerheide (2009):

$(α0,α1)~N((00),(σ02ρσ0σ1ρσ0σ1σ12)),$

$ρ~N(μρ,σρ2)I(-1<ρ<1)$, μρ ~ Unif(−1, 1), $σρ2~Gamma(1.5,10-4)$. Finally, we assume $θ~N(μθ,σθ2)I(θ>0)$, and $ν~N(μν,σν2)I(ν≥2)$.

2.3. Estimation

Clearly, the posterior distribution of the parameters Θ = (μ, δ, φ, α0, α1, β1, θ, ν) in the model (2.3) is not analytically tractable. Simple MCMC methods such as standard Gibbs sampling methods are not applicable to this model; therefore, Metropolis-Hastings (MH) algorithm is required to generate posterior samples of the parameters.

Implementation of the MH algorithm is often complicated for practitioners who are not experts in MCMC and/or coding. We get around this difficulty by generating MCMC samples by using user-friendly JAGS software. Given the model specification, it employs either Gibbs sampler when available or MH algorithm to generate posterior samples.

To use JAGS, the distribution of data needs to be specified as one of the distributions built in JAGS. However, the distributions in the ARFIMA(1, d, 0) + GARCH(1, 1) + skewed-t model are not commonly used distributions and are not provided by JAGS. Instead of specifying the data distribution, we can specify the likelihood function since JAGS derives the likelihood from the distribution of data and uses them for the Gibbs sampler or MH algorithm.

For the likelihood of Θ, we first derive the joint pdf of e = (e1, e2, …, eT ) from the GARCH(1, 1) and the skewed-t models, which is given as

$f(e1,e2,…,eT∣Θ)=f(e1∣Θ)f(e2∣e1,Θ)⋯f(eT∣e1,e2,…,eT-1,Θ)=fɛ(e1σ1)1σ1fɛ(e2σ2)1σ2⋯fɛ(eTσT)1σT,$

where fɛ(·) is the density function of standardized skewed-t distribution given in (2.2). Now we plug in the residuals in the ARFIMA model into (2.4). If wt is invertible, the residual et in the ARFIMA model can be represented as an infinite autoregressive (AR) process as;

$et=∑k=0∞πk((wt-k-μ)-φ(wt-k-1-μ))=∑k=0∞ψk(wt-k-μ),$

where ψk = πkπk−1φ, ψ0 = 1, πk = πk−1(k − 1 − d)/k and π0 = 1. The equation (2.5) is an infinite AR representation of ARFIMA(1, d − 1, 0). However, when T is sufficiently large and the first difference wt is stationary, we may replace (2.5) by a finite AR representation

$et=∑k=0t-1ψk(wt-k-μ).$

In usual financial volatility analysis, since data sets are recorded daily or more frequently and the series length is large, this approximation would be good. Now the likelihood function of Θ in the ARFIMA(1, d, 0) + GARCH(1, 1) + skewed-t model is given by (2.4) with et, t = 1, …, T, given in (2.6).

The likelihood function (2.4) can be specified in the model specification of JAGS, using the method suggested by Ntzoufras (2011) and Kruschke (2014). The Appendix contains the model specification used in JAGS.

3. Analysis of the volatility index data

3.1. Exploratory study

We consider a daily closing price for the volatility index (VIX), which is a measure of market expectations on volatility over the next 30 days conveyed by S&P 500 stock index option prices. The data set can be obtained from the Chicago Board Options Exchange (CBOE) web site, http://www.cboe.com/micro/vix/historical.aspx. We use data from January 3, 2006 to November 30, 2016, which contains 2,748 observations.

Time series plot of the VIX is presented in Figure 1. From the figure, we can see apparent conditional heteroscedastic feature of the VIX data; that high peaks from August 2007 to March 2009 due to the financial crisis of 2007–2008 cause large volatilities and that the relatively lower index values between 2013 and 2015 have smaller volatilities.

Table 1 shows basic summary statistics of the VIX. Figure 2 displays the histogram (with a normal approximation) and the autocorrelation function (ACF) plot of the VIX. The values of skewness, median, and mean (Table 1) and the histogram (Figure 2) show that the VIX is skewed to the right and has highly fat tails. The ACF plot of the VIX shows the feature of long memory which can be seen that the VIX is skewed to the right and has highly fat tails. The ACF plot of the VIX shows the feature of long memory which can be also supported by the GPH (Geweke and Porter-Hudak, 1983) test statistics of 8.663.

Table 2 presents the p-values of augmented Dicky-Fuller (ADF) test and Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test on the VIX (Dickey and Fuller, 1979; Kwiatkowski et al., 1992). Null hypotheses of both the ADF and the KPSS tests are rejected at 5% level. Recall that the ADF and the KPSS tests have reversed the null hypothesis of unit root stationarity, respectively. It presumably indicates that there exists significant evidence that the process is a neither nonstationary nor stationary process but near nonstationary. To meet stationarity property of the series, we use the first differences of the VIX for analysis. Note that the differenced series is always stationary for d ∈ (−0.5, 1].

3.2. Analysis

We applied the proposed Bayesian method to the first differences of the VIX data and generated posterior samples of parameters using JAGS. The first 3,000 iterations are discarded as burn-in and we obtained 15,000 samples after the burn-in from 3 different chains. Convergence of MCMC is diagnosed by Gelman-Rubin shrink factors and trace plots of the MCMC samples.

Table 3 displays the estimated posterior mean, standard error, and an approximate 95% credible interval of each parameter. All parameters are significant in that their 95% credible intervals do not contain zero. The estimated posterior mean −0.332 of the fractional integration parameter δ indicates that process of the VIX has a nonstationary memory of fractional integration order of 0.668. The sum α̂1 + β̂1 = 0.991 indicates strong conditional heteroscedasticity. We also see significant asymmetry of the error distribution from the highly significant estimates σ̂ = 1.376 of the skewness parameter.

The estimates satisfy the stationary condition α̂1 + β̂1 < 1. Figure 3 displays the scatter plot of the samples of (α1, β1) when the constraints are ignored, in which the dark points represent the sample points satisfying the condition and white points, 69.54% of samples, represent those not satisfying the condition. The plot indicates that enforcing the constraints in the priors plays an important role to obtain the estimates within the stationarity region.

Figure 4 shows the estimated marginal posterior distributions of the parameters (solid line) with normal approximations (dashed line). The figure shows that marginally the posterior distributions of the parameters seem to be well approximated by normal distributions, except for α0 and ν whose distributions show slight deviations from the normal approximations. Note the marginal posterior distributions are close to normal; however, the joint posterior distributions may be significantly different from multivariate normal (Figure 3).

For comparison, we applied a model ARFIMA + GARCH + skewed-t, which fits ARFIMA model to the data and then fit GARCH + skewed-t model to the estimated residuals from the ARFIMA model. The results are presented in

The most notable differences are the estimates of the parameters of the ARFIMA model. In the ARFIMA + GARCH + skewed-t model, δ is much smaller and φ is much larger than those in the stepwise ARFIMA + GARCH + skewed-t model. This reveals that the heteroscedasticity and/or skewed error has affected the parameters of the ARFIMA model and hence the importance of simultaneous consideration of the three important features of volatilities.

We also applied ARFIMA + GARCH + t and ARFIMA + GARCH + normal models to the data, assuming t and normal error distributions, respectively, instead of skewed-t. Table 5 provides the Deviance Information Criteria (DIC) (Spiegelhalter et al., 2002) for ARFIMA + GARCH + skewed-t, ARFIMA + GARCH + t, and ARFIMA + GARCH + normal models. Among the three models, ARFIMA + GARCH + skewed-t model is the best, having the smallest DIC, which reveals the asymmetry and heavy-tails of the error distribution. Table 4 provides estimation results from ARFIMA + GARCH + t and ARFIMA + GARCH + normal models. Compared with a skewed-t distribution for the error, assuming a t distribution yields a larger δ and a smaller degrees of freedom, and a normal distribution yields a much smaller φ and larger δ, α0 and α1.

4. Conclusion

In this paper, we have proposed a fully Bayesian implementation of the ARFIMA + GARCH + skewed-t model which simultaneously takes into account the features of long memory, conditional heteroscedasticity, asymmetry and fat tails of volatility data sets. JAGS are used to generate MCMC samples of the parameters from their joint posterior distributions since it can be run automatically given model specification without expert knowledge on MCMC and coding.

Bayesian method naturally produce estimates that satisfy the stationary constraints in GARCH (1, 1), by incorporating constraints in the prior. The scatter plot of the samples of (α1, β1) show that restricting the parameter space is necessary to obtain parameter estimates that satisfy the constraints.

In this Bayesian implementation, since the likelihood function of the parameters of the ARFIMA + GARCH + skewed-t model is specified, modifying the model specification and choosing flat priors yield approximate conditional sum of squares (CSS) estimates within the stationary region based on MCMC samples. One may also apply an optimization method directly to the given likelihood function to obtain CSS estimates, instead of using MCMC samples. However, it is often relatively difficult to incorporate parameter constraints in optimization than in MCMC.

Acknowledgements

This research was supported by the Basic Science Research Program of the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (2016R1A2 B4008914) (Rosy Oh and Man-Suk Oh) and (2016R1A2B4008780) (Dong Wan Shin).

Figures
Fig. 1. Time series plots of the volatility index.
Fig. 2. Histogram with normal density curve and ACF of the VIX. ACF = autocorrelation function; VIX = volatility index.
Fig. 3. Scatter plot for α1 and β1.
Fig. 4. Estimated posterior distribution (solid line) and normal distribution (dashed line).
TABLES

Table 1

Summary statistics of the volatility index data

MinMedianMeanMaxSDSkewnessKurtosis
9.89017.23020.06080.8609.5742.3927.546

Table 2

Test for unit root

p-value0.0422< 0.01

ADF = augmented Dicky-Fuller; KPSS = Kwiatkowski-Phillips-Schmidt-Shin.

Table 3

Estimation results for the first difference of volatility index from ARFIMA + GARCH + skewed-t model

Estimated posterior meanStandard error95 % credible interval

LowerUpper
μ−0.0080.002−0.013−0.004
φ0.1860.0360.1180.256
δ−0.3320.029−0.387−0.276
α00.0680.0140.0460.099
α10.2130.0230.1700.260
β10.7780.0220.7320.818
θ1.3760.0371.3041.450
ν4.4190.3173.8715.113

ARFIMA = autoregressive moving average; GARCH = generalized autoregressive conditional heteroscedasticity.

Table 4

Estimation results for the first difference of volatility index from various models

95 % credible interval

ARFIMA + GARCH + skewed-tARFIMA + GARCH + tARFIMA + GARCH + normal

Est.SELowerUpperEst.SELowerUpperEst.SELowerUpper
μ0.0010.010−0.0180.020−0.0210.004−0.029−0.015−0.0130.004−0.021−0.006
φ0.0280.030−0.0310.0880.1890.0440.1050.2780.1550.0460.0700.253
δ−0.1810.024−0.227−0.133−0.3050.037−0.379−0.236−0.2630.038−0.343−0.192
σ23.5040.0963.3203.697--------
α00.0950.0170.0650.1320.0970.0190.0620.1360.1430.0180.1100.180
α10.2670.0250.2230.3190.2490.0260.1930.2970.2530.0240.2080.304
β10.7270.0240.6750.7690.7430.0260.6960.7980.7180.0220.6750.760
θ1.2870.0301.2301.346--------
ν4.4710.3153.9155.1563.9210.2383.5054.435----

ARFIMA = autoregressive moving average; GARCH = generalized autoregressive conditional heteroscedasticity; Est. = Estimated posterior mean; SE = standard error.

Table 5

Deviance information criterion

skew-ttNormal
8886.2609033.8389508.023

References
1. Ardia, D, and Hoogerheide, LF (2009). Bayesian estimation of the GARTCH(1, 1) model with Student-t innovations. The R Journal. 2, 41-47.
2. Baillie, RT, Chung, CF, and Tieslau, MA (1996). Analysing inflation by the fractionally integrated ARFIMA-GARCH model. Journal of Applied Econometrics. 11, 23-40.
3. Bollerslev, T (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics. 31, 307-327.
4. Bollerslev, T, Chou, RY, and Kroner, KF (1992). ARCH modeling in finance: a review of the theory and empirical evidence. Journal of Econometrics. 52, 5-59.
5. Corsi, F, Mittnik, S, Pigorsch, C, and Pigorsch, U (2008). The volatility of realized volatility. Econometric Reviews. 27, 46-78.
6. Czado, C, and Min, A (2011). Bayesian inference for D-vines: estimation and model selection. Dependence Modeling: Vine Copula Handbook, Kurowicka, D, and Joe, H, ed. Singapore: World Scientific Publishing Co, pp. 249-264
7. Dickey, DA, and Fuller, WA (1979). Distribution of the estimators for autoregressive time series with a unit root. Journal of the American Statistical Association. 74, 427-431.
8. Geweke, J, and Porter-Hudak, S (1983). The estimation and application of long memory time series models. Journal of Time Series Analysis. 4, 221-238.
9. Granger, CW, and Joyeux, R (1980). An introduction to long-memory time series models and fractional differencing. Journal of Time Series Analysis. 1, 15-29.
10. Grassi, S, and de Magistris, PS (2015). It’s all about volatility of volatility: evidence from a two-factor stochastic volatility model. Journal of Empirical Finance. 30, 62-78.
11. Graves, T, Gramacy, RB, Franzke, CLE, and Watkins, NW (2015). Efficient Bayesian inference for ARFIMA processes. Nonlinear Processes in Geophysics Discussions. 2, 573-618.
12. Hosking, JR (1981). Fractional differencing. Biometrika. 68, 165-176.
13. Kruschke, JK (2014). Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan. London: Academic Press
14. Kwiatkowski, D, Phillips, PCB, Schmidt, P, and Shin, Y (1992). Testing the null hypothesis of stationarity against the alternative of a unit root: how sure are we that economic time series have a unit root?. Journal of Econometrics. 54, 159-178.
15. Lambert, P, and Laurent, S (2001). Modelling financial time series using GARCH-type models with a skewed Student distribution for the innovations.Discussion Paper, Retrieved September 1, 2017, from https://dial.uclouvain.be/pr/boreal/en/object/boreal%3A91014/datastream/PDF_01/view
16. Maasoumi, E, and McAleer, M (2008). Realized volatility and long memory: an overview. Econometric Reviews. 27, 1-9.
17. Ntzoufras, I (2011). Bayesian Modeling Using WinBUGS. New York: John Wiley & Sons
18. Park, SK 2016. . Value at risk forecasting for volatility index (Master’s thesis). Ewha Womans University. Seoul.
19. Spiegelhalter, DJ, Best, NG, Carlin, BP, and Van Der Linde, A (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 64, 583-639.
20. Wang, R, Kirby, C, and Clark, S 2013. Volatility of volatility, expected stock return and variance risk premium., 26th Australasian Finance and Banking Conference 2013.