Estimation of structural vector autoregressive models

Helmut Lütkepohl

DIW Berlin and Department of Economics, Freie Universität Berlin, Germany
Correspondence to: DIW Berlin, Mohrenstr. 58, 10117 Berlin, Germany. E-mail: hluetkepohl@diw.de
Received August 1, 2017; Revised August 28, 2017; Accepted August 29, 2017.
Abstract

In this survey, estimation methods for structural vector autoregressive models are presented in a systematic way. Both frequentist and Bayesian methods are considered. Depending on the model setup and type of restrictions, least squares estimation, instrumental variables estimation, method-of-moments estimation and generalized method-of-moments estimation are considered. The methods are presented in a unified framework that enables a practitioner to find the most suitable estimation method for a given model setup and set of restrictions. It is emphasized that specifying the identifying restrictions such that they are linear restrictions on the structural parameters is helpful. Examples are provided to illustrate alternative model setups, types of restrictions and the most suitable corresponding estimation methods.

Keywords: Bayesian estimation, maximum likelihood estimation, generalized method-of-moments, method-of-moments, instrumental variables, structural vector autoregression
1. Introduction

In a seminal paper, Sims (1980) criticized traditional simultaneous equations systems and proposed using vector autoregressive (VAR) models as alternatives. Since then, structural VAR models have become a standard tool for macroeconomic analysis. Structural VAR models are estimated with a variety of methods that depend on the model setup and the type of structural (identifying) restrictions. The estimation techniques that have been used in this context are least-squares (LS), method-of-moments (MM), instrumental variables (IV), generalized method-of-moments (GMM), maximum likelihood (ML) and Bayesian methods. All these methods are described in the related literature (Breitung et al., 2004; Kilian and Lütkepohl, 2017; Lütkepohl, 2005). In this survey a systematic account of the available methods is provided that directs researchers to the appropriate method for a given model setup and set of identifying restrictions. This paper serves researchers who look for a suitable estimation method for a specific application of structural VAR models.

The study focusses explicitly on models where the identifying restrictions are placed on the impact effects of the shocks or on the structural relations of the variables. It sets up the model as a system of simultaneous equations and directs the reader to the most suitable estimation methods for a given set of restrictions. The presentation of the estimation methods partly follows Kilian and Lütkepohl (2017). The methods are not only presented but practical advice is also given on which one to use in a specific situation. The emphasis is on exposing the methods rather than specific estimation algorithms because suitable software for most of the methods is available on the internet. (For example, Matlab code for many of the methods mentioned in this review is available at http://www-personal.umich.edu/~lkilian/book.html. JMulTi is a menu-driven software that includes estimation methods for a range of structural VAR models (Krätzig, 2004). It can be downloaded from http://www.jmulti.de. Moreover, EViews is a commercial software with a structural VAR estimation part (EViews, 2000, http://www.eviews.com/home.html).) Statistical inference derived from the estimates is also not the focus of this survey because inference methods in structural VAR analysis are more important for functions of the parameters, such as impulse responses, than for the structural parameters themselves. For impulse responses and related quantities frequentist inference is typically based on bootstrap methods that require the computation of a large number of point estimates. Hence, being able to obtain such estimates is important. This survey can help find suitable estimation techniques that can be used as the basis for bootstrap methods.

The study is structured as follows. In Section 2 the model setup is presented and Section 3 discusses the types of restrictions considered. Sections 4–7 describe a range of estimation methods and illustrations are presented in Section 8. Section 9 concludes and mentions some extensions.

The following general notation is used throughout. The natural logarithm is abbreviated as log and Δ is the differencing operator defined such that Δyt = yt − yt−1 for a time series variable yt. The trace and the determinant of a square matrix A are denoted as tr(A) and det(A), respectively, whereas chol(A) denotes the Cholesky decomposition of A. In other words, chol(A) is a lower-triangular square matrix with nonnegative diagonal elements such that A = chol(A)chol(A)′. The column stacking operator is signified by vec and the operator which stacks all columns of a square matrix from the main diagonal downward is signified by vech. For a vector x, diag(x) stands for the diagonal matrix with the components of x on its main diagonal. IK denotes a K × K identity matrix. For n > m, the orthogonal complement of an n × m matrix A is denoted by A⊥. In other words, A⊥ is an n × (n − m) matrix such that the n × n matrix [A, A⊥] is nonsingular and A′A⊥ = 0. A normal distribution with mean μ and variance (covariance matrix) Σ is signified as 𝒩(μ, Σ).
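For readers who want to check this notation numerically, the vec, vech and chol operators can be sketched in a few lines of NumPy (the helper names are ours, not part of the survey):

```python
import numpy as np

# Sketch of the notation: vec stacks columns, vech stacks the columns of a
# square matrix from the main diagonal downward, and chol(A) is the
# lower-triangular factor with A = chol(A) chol(A)'.
def vec(A):
    return A.reshape(-1, order="F")      # column-major stacking

def vech(A):
    n = A.shape[0]
    return A.T[np.triu_indices(n)]       # lower triangle, column by column

A = np.array([[4.0, 2.0],
              [2.0, 3.0]])
P = np.linalg.cholesky(A)                # chol(A), lower triangular
```

Note that `numpy.linalg.cholesky` already returns the lower-triangular factor, matching the convention used here.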

2. Model setup

The basic model of interest in this survey is a VAR model of order p, i.e., a VAR(p) model. It is set up in levels VAR form or, if some of the variables are integrated and possibly cointegrated, in vector error correction form. In this section both setups are considered in turn.

### 2.1. Vector autoregressive models in levels

We consider the K-dimensional reduced-form VAR(p) model

$y_t = \nu + A_1 y_{t-1} + \cdots + A_p y_{t-p} + u_t = A Y_{t-1} + u_t, \qquad (2.1)$

where $Y_{t-1}' \equiv (1, y_{t-1}', \ldots, y_{t-p}')$ is (Kp + 1)-dimensional, A ≡ [ν, A1, …, Ap] is K × (Kp + 1), and ut = (u1t, …, uKt)′ ~ (0, Σu) is a K-dimensional white noise residual process, i.e., ut is serially uncorrelated, has zero mean and covariance matrix Σu. The K × 1 vector ν is a fixed, nonstochastic intercept term. Additional deterministic terms can be handled straightforwardly but are neglected because they are not important for the following discussion.

The VAR(p) process yt is stable and stationary if the polynomial

$\det(I_K - A_1 z - \cdots - A_p z^p) \qquad (2.2)$

has all its roots outside the complex unit circle. If the polynomial (2.2) has unit roots (z = 1), some or all of the components of yt are integrated and possibly cointegrated. It is assumed that all variables are at most integrated of order one such that they are stationary in first differences.
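The stability condition can be checked numerically via the eigenvalues of the companion matrix, whose moduli are all below one exactly when the roots of the determinantal polynomial lie outside the unit circle. A minimal sketch (function name and parameter values are illustrative assumptions):

```python
import numpy as np

def is_stable(A_list):
    """Stability check: all eigenvalues of the VAR companion matrix must lie
    strictly inside the complex unit circle, which is equivalent to all roots
    of det(I_K - A_1 z - ... - A_p z^p) lying outside it."""
    K = A_list[0].shape[0]
    p = len(A_list)
    companion = np.zeros((K * p, K * p))
    companion[:K, :] = np.hstack(A_list)
    if p > 1:
        companion[K:, :-K] = np.eye(K * (p - 1))
    return bool(np.max(np.abs(np.linalg.eigvals(companion))) < 1)

# A bivariate VAR(1) whose coefficient matrix has eigenvalues 0.5 and 0.3.
A1 = np.array([[0.5, 0.1],
               [0.0, 0.3]])
print(is_stable([A1]))  # True
```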

To represent structural relations between the variables, the equations (2.1) are multiplied by an invertible K × K matrix A such that

$A y_t = A\nu + A A_1 y_{t-1} + \cdots + A A_p y_{t-p} + A u_t = A A Y_{t-1} + A u_t.$

In this structural-form VAR representation, the structural innovations or shocks are wt = (w1t, …, wKt)′ = Aut. It is assumed that A has a unit main diagonal and wt ~ (0, Σw) has a diagonal covariance matrix $\Sigma_w = A \Sigma_u A' = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_K^2)$. Thereby the kth structural equation can be written with ykt as dependent variable on the left-hand side and instantaneous values of other variables and lags of all variables on the right-hand side. In other words, denoting the ijth element of A by aij, the kth structural equation is

$y_{kt} = -\sum_{i=1,\, i \neq k}^{K} a_{ki} y_{it} + a_k^* Y_{t-1} + w_{kt},$

where $a_k^*$ is the kth row of AA. Restrictions have to be imposed on A to ensure that the structural matrix A is identified and can be estimated consistently. Using the terminology of Lütkepohl (2005) this model is called the A-model.

Alternatively, the structure can be imposed by defining wt = B⁻¹ut or ut = Bwt and considering the structural form

$y_t = \nu + A_1 y_{t-1} + \cdots + A_p y_{t-p} + B w_t = A Y_{t-1} + B w_t.$

In other words, the reduced-form innovations ut are replaced by Bwt. In this setting, wt is assumed to have unit covariance matrix such that wt ~ (0, IK) and, hence, Σu = BB′. This model is termed the B-model. In this model the identifying restrictions are imposed on the structural matrix B which represents the impact or short-run effects of the structural shocks wt.

Occasionally a combination of A- and B-models is of interest to ease the imposition of the identifying restrictions. It is set up as

$A y_t = A\nu + A A_1 y_{t-1} + \cdots + A A_p y_{t-p} + A B w_t = A A Y_{t-1} + A B w_t$

and is referred to as the AB-model (Amisano and Giannini, 1997). In this model, again wt ~ (0, IK) and, hence, Σu = A⁻¹BB′A⁻¹′. Which model type is considered in a specific situation depends on the nature of the identifying assumptions.

For the purposes of estimation it is useful to note that the exposition of the estimation methods for the structural parameters is sometimes simplified by concentrating out the reduced-form slope parameters A, that is, by replacing A with its equation-wise LS estimator

$\hat{A} = \sum_{t=1}^{T} y_t Y_{t-1}' \left( \sum_{t=1}^{T} Y_{t-1} Y_{t-1}' \right)^{-1},$

if there are no restrictions on A. In other words, defining ût = yt − ÂYt−1, we can focus on the structural equations Aût = w̃t, ût = Bw̃t or Aût = Bw̃t for estimating the structural parameters A or/and B.
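As a sketch of this concentration step, the equation-wise LS estimator and the residuals can be computed in a few lines; the simulated process and all parameter values below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a stable bivariate VAR(1); all parameter values are illustrative.
K, T = 2, 500
nu = np.array([0.1, -0.2])
A1 = np.array([[0.5, 0.1],
               [0.2, 0.3]])
y = np.zeros((T + 1, K))
for t in range(1, T + 1):
    y[t] = nu + A1 @ y[t - 1] + rng.standard_normal(K)

# Regressor Y_{t-1} = (1, y_{t-1}')' and the equation-wise LS estimator
# A_hat = (sum_t y_t Y_{t-1}') (sum_t Y_{t-1} Y_{t-1}')^{-1}.
Y_lag = np.column_stack([np.ones(T), y[:-1]])                # T x (Kp+1)
y_cur = y[1:]                                                # T x K
A_hat = np.linalg.solve(Y_lag.T @ Y_lag, Y_lag.T @ y_cur).T  # K x (Kp+1)

# Residuals u_hat_t = y_t - A_hat Y_{t-1} feed the structural estimation step.
u_hat = y_cur - Y_lag @ A_hat.T
Sigma_u_tilde = u_hat.T @ u_hat / T
```

The residual covariance `Sigma_u_tilde` is the quantity the structural estimation methods of the later sections start from.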

### 2.2. The vector error correction model

Notice that the standard levels form of the VAR(p) model (2.1) can also be used if there are integrated and cointegrated variables. However, for cointegrated VAR models it is often useful to consider a model setup which involves the cointegrating relations explicitly. If there are r linearly independent cointegrating relations, the VAR(p) model can be set up in vector error correction model (VECM) reduced-form as

$\Delta y_t = \nu + \alpha \beta' y_{t-1} + \Gamma_1 \Delta y_{t-1} + \cdots + \Gamma_{p-1} \Delta y_{t-p+1} + u_t,$

where α and β are K × r matrices of rank r. If there are no over-identifying restrictions on the VECM parameters, the Johansen reduced rank regression/Gaussian ML approach can be used to estimate the parameters of this reduced form (Johansen, 1995; Lütkepohl, 2005, Chapter 7).

Denoting the corresponding estimated residuals by ût, the structural parameters can again be estimated from Aût = w̃t in the A-model, ût = Bw̃t in the B-model, and Aût = Bw̃t in the AB-model. Thus, if there are no restrictions on the reduced-form slope parameters A or the VECM parameters, the structural parameters can often be estimated using the relevant candidate of the three foregoing systems of equations. Although estimating the structural parameters in this way opens up useful algorithms for computing point estimates, it may not be equally useful for evaluating asymptotic variances or covariance matrices. As mentioned earlier, in structural VAR analysis having simple computational methods for getting point estimates is important because the structural parameters are often only of interest for computing impulse responses and other quantities for inference.

The preferred estimation method depends not only on the model setup but also on the identifying restrictions. In the next section alternative sets of restrictions are presented which are important in this context. The corresponding estimation methods are discussed in Sections 4–7.

3. Types of restrictions

It is recommended to use a model setup that facilitates imposing the identifying assumptions in the form of linear restrictions. These are the most popular restrictions in practice. Important examples are exclusion restrictions on the impact effects of the structural shocks or on their long-run effects.

### 3.1. Restrictions on the impact effects of shocks

Recursive models are perhaps the most common structural VAR models identified by short-run restrictions on the impact effects of the structural shocks. They amount to restricting A or B to be upper or lower triangular. Recursive models are typically just-identified although there can also be additional over-identifying restrictions. Whether they are set up as A or B models does not matter for just-identified recursive models because the inverse of a triangular matrix is also triangular. If a triangular matrix A is estimated, an estimate for B can be constructed from the inverse of the estimate of A and vice versa.

A recursive setup is also often used in partially identified models where, for example, only one economic relation is identified or only one structural shock is of interest. In that case only one row of A or one column of B is identified. Since just-identifying restrictions on other equations do not affect the identified equation, it is often possible to impose recursive restrictions on the model and thereby fully identify A or B from a statistical point of view, meaning that the parameters are not necessarily meaningful for economic interpretation but can be estimated consistently. For example, in a three-variable system consisting of the log of gross national product (gnpt), an inflation rate (πt) and a short-term interest rate (rt), one may draw on a Taylor rule relation to postulate that gnpt and πt may have an instantaneous effect on rt, while interest rate changes have only a delayed effect on the former two variables. Ignoring lags, a structural system of the form

$\begin{bmatrix} 1 & a_{12} & 0 \\ a_{21} & 1 & 0 \\ a_{31} & a_{32} & 1 \end{bmatrix} \begin{pmatrix} gnp_t \\ \pi_t \\ r_t \end{pmatrix} = w_t \qquad (3.1)$

can be set up in this case. The first two equations in this system are clearly not identified but can be identified by setting a12 = 0, which results in a recursive model. Such a restriction is arbitrary if no economic reasoning suggests it. Still, if the first two equations are not of economic interest, the restriction can be used to ensure statistical identification of the model. Given that recursive models are particularly easy to estimate, this type of identifying restriction is quite common in practice (e.g., Christiano et al., 1999; Kilian and Lütkepohl, 2017, Chapter 8).

For non-recursive models or over-identified structural VAR models it is important whether linear restrictions are formulated for A or B. Clearly, linear restrictions for A may translate into nonlinear restrictions for its inverse and linear restrictions for B may correspond to nonlinear restrictions for A. Working with linear restrictions is recommended from a computational point of view. Therefore it is useful to choose a model setup that accommodates the restrictions in linear form.

Generally linear restrictions on A can be written as

$\mathrm{vec}(A) = R\gamma + r, \qquad (3.2)$

where R is a known K² × J matrix, γ is the J × 1 vector of unrestricted elements of A and r is a known K² × 1 vector. In the example (3.1) with a12 = 0, the elements on the main diagonal of A are restricted to be one such that

$r = (1, 0, 0, 0, 1, 0, 0, 0, 1)', \quad \gamma = (a_{21}, a_{31}, a_{32})'$

and

$R = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \end{bmatrix}'.$
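The restriction apparatus of this example can be verified numerically; the nonzero parameter values below are invented for illustration:

```python
import numpy as np

# vec(A) = R gamma + r for the 3x3 example with a12 = 0 and a unit main
# diagonal; vec stacks the columns of A, so position i + 3j holds a_{i+1,j+1}.
a21, a31, a32 = 0.4, -0.2, 0.7                    # illustrative values
r = np.array([1.0, 0, 0, 0, 1.0, 0, 0, 0, 1.0])
R = np.zeros((9, 3))
R[1, 0] = 1.0   # a21 sits at position 1 of vec(A)
R[2, 1] = 1.0   # a31 sits at position 2
R[5, 2] = 1.0   # a32 sits at position 5
gamma = np.array([a21, a31, a32])

A = (R @ gamma + r).reshape(3, 3, order="F")      # undo the vec operator
```

The reshape with `order="F"` undoes the column stacking, recovering the lower-triangular A of the example.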

Alternatively, there may be linear restrictions on B that can be written as

$\mathrm{vec}(B) = R\gamma + r,$

where r = 0 if there are just exclusion restrictions. Sometimes it is preferable to write the restrictions in the form

$Q\,\mathrm{vec}(B) = q, \qquad (3.3)$

where Q is a known J × K² restriction matrix and q is a known J × 1 vector. Likewise linear restrictions for A could be represented in this form. Whether the restrictions are written as in (3.2) or in (3.3) is often a matter of convenience and depends to some extent on the estimation method used. Of course, there may be analogous restrictions on both A and B in the AB-model.

### 3.2. Restrictions on the long-run effects of shocks

Structural identification in VAR models is often achieved by imposing restrictions on the long-run effects of the shocks. Since the long-run effects of the shocks in a stable, stationary VAR model are zero, the long-run effects considered in the context of stable models are the accumulated effects of the shocks. Defining A(1) ≡ IKA1 − · · · − Ap, the relevant long-run effects or long-run multiplier matrix is

$\Xi = A(1)^{-1} A^{-1} B$

for the AB-model and analogously for the A-model and the B-model. It is more common in this context to use either the A-model or the B-model. In particular, the B-model with long-run multiplier matrix Ξ = A(1)⁻¹B is popular in practice (see Kilian and Lütkepohl, 2017, Chapter 10, for examples and references).

Typical restrictions in this setup are exclusion restrictions. The most common set of restrictions constrains Ξ to be a triangular matrix (e.g., Blanchard and Quah, 1989). More generally, for the B-model, linear restrictions on Ξ can be written as

$Q_l\,\mathrm{vec}(\Xi) = Q_l \left( I_K \otimes A(1)^{-1} \right) \mathrm{vec}(B) = q_l, \qquad (3.4)$

where Ql is a suitable known restriction matrix and ql is a suitable vector, using vec(Ξ) = vec(A(1)⁻¹B) = (IK ⊗ A(1)⁻¹)vec(B). Thus, given the reduced-form parameters, the implied restrictions for the structural parameters B are linear restrictions with restriction matrix

$Q = Q_l \left( I_K \otimes A(1)^{-1} \right). \qquad (3.5)$

It is easy to combine these restrictions with further linear restrictions on the impact effects.

In a model with integrated and cointegrated variables, a shock can have permanent effects on the variables. Hence, considering such effects for identification purposes makes sense. The long-run effects matrix for the B-model in this case is

$\Xi = \beta_\perp \left[ \alpha_\perp' \left( I_K - \sum_{i=1}^{p-1} \Gamma_i \right) \beta_\perp \right]^{-1} \alpha_\perp' B,$

where β⊥ and α⊥ are orthogonal complements of β and α, respectively (e.g., King et al., 1991; Lütkepohl, 2005, Chapter 9). For the A- and AB-models, B has to be replaced by A⁻¹ or by A⁻¹B, respectively. It is noteworthy that the long-run effects matrix has rank K − r and is hence a reduced rank matrix if there are r > 0 linearly independent cointegration relations. The kth column of Ξ represents the long-run effects of the kth structural shock, wkt, on the components of yt.

Again linear restrictions can be imposed easily on Ξ. Since α, β, and the Γi are determined by the reduced form, such restrictions amount to linear restrictions on B if the B-model is used. Typical restrictions on Ξ are exclusion restrictions which amount to imposing that certain shocks do not have a long-run or permanent effect on some of the variables. If a whole column of Ξ is restricted to zero this means that the corresponding shock does not have any permanent effects on any of the variables and is hence transitory. A shock is called persistent if it has permanent effects on at least one of the variables.

Typically such long-run restrictions are complemented with linear restrictions directly on the impact effects. Such restrictions are in fact necessary for identifying all shocks properly if there is more than one purely transitory shock, in which case the transitory shocks obviously cannot be distinguished by their long-run effects, which are all zero.

4. Estimation of the A-model

All the frequentist estimators considered in the following are consistent and have asymptotic normal distributions under standard conditions. Many of them are even equally efficient asymptotically. However, as mentioned in the introduction, the structural parameters as such are often not of main interest but derived quantities such as the implied impulse responses and forecast error variance decompositions are of interest. Inference for these quantities is typically based on bootstrap methods for which a large number of estimates has to be computed. Thus, it is essential that a computationally efficient and robust estimation method is used for the structural parameters. This survey is meant to direct the reader to suitable methods for a given setup. Asymptotic properties of the estimators and conditions for their validity can be found, for example, in Kilian and Lütkepohl (2017, Chapters 9 and 11).

### 4.1. Short-run restrictions

4.1.1. Just-identified models

If the model is just-identified and there are just exclusion restrictions, estimation of the A-model is particularly simple. A popular example is a recursive model where the structural matrix A is lower triangular. In that case, A and Σw can be estimated by MM using the relation Σu = A⁻¹ΣwA⁻¹′. The MM estimator can be computed easily with the help of a Cholesky decomposition of the reduced-form error covariance matrix Σu. Typically Σu is estimated as

$\tilde{\Sigma}_u = T^{-1} \sum_{t=1}^{T} \hat{u}_t \hat{u}_t' \quad \text{or} \quad \hat{\Sigma}_u = (T - Kp - 1)^{-1} \sum_{t=1}^{T} \hat{u}_t \hat{u}_t'.$

The latter estimator uses a degrees-of-freedom adjustment and may be preferable in small samples. The estimator Σ̃u is the Gaussian ML estimator and is used repeatedly in the following. It should be understood that the asymptotic properties of the MM estimators remain unaffected if Σ̃u is replaced by Σ̂u.

The MM estimator of A is

$\hat{A} = \mathrm{diag}(\hat{d}_1, \ldots, \hat{d}_K)^{-1} \mathrm{chol}(\tilde{\Sigma}_u)^{-1},$

where the d̂k, k = 1, …, K, are the diagonal elements of chol(Σ̃u)⁻¹. The MM estimator for the matrix Σw is

$\tilde{\Sigma}_w = \hat{A} \tilde{\Sigma}_u \hat{A}'.$
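The two formulas above can be checked directly; the covariance matrix below is an illustrative stand-in for Σ̃u:

```python
import numpy as np

# MM estimation of a lower-triangular A with unit main diagonal; the
# covariance matrix is an illustrative stand-in for Sigma_u_tilde.
Sigma_u = np.array([[2.0, 0.6, 0.3],
                    [0.6, 1.5, 0.4],
                    [0.3, 0.4, 1.0]])

P_inv = np.linalg.inv(np.linalg.cholesky(Sigma_u))   # chol(Sigma_u)^{-1}
d = np.diag(P_inv)                                    # its diagonal elements
A_hat = np.diag(1.0 / d) @ P_inv                      # unit main diagonal
Sigma_w = A_hat @ Sigma_u @ A_hat.T                   # diagonal by construction
```

By construction, `Sigma_w` comes out diagonal, which is exactly the defining property of the A-model MM estimator.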

Generally, if the A-model is just-identified and there are just exclusion restrictions for the off-diagonal elements of A, simple estimation methods are available even if A is not triangular. The kth equation of the A-model can be written as

$\hat{u}_{kt} = \hat{u}_t^{(k)\prime} a_k + \tilde{w}_{kt},$

where $\hat{u}_t^{(k)}$ is the column vector of the elements of ût that appear on the right-hand side of the kth equation of the model Aût = w̃t and where ak is the vector of associated structural parameters. For example, for a lower-triangular matrix A, no parameters in the first equation have to be estimated and the ‘regressors’ for k = 2, …, K are of the form $\hat{u}_t^{(k)\prime} = (\hat{u}_{1t}, \ldots, \hat{u}_{k-1,t})$ with parameters ak = (−ak1, …, −ak,k−1)′.

Summarizing the model for t = 1, …, T in matrix notation yields

$x_k = X_k a_k + w_k,$

where $x_k \equiv (\hat{u}_{k1}, \ldots, \hat{u}_{kT})'$, $X_k \equiv [\hat{u}_1^{(k)}, \ldots, \hat{u}_T^{(k)}]'$ and $w_k \equiv (\tilde{w}_{k1}, \ldots, \tilde{w}_{kT})'$. The LS estimator is

$\hat{a}_k^{LS} = (X_k' X_k)^{-1} X_k' x_k.$

This estimator is equivalent to the MM estimator based on solving the system of equations

$\tilde{\Sigma}_u = A^{-1} \mathrm{diag}(\sigma_1^2, \ldots, \sigma_K^2) A^{-1\prime}$

for the unrestricted elements of A and $\sigma_1^2, \ldots, \sigma_K^2$. It can also be interpreted as a GMM estimator based on the moment conditions $E(u_t^{(k)} w_{kt}) = 0$. Thus,

$\hat{a}_k^{LS} = \hat{a}_k^{MM} = \hat{a}_k^{GMM}.$
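A minimal simulation sketch of the equation-wise LS estimator for a bivariate recursive A-model (all parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Equation-wise LS for a lower-triangular 2x2 A-model; parameter values
# and the simulated residuals are purely illustrative.
T = 2000
A_true = np.array([[1.0, 0.0],
                   [-0.6, 1.0]])                        # w_t = A u_t
w = rng.standard_normal((T, 2)) * np.array([1.0, 0.5])  # diagonal Sigma_w
u = w @ np.linalg.inv(A_true).T                         # u_t = A^{-1} w_t

# Second structural equation: u_2t = 0.6 u_1t + w_2t, so regress u_2 on u_1.
X2 = u[:, [0]]
a2_hat = np.linalg.solve(X2.T @ X2, X2.T @ u[:, 1])     # LS = MM = GMM here
```

The single free parameter of the second equation is recovered by a simple regression of one residual series on the other, in line with the equivalence stated above.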
4.1.2. Over-identified models

LS estimation can also be used if there are over-identifying restrictions of the form

$a_k = R_k \gamma_k,$

where Rk is a matrix that relates the unrestricted parameters γk to the restricted parameter vector ak. The vector γk can be estimated by LS from the regression

$x_k = X_k^{\dagger} \gamma_k + w_k,$

where $X_k^{\dagger} = X_k R_k$. Now there may be more instruments in Xk than parameters in the kth equation. These instruments can be used to estimate the parameters γk possibly more efficiently by two-stage LS (2SLS) using the instruments $\hat{X}_k^{\dagger} \equiv X_k (X_k' X_k)^{-1} X_k' X_k^{\dagger}$,

$\hat{\gamma}_k^{IV} = \hat{\gamma}_k^{2SLS} = (\hat{X}_k^{\dagger\prime} X_k^{\dagger})^{-1} \hat{X}_k^{\dagger\prime} x_k.$

This estimator can also be interpreted as a GMM estimator based on the moment conditions $E(u_t^{(k)} w_{kt}) = 0$ and the GMM objective function

$J(\gamma_k) = \left( \frac{1}{T} w_k' X_k \right) \left( \frac{1}{T} X_k' X_k \right)^{-1} \left( \frac{1}{T} X_k' w_k \right) = \left( \frac{1}{T} (x_k - X_k^{\dagger}\gamma_k)' X_k \right) \left( \frac{1}{T} X_k' X_k \right)^{-1} \left( \frac{1}{T} X_k' (x_k - X_k^{\dagger}\gamma_k) \right).$

In other words, if this GMM objective function is used for estimation,

$\hat{\gamma}_k^{GMM} = \hat{\gamma}_k^{2SLS}.$

So far only the estimation of the kth equation of the structural VAR model has been discussed. Since the components of wt are instantaneously uncorrelated, single-equation GMM is identical to estimating the full system

$\begin{pmatrix} x_1 \\ \vdots \\ x_K \end{pmatrix} = \begin{bmatrix} X_1^{\dagger} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & X_K^{\dagger} \end{bmatrix} \begin{pmatrix} \gamma_1 \\ \vdots \\ \gamma_K \end{pmatrix} + \begin{pmatrix} w_1 \\ \vdots \\ w_K \end{pmatrix}$

by GMM. Note that there may be equations without any parameters to estimate, as in the recursive model. Such equations are dropped from the system. The full system has to be estimated jointly if there are cross-equation restrictions.

It is worth emphasizing again that concentrating out the parameters of the lagged variables and working with the estimated reduced-form residuals instead of the observations is not the same as the standard GMM framework for estimating the structural model including the lagged values. If standard GMM software is used, it is preferable to work with the original observations and include the lagged values in the equations. The same formulas can be used in that case but the symbols have to be adjusted properly. For example, xk has to include the original observations, i.e., xk = (yk1, …, ykT)′ instead of LS residuals, Xk and $X_k^{\dagger}$ are matrices that include the lagged observations in addition to unlagged observations, and the parameter vector γk must also include all parameters associated with the lags of the observed variables. If such modifications are used, the standard output of GMM estimation for asymptotic covariance matrices, t-ratios etc. can be used.

Estimates of the elements of A are obtained from the relation ak = Rkγk by replacing γk with an estimator. The diagonal elements of ∑w can be estimated in the usual way from the estimated structural residuals as

$\hat{\sigma}_k^2 = \frac{1}{T} \sum_{t=1}^{T} \hat{w}_{kt}^2.$
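The restricted-regression and 2SLS formulas above can be sketched as follows; the restriction matrix Rk (forcing the first two coefficients to be equal) and all numbers are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# 2SLS for a restricted equation x_k = X_k R_k gamma_k + w_k; the restriction
# matrix R_k and all parameter values are illustrative.
T = 400
X_k = rng.standard_normal((T, 3))
R_k = np.array([[1.0, 0.0],
                [1.0, 0.0],
                [0.0, 1.0]])          # a_k = R_k gamma_k
gamma_true = np.array([0.5, -0.3])
x_k = X_k @ R_k @ gamma_true + 0.1 * rng.standard_normal(T)

X_dag = X_k @ R_k                                               # X_k^dagger
X_dag_hat = X_k @ np.linalg.solve(X_k.T @ X_k, X_k.T @ X_dag)   # instruments
gamma_hat = np.linalg.solve(X_dag_hat.T @ X_dag, X_dag_hat.T @ x_k)
```

Because $X_k^{\dagger}$ lies in the column space of $X_k$, the projection leaves it unchanged here and 2SLS coincides with LS on the restricted regressors, as the text's GMM equivalence suggests.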

### 4.2. Other restrictions and estimation approaches

4.2.1. Just-identifying restrictions

Suppose there are just-identifying restrictions of the form

$Q\,\mathrm{vec}(A) = q,$

where this set of equations may contain short-run and long-run restrictions. In other words, Q may contain reduced-form parameters as in (3.5). In that case, we denote by Q̃ the matrix obtained by replacing all reduced-form parameters by LS estimators for the levels VAR model or by Gaussian ML estimators for VECMs.

For this type of restriction, MM estimation is the preferred method for just-identified models. Since

$A \Sigma_u A' = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_K^2),$

Σu is replaced by Σ̃u and the system of equations

$\begin{bmatrix} \mathrm{vech}(A \tilde{\Sigma}_u A') \\ \tilde{Q}\,\mathrm{vec}(A) \end{bmatrix} = \begin{bmatrix} \mathrm{vech}(\mathrm{diag}(\sigma_1^2, \ldots, \sigma_K^2)) \\ q \end{bmatrix}$

is solved by a nonlinear equations solver. It can be shown that the resulting estimates Ã and $\tilde{\Sigma}_w = \mathrm{diag}(\tilde{\sigma}_1^2, \ldots, \tilde{\sigma}_K^2)$ are identical to the Gaussian ML estimates.
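A sketch of this MM approach with a standard nonlinear equations solver (scipy's `fsolve`); the covariance matrix and the recursive restrictions used here are illustrative:

```python
import numpy as np
from scipy.optimize import fsolve

# Just-identified MM estimation: solve vech(A Sigma_u A') = vech(diag(sigma^2))
# together with Q vec(A) = q; covariance matrix and restrictions illustrative.
Sigma_u = np.array([[2.0, 0.6, 0.3],
                    [0.6, 1.5, 0.4],
                    [0.3, 0.4, 1.0]])
il = np.tril_indices(3)

# Q vec(A) = q fixes a unit diagonal and zeros above the diagonal
# (vec stacks columns, so these are positions 0, 4, 8 and 3, 6, 7).
fixed_pos = [0, 4, 8, 3, 6, 7]
q = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])

def system(theta):
    A = theta[:9].reshape(3, 3, order="F")
    sig2 = theta[9:]
    m1 = (A @ Sigma_u @ A.T - np.diag(sig2))[il]   # 6 moment equations
    m2 = A.reshape(-1, order="F")[fixed_pos] - q   # 6 restriction equations
    return np.concatenate([m1, m2])

theta0 = np.concatenate([np.eye(3).reshape(-1, order="F"), np.ones(3)])
theta_hat = fsolve(system, theta0)
A_hat = theta_hat[:9].reshape(3, 3, order="F")
sigma2_hat = theta_hat[9:]
```

With these recursive restrictions the solution can also be obtained via a Cholesky decomposition, so the solver output is easy to cross-check.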

4.2.2. Gaussian maximum likelihood estimation

If there are over-identifying restrictions, Gaussian ML estimates of the structural parameters are obtained by maximizing the concentrated log-likelihood as a function of the structural parameters,

$\log l_c(A, \Sigma_w) = \text{constant} + \frac{T}{2} \log(\det A)^2 - \frac{T}{2} \log \det(\Sigma_w) - \frac{T}{2} \mathrm{tr}(A' \Sigma_w^{-1} A \tilde{\Sigma}_u),$

where $\tilde{\Sigma}_u = T^{-1} \sum_{t=1}^{T} \hat{u}_t \hat{u}_t'$ is the usual ML estimator of the reduced-form residual covariance matrix (Lütkepohl, 2005, Chapter 9). The log-likelihood function has to be maximized with respect to A and $\Sigma_w = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_K^2)$ subject to the identifying restrictions, which may require numerical optimization methods. If the restrictions are just-identifying, the estimator is the same as the MM estimator discussed earlier.

4.2.3. Instrumental variable estimation of structural vector error correction models

Pagan and Pesaran (2008) observe that in a structural VECM the cointegration relations can serve as instruments under certain conditions. They show that a VECM with r (0 < r < K) linearly independent cointegration relations, r structural shocks with transitory effects only and K − r persistent shocks with permanent effects, can be arranged such that

$A^p \Delta y_t = \Gamma_1^{\dagger p} \Delta y_{t-1} + \cdots + \Gamma_{p-1}^{\dagger p} \Delta y_{t-p+1} + w_t^p,$
$A^{tr} \Delta y_t = \alpha_{(r \times r)}^{\dagger} \beta' y_{t-1} + \Gamma_1^{\dagger tr} \Delta y_{t-1} + \cdots + \Gamma_{p-1}^{\dagger tr} \Delta y_{t-p+1} + w_t^{tr},$

where the intercept has been deleted for simplicity, $w_t^p$ is a (K − r)-dimensional vector of persistent shocks and $w_t^{tr}$ is an r-dimensional vector of transitory shocks. In this model setup, the cointegration relations, β′yt−1, do not appear in the first K − r structural equations of the system. If the r cointegration relations β′yt−1 are known, they can be used as instruments in the first set of equations and $w_t^p$ can be used as a vector of instruments for $A^{tr}$ in the second set of equations because $w_t^p$ is uncorrelated with $w_t^{tr}$.

Since $A^p$ has up to (K − 1)(K − r) unknown elements to be estimated, the r cointegration relations may not provide enough instruments for consistent estimation of all the structural parameters. In that case the instruments have to be complemented with other identifying information such as exclusion restrictions for the impact effects of the structural shocks.

Kilian and Lütkepohl (2017, Section 11.2) emphasize that this IV method for estimating the structural parameters in the presence of long-run restrictions requires that (1) the number of transitory shocks is equal to the cointegrating rank, r, of the system and the number of persistent shocks is equal to the number of common trends, K − r; and (2) the cointegration relations are known. In any case, if the instruments are insufficient for estimating all structural parameters, identifying information from other sources is needed.

5. Estimation of the B-model

### 5.1. Just-identified models

If the B-model is just-identified, a MM approach to estimating B is recommended. That approach is based on the covariance matrix of the reduced-form VAR residuals, Σu, which may be expressed in terms of the structural model parameters as

$\Sigma_u = BB'. \qquad (5.1)$

Replacing Σu by a consistent estimate Σ̃u or Σ̂u and solving the system of equations (5.1) subject to the identifying restrictions gives an estimate of B. For the A-model, a simple closed-form MM estimator is typically available only if the model is recursive; otherwise numerical methods are needed. In contrast, for the B-model, in some cases computing the MM estimator is straightforward. In the following the focus is on such cases but the general case, which involves nonlinear numerical techniques, is also presented.

5.1.1. Recursive models

If B is lower-triangular the easiest way of computing the MM estimator of the structural parameters involves estimating the reduced-form VAR parameters, computing the residual variance-covariance matrix Σ̂u or Σ̃u and applying the Cholesky decomposition. In other words,

$\hat{B} = \mathrm{chol}(\hat{\Sigma}_u) \quad \text{or} \quad \hat{B} = \mathrm{chol}(\tilde{\Sigma}_u).$

If B is upper triangular, the same approach can be used after reversing the ordering of the variables, which amounts to pre- and postmultiplying by the orthogonal matrix

$G = \begin{bmatrix} 0 & \cdots & 0 & 1 \\ 0 & \cdots & 1 & 0 \\ \vdots & & \vdots & \vdots \\ 1 & \cdots & 0 & 0 \end{bmatrix}$

such that, for example,

$\hat{B} = G\,\mathrm{chol}(G \tilde{\Sigma}_u G)\,G$

(see Lütkepohl, 2005, A.9.3). This MM estimator is also the Gaussian ML estimator if Σu is estimated by Σ̃u.

Estimation by Cholesky decomposition is also possible if there are recursive long-run restrictions, i.e., if Ξ = A(1)⁻¹B is triangular. This case is quite common in practice (e.g., Blanchard and Quah, 1989). For simplicity we assume that Ξ is lower triangular. In that case, chol(Â(1)⁻¹Σ̃uÂ(1)⁻¹′) is an estimator of A(1)⁻¹B and

$\hat{B} = \hat{A}(1)\,\mathrm{chol}\big(\hat{A}(1)^{-1} \tilde{\Sigma}_u \hat{A}(1)^{-1\prime}\big)$

is the MM estimator of the structural matrix B.

It is important to note that this MM estimator involves the inverse of Â(1) = IK − Â1 − ⋯ − Âp, which may be ill-conditioned if some of the components of yt are integrated and possibly cointegrated because in that case A(1) is singular. Thus, the MM estimator should only be used if the underlying VAR process is stable. In fact, even in that case the estimates may not be accurate if there are very persistent variables with autoregressive roots near unity because then A(1) will be near singular, which may translate into large variances of the MM estimator. This problem has led some researchers to question the usefulness of long-run identifying restrictions (see Kilian and Lütkepohl, 2017, Section 11.4, for a discussion). Lütkepohl et al. (2017) show, however, that long-run restrictions can lead to more precise inference for impulse responses derived from the structural parameters than short-run restrictions, provided the autoregressive roots are not close to the unit circle.
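For a stable bivariate VAR(1), the long-run MM estimator above can be sketched in a few lines; all parameter values are illustrative:

```python
import numpy as np

# Long-run (Blanchard-Quah type) MM estimator for a stable bivariate VAR(1):
# B_hat = A(1) chol(A(1)^{-1} Sigma_u A(1)^{-1}') makes Xi = A(1)^{-1} B_hat
# lower triangular; all parameter values are illustrative.
A1 = np.array([[0.5, 0.1],
               [0.2, 0.3]])
Sigma_u = np.array([[1.0, 0.3],
                    [0.3, 0.8]])

A_1 = np.eye(2) - A1                          # A(1) = I_K - A_1 - ... - A_p
A_1_inv = np.linalg.inv(A_1)
Xi_hat = np.linalg.cholesky(A_1_inv @ Sigma_u @ A_1_inv.T)  # lower triangular
B_hat = A_1 @ Xi_hat                          # implied impact effects
```

By construction the implied long-run matrix Ξ̂ = Â(1)⁻¹B̂ is lower triangular and B̂B̂′ reproduces Σ̃u exactly.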

5.1.2. Non-recursive models

If the restrictions are just-identifying but not recursive so that B is not triangular, a nonlinear equations solver can be used to solve the system of equations

$\hat{\Sigma}_u = BB' \quad \text{or} \quad \tilde{\Sigma}_u = BB'$

for B, subject to the identifying restrictions. For small systems with exclusion restrictions it may be possible to obtain a recursive system by rearranging the variables. Moreover, sometimes the solution is easy to find without a nonlinear equations solver. In that case, such solutions may be preferable in terms of computing time. Rubio-Ramírez et al. (2010) present a non-iterative algorithm for just-identified models which can be used in this context and which may have computational advantages.

### 5.2. Over-identified models

5.2.1. Generalized method-of-moments estimation

If over-identifying restrictions are available for B, GMM estimation can be used. In that case, generally no B matrix exists that satisfies all identifying restrictions and also the relation Σ̃u = BB′. Therefore the objective is to find the structural parameter values that minimize a weighted average of the moment conditions, where the asymptotically optimal set of weights corresponds to the inverse of the variance-covariance matrix of the sample moment conditions.

A set of moment conditions for estimating B is

$E\big( \mathrm{vech}(u_t u_t') - \mathrm{vech}(BB') \big) = 0$

for which the empirical counterpart is

$\mathrm{vech}\bigg(\frac{1}{T}\sum_{t=1}^{T}\hat{u}_t\hat{u}_t' - BB'\bigg) = \mathrm{vech}(\tilde{\Sigma}_u) - \mathrm{vech}(BB').$

Defining

$\hat{\Omega} = \frac{1}{T}\sum_{t=1}^{T}\Big(\mathrm{vech}(\hat{u}_t\hat{u}_t') - \overline{\mathrm{vech}(\hat{u}\hat{u}')}\Big)\Big(\mathrm{vech}(\hat{u}_t\hat{u}_t') - \overline{\mathrm{vech}(\hat{u}\hat{u}')}\Big)'$

with

$\overline{\mathrm{vech}(\hat{u}\hat{u}')} = \frac{1}{T}\sum_{t=1}^{T}\mathrm{vech}(\hat{u}_t\hat{u}_t'),$

the GMM estimator for B chooses the unknown elements of B so as to minimize

$J = T\big(\mathrm{vech}(\tilde{\Sigma}_u) - \mathrm{vech}(BB')\big)'\hat{\Omega}^{-1}\big(\mathrm{vech}(\tilde{\Sigma}_u) - \mathrm{vech}(BB')\big)$

subject to all identifying and over-identifying restrictions. Typically numerical algorithms have to be used for minimizing J subject to over-identifying restrictions.
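A minimal numerical sketch of this GMM problem is given below, with simulated residuals in place of actual VAR residuals and a recursive restriction pattern chosen purely so that the example is easy to check:

```python
import numpy as np
from scipy.optimize import minimize

def vech(M):
    return M[np.tril_indices(M.shape[0])]

# simulated reduced-form residuals for illustration
rng = np.random.default_rng(0)
T, K = 500, 3
B0 = np.array([[1.0, 0.0, 0.0],
               [0.5, 1.0, 0.0],
               [0.2, -0.3, 1.0]])
u_hat = rng.standard_normal((T, K)) @ B0.T

Sigma_tilde = u_hat.T @ u_hat / T

# Omega_hat: sample variance-covariance matrix of vech(u_t u_t')
m = np.array([vech(np.outer(u, u)) for u in u_hat])   # T x K(K+1)/2
Omega = np.cov(m, rowvar=False, bias=True)
Omega_inv = np.linalg.inv(Omega)

pattern = np.tril(np.ones((K, K), dtype=bool))        # lower-triangular restrictions

def J(theta):
    # GMM objective: T * g' Omega^{-1} g with g the moment deviations
    B = np.zeros((K, K))
    B[pattern] = theta
    g = vech(Sigma_tilde) - vech(B @ B.T)
    return T * g @ Omega_inv @ g

res = minimize(J, x0=np.eye(K)[pattern], method="BFGS")
B_gmm = np.zeros((K, K))
B_gmm[pattern] = res.x
```

Because this particular pattern is just-identified, J is driven to zero at the optimum; with genuinely over-identifying restrictions the minimized J would stay positive and could serve as a test statistic.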

5.2.2. Gaussian maximum likelihood estimation

As mentioned earlier, the MM estimator is equivalent to Gaussian ML estimation if Σ̃u is used as the estimator for Σu and the model is just-identified. For an over-identified B-model, concentrating on the structural parameters, the concentrated log-likelihood function is

$\log l_c(B) = \mathrm{constant} - \frac{T}{2}\log(\det B)^2 - \frac{T}{2}\mathrm{tr}\big(B^{-1\prime}B^{-1}\tilde{\Sigma}_u\big).$

In general the concentrated log-likelihood function can be maximized by a numerical optimization algorithm with respect to B subject to the identifying restrictions.
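The maximization can be sketched as follows; the covariance matrix is constructed from a known lower-triangular matrix purely for illustration, and the recursive pattern stands in for whatever identifying restrictions are imposed:

```python
import numpy as np
from scipy.optimize import minimize

K, T = 3, 200
pattern = np.tril(np.ones((K, K), dtype=bool))   # recursive case for illustration

# illustrative "estimated" reduced-form covariance built from a known B
B0 = np.array([[1.0, 0.0, 0.0],
               [-0.4, 0.9, 0.0],
               [0.2, 0.3, 1.1]])
Sigma_tilde = B0 @ B0.T

def neg_loglik(theta):
    # negative of log l_c(B), dropping the constant
    B = np.zeros((K, K))
    B[pattern] = theta
    Binv = np.linalg.inv(B)
    return (T / 2) * (np.log(np.linalg.det(B) ** 2)
                      + np.trace(Binv.T @ Binv @ Sigma_tilde))

res = minimize(neg_loglik, x0=np.eye(K)[pattern], method="BFGS")
B_ml = np.zeros((K, K))
B_ml[pattern] = res.x
```

For this just-identified recursive pattern the maximizer satisfies B̃B̃′ = Σ̃u, so the numerical result can be checked against the Cholesky factor of Sigma_tilde.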

If yt is a stationary Gaussian VAR(p) process, the ML estimation framework also facilitates the derivation of the asymptotic properties of the estimator. General ML theory implies that the unrestricted parameters are consistent and asymptotically normal. If there are over-identifying restrictions on B, the restricted ML estimator of Σu is

$\tilde{\Sigma}_u^r \equiv \tilde{B}\tilde{B}',$

where B̃ is the restricted ML estimator of B satisfying all identifying restrictions.

6. Estimation of the AB-model

### 6.1. Just-identified models

MM estimation can also be used for just-identified AB-models. The moment conditions are

$\Sigma_u = A^{-1}BB'A^{-1\prime}.$

Given identifying restrictions

$Q\begin{bmatrix}\mathrm{vec}(A)\\ \mathrm{vec}(B)\end{bmatrix} = q,$

the system of equations to be solved for A and B is

$\begin{bmatrix}\mathrm{vech}(A\tilde{\Sigma}_u A')\\ \tilde{Q}\begin{bmatrix}\mathrm{vec}(A)\\ \mathrm{vec}(B)\end{bmatrix}\end{bmatrix} = \begin{bmatrix}\mathrm{vech}(BB')\\ q\end{bmatrix},$

where Q̃ is obtained by replacing all reduced-form parameters in Q by their ML/LS estimates. Solving this system of equations for A and B typically requires the use of a nonlinear equations solver. Under normality assumptions, and if reduced-form estimation is done by Gaussian ML, the resulting estimates are also ML estimates. This equivalence implies that the MM estimator has the usual good asymptotic properties of ML estimators under standard assumptions.
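As a sketch, consider a just-identified AB-model with a nonrecursive A and diagonal B; the scheme and the covariance matrix below are made up for illustration. The moment conditions vech(AΣ̃uA′) = vech(BB′) can be handed to a nonlinear solver:

```python
import numpy as np
from scipy.optimize import least_squares

def vech(M):
    return M[np.tril_indices(M.shape[0])]

# hypothetical AB-scheme: A = [1 0 0; a21 1 a23; a31 0 1], B diagonal;
# Sigma_u is simulated from known parameter values
A_true = np.array([[1.0, 0.0, 0.0],
                   [-0.3, 1.0, 0.4],
                   [0.2, 0.0, 1.0]])
B_true = np.diag([0.9, 0.7, 1.1])
A_inv = np.linalg.inv(A_true)
Sigma_u = A_inv @ B_true @ B_true.T @ A_inv.T

def gap(theta):
    # deviations vech(A Sigma_u A' - BB'): 6 equations, 6 unknowns
    a21, a23, a31, b1, b2, b3 = theta
    A = np.array([[1.0, 0.0, 0.0],
                  [a21, 1.0, a23],
                  [a31, 0.0, 1.0]])
    B = np.diag([b1, b2, b3])
    return vech(A @ Sigma_u @ A.T - B @ B.T)

sol = least_squares(gap, x0=np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0]))
```

At the solution the off-diagonal conditions force AΣ̃uA′ to be diagonal, and the diagonal conditions pin down the elements of B (up to sign).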

### 6.2. Over-identified models

For over-identified models, GMM or Gaussian ML estimation of the AB-model may be used.

6.2.1. Generalized method-of-moments estimation

GMM estimation of the AB-model is a straightforward extension of GMM estimation of the B-model. If there are identifying restrictions on both A and B in an over-identified structural VAR model, the GMM estimators are obtained by minimizing the objective function

$J = T\,\mathrm{vech}\big(\tilde{\Sigma}_u - A^{-1}BB'A^{-1\prime}\big)'\hat{\Omega}^{-1}\mathrm{vech}\big(\tilde{\Sigma}_u - A^{-1}BB'A^{-1\prime}\big)$

subject to all restrictions on both A and B.

6.2.2. Gaussian maximum likelihood estimation

If there are no restrictions on the reduced-form parameters, Gaussian ML estimation is again executed by concentrating on the structural parameters. The relevant concentrated log-likelihood function is

$\log l_c(A,B) = \mathrm{constant} + \frac{T}{2}\log(\det A)^2 - \frac{T}{2}\log(\det B)^2 - \frac{T}{2}\mathrm{tr}\big(A'B^{-1\prime}B^{-1}A\tilde{\Sigma}_u\big)$

(see Lütkepohl, 2005, Chapter 9).

In general the concentrated log-likelihood function can be maximized by a numerical optimization algorithm with respect to A and B, subject to the identifying restrictions. For just-identified models it can be shown that the ML estimates satisfy

$\tilde{A}^{-1}\tilde{B}\tilde{B}'\tilde{A}^{-1\prime} = \tilde{\Sigma}_u.$

In other words, ML estimation and MM estimation provide identical results if the same estimator for Σu is used. If there are over-identifying restrictions on the structural parameters, the reduced-form error covariance matrix Σu may be estimated as

$\tilde{\Sigma}_u^r \equiv \tilde{A}^{-1}\tilde{B}\tilde{B}'\tilde{A}^{-1\prime}.$
7. Bayesian estimation

In practice, structural VAR analysis is often based on Bayesian estimation methods. Bayesian estimators are obtained by evaluating the posterior distribution of the parameters of interest or of functions of the structural parameters. The first step in constructing the posterior distribution of the structural parameters is the specification of a prior. In the framework of structural VAR analysis the prior is imposed either on the reduced-form parameters or on the structural parameters. The latter approach is more plausible if the structural parameters are of main interest and prior beliefs about them are available. Priors imposed on the reduced-form parameters often reflect the limited structural knowledge of the investigator. They are formulated with the objective of obtaining posterior distributions that are easy to sample from and of inducing little distortion in the structural analysis.

Gaussian-inverse Wishart priors are conventional priors for reduced-form VAR models. The mean of the prior for the VAR coefficients is typically specified as in a so-called Minnesota prior, which shrinks the VAR slope parameters towards zero or towards a random walk process, depending on the persistence properties of the data (Kilian and Lütkepohl, 2017, Chapter 5).

Suppose that the VAR(p) model (2.1) has Gaussian errors, ut ~ N(0, Σu), and the priors for A and Σu are

$\mathrm{vec}(A) \mid \Sigma_u \sim N\big(\mathrm{vec}(A^*),\, V_A = V \otimes \Sigma_u\big)$

and

$\Sigma_u \sim \mathcal{IW}_K(S^*, n),$

where ℐWK(S*, n) denotes an inverse Wishart distribution with K × K matrix parameter S* and degrees-of-freedom parameter n. Using the Kronecker product covariance matrix V ⊗ Σu in the conditional prior for A simplifies the posterior distribution, which turns out to also be a Gaussian-inverse Wishart distribution,

$\mathrm{vec}(A) \mid \Sigma_u, y \sim N\big(\mathrm{vec}(\bar{A}), \bar{\Sigma}_A\big), \quad \Sigma_u \mid y \sim \mathcal{IW}_K(S, \tau),$

where y = vec(y1, …, yT) denotes the data,

$\bar{A} = (A^*V^{-1} + YZ')(V^{-1} + ZZ')^{-1}$

and

$\bar{\Sigma}_A = (V^{-1} + ZZ')^{-1} \otimes \Sigma_u.$

Here Z ≡ [Y0, …, YT−1] and Y ≡ [y1, …, yT]. The parameters of the inverse Wishart distribution in (7.3) are

$S = T\tilde{\Sigma}_u + S^* + \hat{A}ZZ'\hat{A}' + A^*V^{-1}A^{*\prime} - \bar{A}(V^{-1} + ZZ')\bar{A}'$

and τ = T + n (Koop and Korobilis, 2010; Kilian and Lütkepohl, 2017, Section 5.2.4; Uhlig, 1994, 2005). Here A* and Ā are K × (Kp + 1) matrices and Â = YZ′(ZZ′)−1.

Since the posterior is from the same distributional family as the prior, the prior (7.1)/(7.2) is a conjugate prior. Given that the prior is also from the same distributional family as the likelihood, it is even a natural conjugate prior. The question is, of course, how to choose the prior parameters A*, V, S* and n. As mentioned before, the prior mean A* is often chosen as in the Minnesota prior, i.e., A* is chosen such that the corresponding model for persistent variables is a random walk and for non-persistent variables a white noise process (see Kilian and Lütkepohl, 2017, Section 5.2.3, for the Minnesota and other priors). The variance-related prior parameters V, S* and n are chosen such that the prior has a limited impact on the posterior. In other words, very large or even infinite variances are chosen. If infinite variances are chosen, V−1 is replaced by a zero matrix in the posterior distributions and the prior is improper.
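The posterior parameters above are straightforward to compute. In the sketch below all inputs (data and prior parameters) are simulated or arbitrarily chosen; only the update formulas follow the text:

```python
import numpy as np

# simulated data: Y is K x T, Z is the (Kp+1) x T regressor matrix with intercept
rng = np.random.default_rng(0)
K, p, T = 3, 2, 100
M = K * p + 1
Y = rng.standard_normal((K, T))
Z = rng.standard_normal((M, T))
Z[0] = 1.0                                   # first row: intercept

# hypothetical prior parameters (loose Minnesota-style zero mean)
A_star = np.zeros((K, M))
V = 10.0 * np.eye(M)
S_star = np.eye(K)
n = K + 2

Vinv = np.linalg.inv(V)
ZZt = Z @ Z.T
A_hat = Y @ Z.T @ np.linalg.inv(ZZt)          # LS estimate of A
Sigma_tilde = (Y - A_hat @ Z) @ (Y - A_hat @ Z).T / T

# posterior parameters as in the text
A_bar = (A_star @ Vinv + Y @ Z.T) @ np.linalg.inv(Vinv + ZZt)
S = (T * Sigma_tilde + S_star + A_hat @ ZZt @ A_hat.T
     + A_star @ Vinv @ A_star.T - A_bar @ (Vinv + ZZt) @ A_bar.T)
tau = T + n
```

The resulting S is symmetric positive definite by construction, so it is a valid inverse Wishart scale matrix for the posterior of Σu.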

Using a prior that gives rise to a known form of the posterior distribution of the reduced-form parameters is convenient because it makes it easy to draw samples from the reduced-form posterior. For just-identified models, reduced-form posterior draws can then be transformed to draws of structural parameters (e.g., Canova, 1991; Gordon and Leeper, 1994). For example, for a recursive identification scheme, a Cholesky decomposition of the ith draw of ∑u yields the ith posterior draw for B.
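For a recursive scheme this transformation is just one Cholesky factorization per draw. The posterior parameters S and tau below are hypothetical placeholders:

```python
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(42)
K = 3
S = np.array([[2.0, 0.3, 0.1],       # hypothetical posterior scale matrix
              [0.3, 1.5, -0.2],
              [0.1, -0.2, 1.0]])
tau = 50                             # hypothetical posterior degrees of freedom

n_draws = 200
B_draws = np.empty((n_draws, K, K))
for i in range(n_draws):
    # i-th posterior draw of Sigma_u, then its Cholesky factor as draw of B
    Sigma_u = invwishart.rvs(df=tau, scale=S, random_state=rng)
    B_draws[i] = np.linalg.cholesky(Sigma_u)
```

Posterior quantiles of impulse responses or other functions of B can then be computed directly from `B_draws`.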

Given that the prior in this approach is not specified for the structural parameters which are supposedly the parameters of interest, this approach is unsatisfactory. It is even more problematic if there are over-identifying restrictions for A or B because they would be ignored in a general purpose prior for the reduced form. Sims and Zha (1998, 1999) have developed an approach for imposing priors directly on the structural parameters. The approach is specifically designed for structural VAR models with linear restrictions on A. There are no obvious extensions for B-models. A generalization for certain nonlinear restrictions on A was recently proposed by Canova and Pérez Forero (2015).

Canova (2007, Section 10.3) compares the structural priors discussed in Sims and Zha (1998) with more traditional Minnesota priors on the reduced-form parameters and points out that there are important differences. Thus, imposing the prior on the structural parameters can make a difference for the posterior of the structural parameters.

8. Illustrations

In this section some illustrative examples for estimating structural VAR models are provided. The ISLM example data from Breitung et al. (2004) are used to illustrate the estimation of the structural parameters under alternative sets of restrictions. The data consist of three seasonally adjusted, quarterly U.S. time series for log real GDP (qt), the log of the real monetary base (mt) and a three-month interbank interest rate (rt) for the period 1970Q1–1997Q4. (The data are available at http://www.jmulti.de. Further details on the data sources are given by Breitung et al. (2004).) Thus, yt = (qt, mt, rt)′ is three-dimensional.

A VAR(4) model is fitted to the data, as in Breitung et al. (2004), and various alternative sets of identifying restrictions for the structural parameters are used. Some of these restrictions are not plausible from an economic point of view but are just chosen to illustrate specific identification restrictions and the related estimation methods for the structural parameters.

As discussed in the previous sections, the choice of estimation method depends on the type of model and restrictions. The frequentist estimation methods most suitable for specific model types and restrictions are summarized in Table 1. To illustrate the methods, the following three alternative A-models with A matrices

$\begin{bmatrix}1&0&0\\ *&1&0\\ *&*&1\end{bmatrix}, \quad \begin{bmatrix}1&0&0\\ *&1&*\\ *&0&1\end{bmatrix} \quad \text{and} \quad \begin{bmatrix}1&0&0\\ 0&1&*\\ *&0&1\end{bmatrix}$

are considered. They all have diagonal structural covariance matrix

$\Sigma_w = \begin{bmatrix}\sigma_1^2&0&0\\ 0&\sigma_2^2&0\\ 0&0&\sigma_3^2\end{bmatrix}.$

The asterisks denote unrestricted elements while 0 and 1 stand for restricted parameters. The first model is recursive. The second identification scheme is not recursive but just-identified and the third scheme is over-identified with just two structural parameters to be estimated in the A matrix.

Based on Table 1, MM via Cholesky decomposition is the recommended estimation method for the recursive model. The resulting estimates are presented in Table 2. Equivalently, the unknown elements of A can be estimated by applying LS to the two equations

$m_t = \nu_2 - a_{21} q_t + a_{2*} Y_{t-1} + w_{2t}$

and

$r_t = \nu_3 - a_{31} q_t - a_{32} m_t + a_{3*} Y_{t-1} + w_{3t},$

where $a_{k*}$ denotes the kth row of A, as before.

LS is also the recommended method for estimating the nonrecursive just-identified model. In other words, LS is applied to the two equations

$m_t = \nu_2 - a_{21} q_t - a_{23} r_t + a_{2*} Y_{t-1} + w_{2t}$

and

$r_t = \nu_3 - a_{31} q_t + a_{3*} Y_{t-1} + w_{3t}.$

The resulting estimates are also presented in Table 2.

Finally, 2SLS may be considered for estimating the second equation of the over-identified A-model,

$m_t = \nu_2 - a_{23} r_t + a_{2*} Y_{t-1} + w_{2t}.$

The estimates are also given in Table 2, where the third equation of the over-identified model is estimated by LS.
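The mechanics of such a 2SLS step can be sketched on simulated data; the coefficient values, the instrument set (a constant and q_t, with the lags omitted to keep the sketch short) and the data-generating process below are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
T = 300
q = rng.standard_normal(T)
w2 = rng.standard_normal(T)
# r_t loads on the structural error w2 of the m_t-equation, so plain LS is biased
r = 0.8 * q + 0.4 * w2 + rng.standard_normal(T)
m = 1.0 - 0.5 * r + w2                        # illustrative: nu2 = 1, a23 = 0.5

# first stage: regress the endogenous regressor r on the instruments
Zmat = np.column_stack([np.ones(T), q])
r_hat = Zmat @ np.linalg.lstsq(Zmat, r, rcond=None)[0]

# second stage: LS of m on the fitted values; the slope estimates -a23
X = np.column_stack([np.ones(T), r_hat])
beta = np.linalg.lstsq(X, m, rcond=None)[0]
```

With this data-generating process the 2SLS slope is consistent for −a23 = −0.5, whereas a direct LS regression of m on r would be biased because r is correlated with w2.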

To illustrate the estimation of B-models, three such models with B matrices

$\begin{bmatrix}*&0&0\\ *&*&0\\ *&*&*\end{bmatrix}, \quad \begin{bmatrix}*&*&0\\ 0&*&0\\ *&*&*\end{bmatrix}, \quad \begin{bmatrix}*&*&0\\ 0&*&0\\ *&0&*\end{bmatrix}$

and Σw = I3 are considered. The first model has a lower-triangular B matrix and is, hence, recursive. The second scheme belongs to a just-identified but nonrecursive model and the third model is over-identified. The first model is estimated by a Cholesky decomposition of the estimated reduced-form residual covariance matrix. The second model is easily estimated by MM, that is, by a suitable decomposition of the estimated reduced-form residual covariance matrix, and the third model can be estimated by Gaussian ML. The three resulting estimates are given in Table 3. They are all obtained with the software JMulTi, a powerful tool for structural VAR estimation (Krätzig, 2004).
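For the recursive B-model, the MM estimator is literally one line of linear algebra; the covariance matrix below is an arbitrary positive definite stand-in for Σ̃u:

```python
import numpy as np

# illustrative estimated reduced-form residual covariance matrix
Sigma_u = np.array([[0.52, 0.02, 0.01],
                    [0.02, 0.34, -0.03],
                    [0.01, -0.03, 0.11]])

# Cholesky factor: lower triangular with B_hat @ B_hat.T = Sigma_u
B_hat = np.linalg.cholesky(Sigma_u)
```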

To illustrate an AB-model, I note that an A-model can be cast in AB form by specifying the B matrix as a diagonal matrix and standardizing the innovation covariance matrix to be an identity matrix, ∑w = IK. For example, the nonrecursive just-identified A-model can be parameterized as

$A = \begin{bmatrix}1&0&0\\ *&1&*\\ *&0&1\end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix}*&0&0\\ 0&*&0\\ 0&0&*\end{bmatrix}.$

MM estimation is a suitable method for this type of model (Table 1). It is equivalent to Gaussian ML. The estimates for the example model are given in Table 4.

Finally, to illustrate the estimation of a structural VAR model in the presence of long-run restrictions, consider a B-model with lower-triangular long-run multiplier matrix Ξ,

$\Xi = A(1)^{-1}B = \begin{bmatrix}*&0&0\\ *&*&0\\ *&*&*\end{bmatrix}.$

Note that, in practice, such restrictions are not recommended for the example model because the variables may be integrated, in which case A(1) is not invertible (Breitung et al., 2004). For illustrative purposes, however, since the LS reduced-form estimate of A(1) is nonsingular, I proceed as if A(1) were nonsingular and use the long-run restrictions to identify B. The recursive restrictions suggest an estimator based on the Cholesky decomposition of $\hat{A}(1)^{-1}\tilde{\Sigma}_u\hat{A}(1)^{-1\prime}$.

$\hat{B} = \hat{A}(1)\,\mathrm{chol}\big(\hat{A}(1)^{-1}\tilde{\Sigma}_u\hat{A}(1)^{-1\prime}\big) = \begin{bmatrix}-0.0018&-0.0044&0.0054\\ -0.0037&0.0039&0.0023\\ 0.0101&-0.0002&0.0034\end{bmatrix}.$

This estimate was also computed with the software JMulTi.
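The long-run estimator used above can be reproduced in a few lines; Â(1) and Σ̃u below are hypothetical stand-ins rather than the estimates from the ISLM data:

```python
import numpy as np

# hypothetical A_hat(1) = I - A_1 - ... - A_p from the reduced-form estimates
A1_sum = np.array([[0.6, 0.1, 0.0],
                   [0.2, 0.5, 0.1],
                   [-0.1, 0.0, 0.7]])
# hypothetical estimated reduced-form residual covariance matrix
Sigma_u = np.array([[0.50, 0.05, 0.02],
                    [0.05, 0.30, 0.01],
                    [0.02, 0.01, 0.10]])

A1_inv = np.linalg.inv(A1_sum)
# B_hat = A_hat(1) chol(A_hat(1)^{-1} Sigma_u A_hat(1)^{-1}')
B_hat = A1_sum @ np.linalg.cholesky(A1_inv @ Sigma_u @ A1_inv.T)
Xi_hat = A1_inv @ B_hat        # lower-triangular long-run multiplier matrix
```

By construction Ξ̂ = Â(1)⁻¹B̂ is lower triangular and B̂B̂′ reproduces Σ̃u, so both identifying conditions can be verified directly.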

9. Conclusions and extensions

This study reviews methods for estimating structural VAR models set up as A-models, B-models or AB-models. A range of alternative estimation methods is considered. It is stressed that the identifying restrictions should be specified such that simple estimation methods can be used. More precisely, it is useful if the restrictions can be imposed as linear restrictions on A, on B, or on both of these matrices. The choice of estimation method depends on the type of model and restrictions.

Many estimation methods allow one to concentrate out the reduced-form parameters and estimate the structural parameters directly from the reduced-form residuals or the residual covariance matrix estimator. Gaussian ML and GMM are the most general estimation methods and can be used even if there are over-identifying restrictions on A or B. For the A-model, the GMM estimator can be computed in closed form even in many situations with over-identifying restrictions. For just-identified models, MM estimation is often a suitable choice, and for recursive models the estimates can be computed using a Cholesky decomposition of a matrix based on reduced-form parameters.

It is also possible to estimate structural VAR models by Bayesian methods. The conventional approach of formulating a prior for the reduced-form parameters and generating posterior draws for the structural parameters A and/or B from the draws for ∑u has the drawback that it does not allow for over-identifying restrictions. So far Bayesian methods that impose priors directly on the structural parameters are primarily suitable for A-models.

There are a number of extensions of the basic model setup considered in this survey. First of all, linear identifying restrictions are emphasized because they facilitate estimation of the structural parameters. Some of the methods, for example Gaussian ML and GMM estimation, can in principle be adapted to nonlinear restrictions as well, although the computational challenges become more burdensome in that case.

The basic model in this survey is a finite order VAR(p) model. Many of the methods can be justified even if the true VAR order is infinite and the finite order VAR model used for structural analysis is just an approximation to an infinite order process. The corresponding methods are discussed in the literature under the heading of sieve estimation (e.g., Kilian and Lütkepohl, 2017).

Another extension of the basic model setup allows for heteroskedasticity or conditional heteroskedasticity. Clearly, time-varying residual volatility is a common feature of financial data and therefore it also plays an important role in structural VAR analysis. Such features are in principle easy to deal with by using, for example, generalized LS methods rather than LS or by adjusting the likelihood function appropriately. Since the structural parameters are related to the residual covariance matrix, it is in fact possible to use a time-varying covariance structure for the identification of structural shocks. A broad literature addressing the topic of identification through heteroskedasticity has evolved following proposals by Rigobon (2003), Lanne and Lütkepohl (2008) and others. Recent surveys of the related literature are provided by Lütkepohl (2013) and Kilian and Lütkepohl (2017, Chapter 14).

This review focusses on point-identified structural VAR models. In practice set-identified models based on sign restrictions for the structural parameters or the derived impulse responses are often considered (see Uhlig (2005) for an important contribution to this literature and Kilian and Lütkepohl (2017, Chapter 13) for a recent survey). Although extensions of the setup discussed in this review to such models are not straightforward, some of the estimation algorithms presented in the current study are important building blocks of the related algorithms used in the sign restriction literature. Therefore being familiar with the methods discussed in the current study is useful.

TABLES

### Table 1

Structural restrictions and frequentist estimation methods

| Restrictions | Estimation method |
|---|---|
| Recursive model (triangular A or B) | Cholesky decomposition, MM |
| Just-identifying linear restrictions on A | LS |
| Over-identifying linear restrictions on A | IV, 2SLS, GMM |
| Just-identifying linear restrictions on B | MM |
| Over-identifying linear restrictions on B | GMM, ML |
| Just-identifying linear restrictions on A and B | MM |
| Over-identifying linear restrictions on A and B | GMM, ML |

MM = method-of-moments; LS = least-squares; IV = instrumental variables; 2SLS = two-stage LS; GMM = generalized method-of-moments; ML = maximum likelihood.

### Table 2

Estimated A-models

| Model | Â | Σ̃w |
|---|---|---|
| Recursive | $\begin{bmatrix}1&0&0\\ -0.0458&1&0\\ -0.0502&0.8822&1\end{bmatrix}$ | $\begin{bmatrix}0.4614&0&0\\ 0&0.2980&0\\ 0&0&0.7738\end{bmatrix}\times 10^{-4}$ |
| Nonrecursive | $\begin{bmatrix}1&0&0\\ -0.0483&1&0.2614\\ -0.0098&0&1\end{bmatrix}$ | $\begin{bmatrix}0.4614&0&0\\ 0&0.2293&0\\ 0&0&1.0058\end{bmatrix}\times 10^{-4}$ |
| Over-identified | $\begin{bmatrix}1&0&0\\ 0&1&0.2612\\ -0.0098&0&1\end{bmatrix}$ | $\begin{bmatrix}0.4614&0&0\\ 0&0.2304&0\\ 0&0&1.0058\end{bmatrix}\times 10^{-4}$ |

Computations were done with suitably adjusted Matlab code provided at http://www-personal.umich.edu/~lkilian/book.html.

### Table 3

Estimated B-models

| Model | B̂ |
|---|---|
| Recursive model | $\begin{bmatrix}0.0072&0&0\\ 0.0003&0.0058&0\\ 0.0001&-0.0051&0.0094\end{bmatrix}$ |
| Nonrecursive model | $\begin{bmatrix}0.0072&0.0004&0\\ 0&0.0058&0\\ 0.0004&-0.0051&0.0094\end{bmatrix}$ |
| Over-identified model | $\begin{bmatrix}0.0072&0.0006&0\\ 0&0.0058&0\\ 0.0005&0&0.0107\end{bmatrix}$ |

Computations were done with JMulTi (see http://www.jmulti.de).

### Table 4

Estimated AB-model

| Â | B̂ |
|---|---|
| $\begin{bmatrix}1&0&0\\ -0.0483&1&0.2614\\ -0.0098&0&1\end{bmatrix}$ | $\begin{bmatrix}0.0068&0&0\\ 0&0.0048&0\\ 0&0&0.0100\end{bmatrix}$ |

Computations were done with suitably adjusted Matlab programs provided at http://www-personal.umich.edu/~lkilian/book.html.

References
1. Amisano, G, and Giannini, C (1997). Topics in Structural VAR Econometrics. Berlin: Springer
2. Blanchard, OJ, and Quah, D (1989). The dynamic effects of aggregate demand and supply disturbances. American Economic Review. 79, 655-673.
3. Breitung, J, Brüggemann, R, and Lütkepohl, H (2004). Structural vector autoregressive modeling and impulse responses. Applied Time Series Econometrics, Lütkepohl, H, and Krätzig, M, ed. Cambridge: Cambridge University Press, pp. 159-196
4. Canova, F (1991). The sources of financial crisis: pre- and post-Fed evidence. International Economic Review. 32, 689-713.
5. Canova, F (2007). Methods for Applied Macroeconomic Research. Princeton: Princeton University Press
6. Canova, F, and Pérez Forero, FJ (2015). Estimating overidentified, nonrecursive, time-varying coefficients structural vector autoregressions. Quantitative Economics. 6, 359-384.
7. Christiano, LJ, Eichenbaum, M, and Evans, C (1999). Monetary policy shocks: What have we learned and to what end?. Handbook of Macroeconomics, Taylor, JB, and Woodford, M, ed. Amsterdam: Elsevier, pp. 65-148
8. EViews (2000). EViews 4.0 User’s Guide. Irvine: Quantitative Micro Software
9. Gordon, DB, and Leeper, EM (1994). The dynamic impacts of monetary policy: an exercise in tentative identification. Journal of Political Economy. 102, 1228-1247.
10. Johansen, S (1995). Likelihood-based Inference in Cointegrated Vector Autoregressive Models. Oxford: Oxford University Press
11. Kilian, L, and Lütkepohl, H (2017). Structural Vector Autoregressive Analysis. Cambridge: Cambridge University Press
12. King, RG, Plosser, CI, Stock, JH, and Watson, MW (1991). Stochastic trends and economic fluctuations. American Economic Review. 81, 819-840.
13. Koop, G, and Korobilis, D (2010). Bayesian multivariate time series methods for empirical macroeconomics. Foundations and Trends in Econometrics. 3, 267-358.
14. Krätzig, M (2004). The software JMulTi. Applied Time Series Econometrics, Lütkepohl, H, and Krätzig, M, ed. Cambridge: Cambridge University Press, pp. 289-299
15. Lanne, M, and Lütkepohl, H (2008). Identifying monetary policy shocks via changes in volatility. Journal of Money, Credit and Banking. 40, 1131-1149.
16. Lütkepohl, H (2005). New Introduction to Multiple Time Series Analysis. Berlin: Springer-Verlag
17. Lütkepohl, H (2013). Identifying structural vector autoregressions via changes in volatility. Advances in Econometrics, Fomby, TB, Kilian, L, and Murphy, A, ed. Greenwich: JAI Press, pp. 169-204
18. Lütkepohl, H, Staszewska-Bystrova, A, and Winker, P (2017). Estimation of structural impulse responses: short-run versus long-run identifying restrictions. Advances in Statistical Analysis, 1-16. Retrieved from: https://link.springer.com/article/10.1007/s10182-017-0300-9
19. Pagan, AR, and Pesaran, MH (2008). Econometric analysis of structural systems with permanent and transitory shocks. Journal of Economic Dynamics and Control. 32, 3376-3395.
20. Rigobon, R (2003). Identification through heteroskedasticity. Review of Economics and Statistics. 85, 777-792.
21. Rubio-Ramírez, JF, Waggoner, D, and Zha, T (2010). Structural vector autoregressions: theory of identification and algorithms for inference. Review of Economic Studies. 77, 665-696.
22. Sims, CA (1980). Macroeconomics and reality. Econometrica. 48, 1-48.
23. Sims, CA, and Zha, T (1998). Bayesian methods for dynamic multivariate models. International Economic Review. 39, 949-968.
24. Sims, CA, and Zha, T (1999). Error bands for impulse responses. Econometrica. 67, 1113-1155.
25. Uhlig, H (1994). What macroeconomists should know about unit roots: a Bayesian perspective. Econometric Theory. 10, 645-671.
26. Uhlig, H (2005). What are the effects of monetary policy on output? Results from an agnostic identification procedure. Journal of Monetary Economics. 52, 381-419.