Application of covariance adjustment to seemingly unrelated multivariate regressions
Communications for Statistical Applications and Methods 2018;25:577-590
Published online December 1, 2018
© 2018 Korean Statistical Society.

Lichun Wang1,a and Lawrence Pettitb

aDepartment of Mathematics, Beijing Jiaotong University, China, bSchool of Mathematical Sciences, Queen Mary University of London, UK
Correspondence to: 1Department of Mathematics, Beijing Jiaotong University, No.3 Shangyuancun Haidian District, Beijing 100044, China. E-mail: lchwang@bjtu.edu.cn
Received February 27, 2018; Revised August 24, 2018; Accepted September 21, 2018.
 Abstract

Employing the covariance adjustment technique, we show that in a system of two seemingly unrelated multivariate regressions the estimator of the regression coefficients can be expressed as a matrix power series, and we conclude that the matrix series has only one simpler form. For the case that the covariance matrix of the system is unknown, we define a two-stage estimator of the regression coefficients, which is shown to be unique and unbiased. Numerical simulations are presented to illustrate its superiority over the ordinary least squares estimator. Finally, as an example, we apply our results to seemingly unrelated growth curve models.

Keywords : seemingly unrelated regressions, matrix power series, two-stage estimator
1. Introduction

The system of seemingly unrelated regressions (SUR), which can be used to model subtle interactions among individual statistical relationships, has been investigated by many authors since the pioneering works of Zellner (1962, 1963). For more details, the reader is referred to Revankar (1974), Schmidt (1977), Wang (1989), Percy (1992), Liu (2000), and Liu (2002). Among these, the cases of orthogonal regressors (Zellner, 1963), triangular SUR models (Revankar, 1974), and SUR with unequal numbers of observations (Schmidt, 1977) are particularly notable. Some examples in the econometrics literature (Srivastava and Giles, 1987) suggest that the SUR model is appropriate and useful for a wide range of applications. Further, Velu and Richards (2008) focus on some applications of the reduced-rank model in the context of SUR. Alkhamisi (2010) proposes two SUR-type estimators based on combining SUR ridge regression with the restricted least squares method and evaluates their performance by means of some designated criteria. Zhou et al. (2011) employ seemingly unrelated nonparametric regression models to fit multivariate panel data. Shukur and Zeebari (2012) consider median regression for SUR models with the same explanatory variables and obtain an interesting feature of the generalized least absolute deviations method. In this paper, we show some interesting facts about the SUR system by employing the covariance adjustment technique. We start from the system of two seemingly unrelated multivariate regressions (Gupta and Kabe, 1998), namely

$$\begin{cases} Y_1 = X_1B_1 + E_1,\\ Y_2 = X_2B_2 + E_2, \end{cases}\tag{1.1}$$

where $Y_i$ ($i = 1, 2$) are $n \times q$ observation matrices; $X_i$ ($i = 1, 2$) are $n \times p_i$ matrices of full column rank; $B_i$ ($i = 1, 2$) are $p_i \times q$ unknown regression coefficient matrices; $E_1$ and $E_2$ are random error matrices, and the rows of $(E_1, E_2)$ follow a common unspecified multivariate distribution with mean zero and covariance matrix $V$, where $V$ is a $2 \times 2$ non-diagonal partitioned matrix given by

$$V = \begin{pmatrix} V_1 & D\\ D^T & V_2 \end{pmatrix},\tag{1.2}$$

where $V_i$ is the variance-covariance matrix of each row of $E_i$ ($i = 1, 2$) and $D$ denotes the covariance matrix between a row of $E_1$ and the corresponding row of $E_2$. Different rows of $(E_1, E_2)$ are assumed to be uncorrelated. The multivariate SUR setting is common in the biological sciences. For instance, if the $i$th row of $Y_1$ denotes the observations of the weight of the $i$th rabbit at $q$ different time points, the $i$th row of $Y_2$ denotes the observations of the length of the $i$th rabbit at the same $q$ time points, and the observations of different rabbits are uncorrelated, then the multivariate SUR (1.1) reasonably models the interactions among the weight and length observations of the $n$ rabbits.

If one neglects the correlation between $Y_1$ and $Y_2$, i.e., takes $D$ to be zero, then using only the first equation of the system (1.1) one obtains the least squares estimator (LSE) of $\mathrm{Vec}(B_1)$ as

$$\widehat{\mathrm{Vec}}(B_1) = \left(I_q \otimes (X_1^TX_1)^{-1}X_1^T\right)\mathrm{Vec}(Y_1),\tag{1.3}$$

and correspondingly the LSE of the coefficient matrix $B_1$ is $\hat{B}_1 = (X_1^TX_1)^{-1}X_1^TY_1$, where $\mathrm{Vec}(A)$ denotes the column-stacking (vec) operator applied to the matrix $A$, and $\otimes$ and $I_q$ denote the Kronecker product and the identity matrix of order $q$, respectively.
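To make the Kronecker notation concrete, the following numpy sketch (ours, with illustrative dimensions, not part of the original derivation) checks that the Kronecker form (1.3) reproduces the columnwise LSE:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p1, q = 10, 3, 2                         # illustrative sizes, not from the paper
X1 = rng.normal(size=(n, p1))
Y1 = rng.normal(size=(n, q))

M = np.linalg.solve(X1.T @ X1, X1.T)        # (X1' X1)^{-1} X1'
B1_hat = M @ Y1                             # matrix-form LSE of B1

# Kronecker form (1.3): Vec(B1_hat) = (I_q kron M) Vec(Y1),
# where Vec stacks columns (order='F').
vec = lambda A: A.flatten(order="F")
assert np.allclose(np.kron(np.eye(q), M) @ vec(Y1), vec(B1_hat))
```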

However, if we denote Y = (Y1, Y2), B = (B1, B2), and E = (E1, E2), then the system (1.1) can also be represented as:

$$\mathrm{Vec}(Y) = \begin{pmatrix} I_q \otimes X_1 & 0\\ 0 & I_q \otimes X_2 \end{pmatrix}\mathrm{Vec}(B) + \mathrm{Vec}(E).\tag{1.4}$$

Hence, from (1.4), one can obtain the LSE of $\mathrm{Vec}(B)$, say $\overline{\mathrm{Vec}}(B)$, and accordingly another estimator of $\mathrm{Vec}(B_1)$, denoted by $\overline{\mathrm{Vec}}(B_1)$, can be proposed, since $\overline{\mathrm{Vec}}(B)^T = (\overline{\mathrm{Vec}}(B_1)^T, \overline{\mathrm{Vec}}(B_2)^T)$. It makes sense that $\overline{\mathrm{Vec}}(B_1)$ and its two-stage version $\overline{\mathrm{Vec}}(B_1)_{\text{2-stage}}$ (in the case of unknown $V$) should outperform $\widehat{\mathrm{Vec}}(B_1)$ in (1.3), since they take into account the information on $B_1$ contained in the other equation.

The covariance adjustment technique is usually employed to obtain an optimal unbiased estimator of a vector parameter θ via linearly combining an unbiased estimator of θ, say T1, and an unbiased estimator of a zero vector, say T2 (Rao, 1967; Baksalary, 1991).

Applying the covariance adjustment technique to the estimator $\widehat{\mathrm{Vec}}(B_1)$, which uses only the first equation's information on $\mathrm{Vec}(B_1)$, we first use $(I_q \otimes N_2)\mathrm{Vec}(Y_2)$ to improve $\widehat{\mathrm{Vec}}(B_1)$, noting that $E[(I_q \otimes N_2)\mathrm{Vec}(Y_2)] = (I_q \otimes N_2)(I_q \otimes X_2)\mathrm{Vec}(B_2) = 0$, and obtain $\widehat{\mathrm{Vec}}(B_1)^{(1)}$; we then improve $\widehat{\mathrm{Vec}}(B_1)^{(1)}$ by $(I_q \otimes N_1)\mathrm{Vec}(Y_1)$, since $E[(I_q \otimes N_1)\mathrm{Vec}(Y_1)] = 0$, and get $\widehat{\mathrm{Vec}}(B_1)^{(2)}$. Repeating this process, we obtain the following estimator sequence ($k \ge 1$) for $\mathrm{Vec}(B_1)$:

$$\begin{aligned}
\widehat{\mathrm{Vec}}(B_1)^{(2k-1)} &= \widehat{\mathrm{Vec}}(B_1)^{(2k-2)} - \mathrm{Cov}\!\left(\widehat{\mathrm{Vec}}(B_1)^{(2k-2)},\,(I_q \otimes N_2)\mathrm{Vec}(Y_2)\right)\left[\mathrm{Cov}\!\left((I_q \otimes N_2)\mathrm{Vec}(Y_2)\right)\right]^{-}(I_q \otimes N_2)\mathrm{Vec}(Y_2),\\
\widehat{\mathrm{Vec}}(B_1)^{(2k)} &= \widehat{\mathrm{Vec}}(B_1)^{(2k-1)} - \mathrm{Cov}\!\left(\widehat{\mathrm{Vec}}(B_1)^{(2k-1)},\,(I_q \otimes N_1)\mathrm{Vec}(Y_1)\right)\left[\mathrm{Cov}\!\left((I_q \otimes N_1)\mathrm{Vec}(Y_1)\right)\right]^{-}(I_q \otimes N_1)\mathrm{Vec}(Y_1),
\end{aligned}$$

where $\widehat{\mathrm{Vec}}(B_1)^{(0)} = \widehat{\mathrm{Vec}}(B_1)$, $N_i = I_n - X_i(X_i^TX_i)^{-1}X_i^T$ ($i = 1, 2$), and $A^{-}$ denotes a generalized inverse of the matrix $A$.
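Each update above is one covariance-adjustment step in the sense of Rao (1967). A minimal generic sketch of such a step follows; the function name is ours, and `pinv` stands in for an arbitrary generalized inverse $[\mathrm{Cov}(T_2)]^{-}$:

```python
import numpy as np

def covariance_adjust(t1, t2, cov_t1_t2, cov_t2):
    """One covariance-adjustment step: T1 unbiased for theta, T2 unbiased
    for 0; subtract from T1 its best linear prediction based on T2."""
    return t1 - cov_t1_t2 @ np.linalg.pinv(cov_t2) @ t2

# Toy usage with scalars: T1, T2 standard normal with correlation 0.8;
# the adjusted estimator has variance about 1 - 0.8^2 = 0.36.
rng = np.random.default_rng(1)
z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=20000)
t1, t2 = z[:, :1].T, z[:, 1:].T             # shape (1, N) for matrix algebra
adj = covariance_adjust(t1, t2, np.array([[0.8]]), np.array([[1.0]]))
print(t1.var(), adj.var())
```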

Note that $\mathrm{Cov}(\mathrm{Vec}(Y_i), \mathrm{Vec}(Y_i)) = V_i \otimes I_n$ ($i = 1, 2$) and $\mathrm{Cov}(\mathrm{Vec}(Y_1), \mathrm{Vec}(Y_2)) = D \otimes I_n$. By some algebraic computation, we obtain that for $k \ge 1$

$$\begin{aligned}
\widehat{\mathrm{Vec}}(B_1)^{(2k-1)} &= \left[I_q \otimes (X_1^TX_1)^{-1}X_1^T\right]\sum_{i=0}^{k-1}\left(DV_2^{-1}D^TV_1^{-1} \otimes N_2N_1\right)^i\left[\mathrm{Vec}(Y_1) - \left(DV_2^{-1} \otimes N_2\right)\mathrm{Vec}(Y_2)\right],\\
\widehat{\mathrm{Vec}}(B_1)^{(2k)} &= \left[I_q \otimes (X_1^TX_1)^{-1}X_1^T\right]\sum_{i=0}^{k}\left(DV_2^{-1}D^TV_1^{-1} \otimes N_2N_1\right)^i\mathrm{Vec}(Y_1)\\
&\quad - \left[I_q \otimes (X_1^TX_1)^{-1}X_1^T\right]\sum_{i=0}^{k-1}\left(DV_2^{-1}D^TV_1^{-1} \otimes N_2N_1\right)^i\left(DV_2^{-1} \otimes N_2\right)\mathrm{Vec}(Y_2).
\end{aligned}$$

Denote $V^{-1} = \begin{pmatrix} V^{11} & V^{12}\\ V^{21} & V^{22} \end{pmatrix}$ and $Q = (V^{11})^{-1}V^{12}(V^{22})^{-1}V^{21}$. By (1.2) and the formula for the inverse of a partitioned matrix, we have

$$\begin{aligned}
Q &= (V^{11})^{-1}V^{12}(V^{22})^{-1}V^{21}\\
&= \left[V_1 - DV_2^{-1}D^T\right]\cdot\left\{-\left[V_1 - DV_2^{-1}D^T\right]^{-1}DV_2^{-1}\right\}\cdot\left(V_2 - D^TV_1^{-1}D\right)\cdot\left\{-V_2^{-1}D^T\left[V_1 - DV_2^{-1}D^T\right]^{-1}\right\}\\
&= DV_2^{-1}D^T\left[\left(V_1 - DV_2^{-1}D^T\right)^{-1} - V_1^{-1}DV_2^{-1}D^T\left(V_1 - DV_2^{-1}D^T\right)^{-1}\right]\\
&= DV_2^{-1}D^TV_1^{-1},
\end{aligned}$$

and $(V^{11})^{-1}V^{12} = -DV_2^{-1}$. Thus we have

$$\begin{aligned}
\overline{\mathrm{Vec}}(B_1) &= \widehat{\mathrm{Vec}}(B_1)^{(\infty)} = \lim_{k\to\infty}\widehat{\mathrm{Vec}}(B_1)^{(2k-1)} = \lim_{k\to\infty}\widehat{\mathrm{Vec}}(B_1)^{(2k)}\\
&= \left[I_q \otimes (X_1^TX_1)^{-1}X_1^T\right]\sum_{i=0}^{\infty}\left(Q \otimes N_2N_1\right)^i\left[\mathrm{Vec}(Y_1) + \left((V^{11})^{-1}V^{12} \otimes N_2\right)\mathrm{Vec}(Y_2)\right]\\
&= \left[I_q \otimes (X_1^TX_1)^{-1}X_1^T\right]\left\{I_{qn} - \sum_{i=0}^{\infty}\left[Q^{i+1} \otimes (P_2P_1)^iP_2N_1\right]\right\}\left\{\mathrm{Vec}(Y_1) + \left[(V^{11})^{-1}V^{12} \otimes N_2\right]\mathrm{Vec}(Y_2)\right\},
\end{aligned}\tag{1.10}$$

where $P_i = I_n - N_i = X_i(X_i^TX_i)^{-1}X_i^T$, and we use the facts that $(Q \otimes P_2P_1)^0 = I_q \otimes I_n = I_{qn}$, $(V^{11})^{-1}(Q^T)^kV^{11} = Q^k$ for $k \ge 0$, and $X_1^T(N_2N_1)^k = -X_1^T(P_2P_1)^{k-1}P_2N_1$ for $k \ge 1$.
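The collapse of the series rests on the projection identity $X_1^T(N_2N_1)^k = -X_1^T(P_2P_1)^{k-1}P_2N_1$; a small numerical sanity check of that identity, with arbitrary illustrative designs, is sketched below:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p1, p2 = 12, 3, 2                        # illustrative sizes
X1 = rng.normal(size=(n, p1))
X2 = rng.normal(size=(n, p2))

P1 = X1 @ np.linalg.solve(X1.T @ X1, X1.T)  # projection onto col(X1)
P2 = X2 @ np.linalg.solve(X2.T @ X2, X2.T)  # projection onto col(X2)
N1, N2 = np.eye(n) - P1, np.eye(n) - P2

# X1' (N2 N1)^k = -X1' (P2 P1)^(k-1) P2 N1 for k >= 1
for k in range(1, 5):
    lhs = X1.T @ np.linalg.matrix_power(N2 @ N1, k)
    rhs = -X1.T @ np.linalg.matrix_power(P2 @ P1, k - 1) @ P2 @ N1
    assert np.allclose(lhs, rhs)
```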

We summarize the above conclusions in the following theorem, which gives the limit of the covariance adjustment sequence and the covariance of $\overline{\mathrm{Vec}}(B_1)$.

Theorem 1

For the system (1.1), the limit of the covariance adjustment sequence of $\mathrm{Vec}(B_1)$ equals $\overline{\mathrm{Vec}}(B_1)$, i.e., $\lim_{k\to\infty}\widehat{\mathrm{Vec}}(B_1)^{(k)} = \overline{\mathrm{Vec}}(B_1)$, and

$$\mathrm{Cov}\left(\overline{\mathrm{Vec}}(B_1)\right) = V_1 \otimes (X_1^TX_1)^{-1} - \left[I_q \otimes (X_1^TX_1)^{-1}X_1^T\right]\cdot G\cdot\left[I_q \otimes X_1(X_1^TX_1)^{-1}\right],$$

where $G = \sum_{i=0}^{\infty}\left[Q^iDV_2^{-1}D^T\right] \otimes \left[(P_1P_2P_1)^i - (P_1P_2P_1)^{i+1}\right]$.

Proof

The first conclusion follows from the above discussion. Write $\overline{\mathrm{Vec}}(B_1) = M(Q)\left\{\mathrm{Vec}(Y_1) + \left[(V^{11})^{-1}V^{12} \otimes N_2\right]\mathrm{Vec}(Y_2)\right\}$ with $M(Q) = \left[I_q \otimes (X_1^TX_1)^{-1}X_1^T\right]\left\{I_{qn} - \sum_{i=0}^{\infty}\left[Q^{i+1} \otimes (P_2P_1)^iP_2N_1\right]\right\}$. Then we have

$$\begin{aligned}
\mathrm{Cov}\left(\overline{\mathrm{Vec}}(B_1)\right) &= M(Q)\left[V_1 \otimes I_n + (V^{11})^{-1}V^{12}V_2V^{21}(V^{11})^{-1} \otimes N_2 + DV^{21}(V^{11})^{-1} \otimes N_2 + (V^{11})^{-1}V^{12}D^T \otimes N_2\right]M^T(Q)\\
&= M(Q)\left[V_1 \otimes I_n - QV_1 \otimes N_2\right]M^T(Q),
\end{aligned}$$

where we use the following fact

$$(V^{11})^{-1}V^{12}V_2V^{21}(V^{11})^{-1} + DV^{21}(V^{11})^{-1} + (V^{11})^{-1}V^{12}D^T = -DV_2^{-1}D^T.$$

Together with the expression of M(Q), we have

$$\begin{aligned}
\mathrm{Cov}\left(\overline{\mathrm{Vec}}(B_1)\right) &= \left\{I_q \otimes (X_1^TX_1)^{-1}X_1^T - \sum_{i=0}^{\infty}\left[Q^{i+1} \otimes (X_1^TX_1)^{-1}X_1^T(P_2P_1)^iP_2N_1\right]\right\}\\
&\qquad\times\left\{V_1 \otimes X_1(X_1^TX_1)^{-1} - \sum_{i=0}^{\infty}\left[V_1(Q^T)^{i+1} \otimes N_1P_2(P_1P_2)^iX_1(X_1^TX_1)^{-1}\right]\right\}\\
&\quad - \left\{I_q \otimes (X_1^TX_1)^{-1}X_1^T - \sum_{i=0}^{\infty}\left[Q^{i+1} \otimes (X_1^TX_1)^{-1}X_1^T(P_2P_1)^iP_2N_1\right]\right\}\\
&\qquad\times\left\{QV_1 \otimes N_2X_1(X_1^TX_1)^{-1} - \sum_{i=0}^{\infty}\left[QV_1(Q^T)^{i+1} \otimes N_2N_1P_2(P_1P_2)^iX_1(X_1^TX_1)^{-1}\right]\right\}\\
&= V_1 \otimes (X_1^TX_1)^{-1} - \left[I_q \otimes (X_1^TX_1)^{-1}X_1^T\right]\left[QV_1 \otimes N_2\right]\left[I_q \otimes X_1(X_1^TX_1)^{-1}\right]\\
&\quad + \left[I_q \otimes (X_1^TX_1)^{-1}X_1^T\right]\left[\sum_{i=0}^{\infty}Q^{i+2}V_1 \otimes (P_2P_1)^iP_2N_1N_2\right]\left[I_q \otimes X_1(X_1^TX_1)^{-1}\right]\\
&\quad + \left[I_q \otimes (X_1^TX_1)^{-1}X_1^T\right]\left[\sum_{i=0}^{\infty}QV_1(Q^T)^{i+1} \otimes N_2N_1P_2(P_1P_2)^i\right]\left[I_q \otimes X_1(X_1^TX_1)^{-1}\right]\\
&\quad + \left[I_q \otimes (X_1^TX_1)^{-1}X_1^T\right]\left[\sum_{i=0}^{\infty}Q^{i+1} \otimes (P_2P_1)^iP_2N_1\sum_{i=0}^{\infty}V_1(Q^T)^{i+1} \otimes N_1P_2(P_1P_2)^i\right]\left[I_q \otimes X_1(X_1^TX_1)^{-1}\right]\\
&\quad - \left[I_q \otimes (X_1^TX_1)^{-1}X_1^T\right]\left[\sum_{i=0}^{\infty}Q^{i+1} \otimes (P_2P_1)^iP_2N_1\sum_{i=0}^{\infty}QV_1(Q^T)^{i+1} \otimes N_2N_1P_2(P_1P_2)^i\right]\left[I_q \otimes X_1(X_1^TX_1)^{-1}\right]\\
&= V_1 \otimes (X_1^TX_1)^{-1} - \left[I_q \otimes (X_1^TX_1)^{-1}X_1^T\right]\left[QV_1 \otimes N_2\right]\left[I_q \otimes X_1(X_1^TX_1)^{-1}\right]\\
&\quad + \left[I_q \otimes (X_1^TX_1)^{-1}X_1^T\right]\left[\sum_{i=0}^{\infty}QV_1(Q^T)^{i+1} \otimes P_2(P_1P_2)^{i+1} - \sum_{i=0}^{\infty}Q^{i+2}V_1 \otimes (P_2P_1)^{i+1}\right]\left[I_q \otimes X_1(X_1^TX_1)^{-1}\right].
\end{aligned}$$

Using $QV_1 = DV_2^{-1}D^T$, $X_1^TN_2X_1 = X_1^T(I_n - P_1P_2P_1)X_1$, and $X_1^TP_1 = X_1^T$, we have

$$\begin{aligned}
\mathrm{Cov}\left(\overline{\mathrm{Vec}}(B_1)\right) &= V_1 \otimes (X_1^TX_1)^{-1} - \left[I_q \otimes (X_1^TX_1)^{-1}X_1^T\right]\left[DV_2^{-1}D^T \otimes (I_n - P_1P_2P_1)\right]\left[I_q \otimes X_1(X_1^TX_1)^{-1}\right]\\
&\quad - \left[I_q \otimes (X_1^TX_1)^{-1}X_1^T\right]\left\{\sum_{i=1}^{\infty}Q^iDV_2^{-1}D^T \otimes (P_1P_2P_1)^i - \sum_{i=1}^{\infty}DV_2^{-1}D^T(Q^T)^i \otimes (P_1P_2P_1)^{i+1}\right\}\left[I_q \otimes X_1(X_1^TX_1)^{-1}\right]\\
&= V_1 \otimes (X_1^TX_1)^{-1} - \left[I_q \otimes (X_1^TX_1)^{-1}X_1^T\right]\left\{\sum_{i=0}^{\infty}Q^iDV_2^{-1}D^T \otimes (P_1P_2P_1)^i - \sum_{i=0}^{\infty}DV_2^{-1}D^T(Q^T)^i \otimes (P_1P_2P_1)^{i+1}\right\}\left[I_q \otimes X_1(X_1^TX_1)^{-1}\right]\\
&= V_1 \otimes (X_1^TX_1)^{-1} - \left[I_q \otimes (X_1^TX_1)^{-1}X_1^T\right]\left\{\sum_{i=0}^{\infty}\left[Q^iDV_2^{-1}D^T\right] \otimes \left[(P_1P_2P_1)^i - (P_1P_2P_1)^{i+1}\right]\right\}\left[I_q \otimes X_1(X_1^TX_1)^{-1}\right],
\end{aligned}$$

where the last step uses the facts that $(P_1P_2P_1)^0 = I_n$ and $Q^iDV_2^{-1}D^T = DV_2^{-1}D^T(Q^T)^i$ for $i \ge 0$.

The proof of Theorem 1 is finished.

Note that $Q^0DV_2^{-1}D^T = DV_2^{-1}D^T \ge 0$, $I_n - P_1P_2P_1 \ge 0$, and for $i \ge 1$

$$Q^iDV_2^{-1}D^T = DV_2^{-1}D^T(Q^T)^i = \begin{cases} DV_2^{-1}D^T\left(V_1^{-1}DV_2^{-1}D^T\right)^{k-1}V_1^{-1}\left(DV_2^{-1}D^TV_1^{-1}\right)^{k-1}DV_2^{-1}D^T \ge 0, & i = 2k-1,\\[4pt] DV_2^{-1}D^T\left(V_1^{-1}DV_2^{-1}D^T\right)^{k-1}V_1^{-1}DV_2^{-1}D^TV_1^{-1}\left(DV_2^{-1}D^TV_1^{-1}\right)^{k-1}DV_2^{-1}D^T \ge 0, & i = 2k, \end{cases}\quad k = 1, 2, \ldots,$$

and $(P_1P_2P_1)^i - (P_1P_2P_1)^{i+1} \ge 0$. Hence

$$G = \sum_{i=0}^{\infty}\left[Q^iDV_2^{-1}D^T\right] \otimes \left[(P_1P_2P_1)^i - (P_1P_2P_1)^{i+1}\right] \ge 0.$$

Further, since $\mathrm{Cov}(\widehat{\mathrm{Vec}}(B_1)) = V_1 \otimes (X_1^TX_1)^{-1}$, we have

$$\mathrm{Cov}\left(\overline{\mathrm{Vec}}(B_1)\right) = \mathrm{Cov}\left(\widehat{\mathrm{Vec}}(B_1)\right) - \left[I_q \otimes (X_1^TX_1)^{-1}X_1^T\right]\cdot G\cdot\left[I_q \otimes X_1(X_1^TX_1)^{-1}\right] \le \mathrm{Cov}\left(\widehat{\mathrm{Vec}}(B_1)\right),$$

which means that $\overline{\mathrm{Vec}}(B_1)$ is superior to $\widehat{\mathrm{Vec}}(B_1)$ in the sense of having smaller covariance. This is exactly consistent with the fact that $\widehat{\mathrm{Vec}}(B_1)$ uses only the first equation's information on $\mathrm{Vec}(B_1)$, whereas $\overline{\mathrm{Vec}}(B_1)$ combines the second regression equation with the first via covariance adjustment.

2. The characteristics of the matrix series

Note that for i = 1, 2, . . . ,

$$X_1^T(P_2P_1)^{i-1}P_2N_1 = 0 \iff X_1^T(P_2P_1)^{i-1}P_2N_1N_2 = 0.\tag{2.1}$$

We only need to prove that the right-hand equality implies the left-hand one. Note that $X_1^T(P_2P_1)^{i-1}P_2N_1N_2 = 0$ implies $X_1^T(P_2P_1)^{i-1}N_2N_1N_2 = 0$; hence $X_1^T(P_2P_1)^{i-1}N_2N_1N_2(P_1P_2)^{i-1}X_1 = 0$, and thus $X_1^T(P_2P_1)^{i-1}N_2N_1 = 0$, where we use $N_1^2 = N_1$. Further, replacing $N_2$ by $I_n - P_2$ and noting that $X_1^TN_1 = 0$ and $P_1N_1 = 0$, we obtain $X_1^T(P_2P_1)^{i-1}P_2N_1 = 0$.

Therefore, (2.1) implies that for i = 1, 2, . . . ,

$$(X_1^TX_1)^{-1}X_1^T(P_2P_1)^{i-1}P_2N_1 = 0 \iff (X_1^TX_1)^{-1}X_1^T(P_2P_1)^{i-1}P_2N_1N_2 = 0,\tag{2.2}$$

which further shows that for i = 1, 2, . . . ,

$$Q^i \otimes (X_1^TX_1)^{-1}X_1^T(P_2P_1)^{i-1}P_2N_1 = 0 \iff Q^i(V^{11})^{-1}V^{12} \otimes (X_1^TX_1)^{-1}X_1^T(P_2P_1)^{i-1}P_2N_1N_2 = 0,\tag{2.3}$$

where we note that $Q = DV_2^{-1}D^TV_1^{-1}$ and $Q^i(V^{11})^{-1}V^{12} = -Q^iDV_2^{-1}$, that $D$ is the covariance matrix between the rows of $E_1$ and $E_2$, and that both $Q$ and $Q^i(V^{11})^{-1}V^{12}$ are invertible.

Set

$$\overline{\mathrm{Vec}}(B_1)_s = \left[I_q \otimes (X_1^TX_1)^{-1}X_1^T\right]\mathrm{Vec}(Y_1) + \left[(V^{11})^{-1}V^{12} \otimes (X_1^TX_1)^{-1}X_1^TN_2\right]\mathrm{Vec}(Y_2).\tag{2.4}$$
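In matrix form, and with $V$ known, (2.4) reads $\bar{B}_{1,s} = (X_1^TX_1)^{-1}X_1^TY_1 + (X_1^TX_1)^{-1}X_1^TN_2Y_2(-DV_2^{-1})^T$; the following sketch computes it alongside the plain LSE, under an illustrative choice of sizes and a $V$ with $V_1 = V_2 = I_q$ and $D = \rho I_q$ (our choice, echoing the pattern used later in the simulations):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p1, p2, q, rho = 200, 3, 2, 2, 0.7       # illustrative sizes
X1, X2 = rng.normal(size=(n, p1)), rng.normal(size=(n, p2))
B1, B2 = rng.normal(size=(p1, q)), rng.normal(size=(p2, q))

V1, V2, D = np.eye(q), np.eye(q), rho * np.eye(q)
V = np.block([[V1, D], [D.T, V2]])
E = rng.multivariate_normal(np.zeros(2 * q), V, size=n)  # rows iid N(0, V)
Y1, Y2 = X1 @ B1 + E[:, :q], X2 @ B2 + E[:, q:]

M = np.linalg.solve(X1.T @ X1, X1.T)
N2 = np.eye(n) - X2 @ np.linalg.solve(X2.T @ X2, X2.T)

B1_bar_s = M @ Y1 + M @ N2 @ Y2 @ (-D @ np.linalg.inv(V2)).T  # (2.4), known V
print(np.linalg.norm(B1_bar_s - B1), np.linalg.norm(M @ Y1 - B1))
```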

The following theorem shows that the matrix series (1.10) has only one degenerate form, $\overline{\mathrm{Vec}}(B_1)_s$.

Theorem 2

$\overline{\mathrm{Vec}}(B_1)_s$ is the unique simpler form of $\widehat{\mathrm{Vec}}(B_1)^{(\infty)}$.

Proof

Note that, for any fixed $i$ ($i \ge 1$), if $X_1^T(P_2P_1)^{i-1}P_2N_1 = 0$, then $X_1^T(P_2P_1)^iP_2N_1 = X_1^T(P_2P_1)^{i-1}P_2(I_n - N_1)P_2N_1 = 0$. Step by step, we arrive at

$$X_1^T(P_2P_1)^{k-1}P_2N_1 = 0, \qquad k = i+1, i+2, \ldots.\tag{2.5}$$

Thus, we find

$$Q^i \otimes (X_1^TX_1)^{-1}X_1^T(P_2P_1)^{i-1}P_2N_1 = 0 \ \text{ for a fixed } i\ (i \ge 1) \implies Q^k \otimes (X_1^TX_1)^{-1}X_1^T(P_2P_1)^{k-1}P_2N_1 = 0, \quad k = i+1, i+2, \ldots.\tag{2.6}$$

On the other hand, if for any fixed $i$ ($i \ge 2$) one has $X_1^T(P_2P_1)^{i-1}P_2N_1 = 0$, then it is easy to see that

$$\begin{aligned}
X_1^T(P_2P_1)^{i-1}(I_n - N_2)N_1 = 0 &\implies X_1^T(P_2P_1)^{i-1}N_2N_1 = 0\\
&\implies X_1^T(P_2P_1)^{i-2}P_2(I_n - N_1)N_2N_1 = 0\\
&\implies X_1^T(P_2P_1)^{i-2}P_2N_1N_2N_1 = 0\\
&\implies X_1^T(P_2P_1)^{i-2}P_2N_1N_2N_1P_2(P_1P_2)^{i-2}X_1 = 0\\
&\implies X_1^T(P_2P_1)^{i-2}P_2N_1N_2 = 0\\
&\implies X_1^T(P_2P_1)^{i-2}P_2N_1 = 0,
\end{aligned}\tag{2.7}$$

where the last step comes from the fact (2.1). Thus, step by step we conclude that

$$Q^i \otimes (X_1^TX_1)^{-1}X_1^T(P_2P_1)^{i-1}P_2N_1 = 0 \ \text{ for a fixed } i\ (i \ge 2) \implies Q^k \otimes (X_1^TX_1)^{-1}X_1^T(P_2P_1)^{k-1}P_2N_1 = 0, \quad k = 1, 2, \ldots, i-1.\tag{2.8}$$

Combining (2.6) with (2.8), we know that, for any fixed $i$ ($i \ge 1$), if

$$Q^i \otimes (X_1^TX_1)^{-1}X_1^T(P_2P_1)^{i-1}P_2N_1 = 0,\tag{2.9}$$

then the infinite series

$$\sum_{i=1}^{\infty}\left[Q^i \otimes (X_1^TX_1)^{-1}X_1^T(P_2P_1)^{i-1}P_2N_1\right] = 0,\tag{2.10}$$

and, by (2.3), we concurrently conclude that the infinite series

$$\sum_{i=1}^{\infty}\left[Q^i(V^{11})^{-1}V^{12} \otimes (X_1^TX_1)^{-1}X_1^T(P_2P_1)^{i-1}P_2N_1N_2\right] = 0.\tag{2.11}$$

Hence, $\widehat{\mathrm{Vec}}(B_1)^{(\infty)}$ has the unique simpler form $\overline{\mathrm{Vec}}(B_1)_s$ in the sense that if one term in (2.10) or (2.11) is zero, then both infinite sums vanish.

The proof of Theorem 2 is finished.

3. The properties of two-stage estimator

If the covariance matrix $V$ is unknown, then neither $\widehat{\mathrm{Vec}}(B_1)^{(\infty)}$ nor the simpler form $\overline{\mathrm{Vec}}(B_1)_s$ is available. Setting $\tilde{X} = (X_1, X_2)$, we estimate $V$ by

$$\hat{V} = \frac{1}{n - R(\tilde{X})}\begin{pmatrix} Y_1^T\\ Y_2^T \end{pmatrix}\left(I_n - P_{\tilde{X}}\right)(Y_1, Y_2),\tag{3.1}$$

where $R(\tilde{X})$ is the rank of $\tilde{X}$ and $P_{\tilde{X}} = \tilde{X}(\tilde{X}^T\tilde{X})^{-}\tilde{X}^T$.

It follows from $E(a^TAb) = \mathrm{trace}[A\,\mathrm{Cov}(b, a)] + (Ea)^TA(Eb)$, where $a$ and $b$ denote two random vectors, and from $(I_n - P_{\tilde{X}})X_i = 0$ ($i = 1, 2$), that $E[Y_i^T(I_n - P_{\tilde{X}})Y_i] = V_i[n - R(\tilde{X})]$ ($i = 1, 2$) and $E[Y_1^T(I_n - P_{\tilde{X}})Y_2] = D[n - R(\tilde{X})]$, which shows that

$$E\hat{V} = \begin{pmatrix} V_1 & D\\ D^T & V_2 \end{pmatrix} = V.$$
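A sketch of the residual-based estimator (3.1) follows; computing the projection through `lstsq` is our implementation choice and keeps it valid when $\tilde{X} = (X_1, X_2)$ is rank deficient (function name and dimensions are illustrative):

```python
import numpy as np

def estimate_V(Y1, Y2, X1, X2):
    """Unbiased residual-based estimator (3.1) of V = [[V1, D], [D', V2]]."""
    n = Y1.shape[0]
    Xt = np.hstack([X1, X2])
    # P = Xt (Xt' Xt)^- Xt', computed as the least squares projection onto col(Xt)
    P = Xt @ np.linalg.lstsq(Xt, np.eye(n), rcond=None)[0]
    Y = np.hstack([Y1, Y2])
    return Y.T @ (np.eye(n) - P) @ Y / (n - np.linalg.matrix_rank(Xt))
```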

Substituting the estimator $\hat{V}$ for $V$ in the expressions of $\widehat{\mathrm{Vec}}(B_1)^{(\infty)}$ and $\overline{\mathrm{Vec}}(B_1)_s$, we obtain the following two two-stage estimators:

$$\widehat{\mathrm{Vec}}(B_1)^{(\infty)}_{\text{2-stage}} = M(\hat{Q})\left\{\mathrm{Vec}(Y_1) + \left[-\hat{D}\hat{V}_2^{-1} \otimes N_2\right]\mathrm{Vec}(Y_2)\right\}$$

with $M(\hat{Q}) = \left[I_q \otimes (X_1^TX_1)^{-1}X_1^T\right]\left\{I_{qn} - \sum_{i=0}^{\infty}\left[\hat{Q}^{i+1} \otimes (P_2P_1)^iP_2N_1\right]\right\}$ and $\hat{Q} = \hat{D}\hat{V}_2^{-1}\hat{D}^T\hat{V}_1^{-1}$, and

$$\overline{\mathrm{Vec}}(B_1)_{s,\text{2-stage}} = \left[I_q \otimes (X_1^TX_1)^{-1}X_1^T\right]\mathrm{Vec}(Y_1) + \left[-\hat{D}\hat{V}_2^{-1} \otimes (X_1^TX_1)^{-1}X_1^TN_2\right]\mathrm{Vec}(Y_2).$$

Similar to Theorem 2, we know that $\overline{\mathrm{Vec}}(B_1)_{s,\text{2-stage}}$ is the unique simpler form of $\widehat{\mathrm{Vec}}(B_1)^{(\infty)}_{\text{2-stage}}$. Hence, we focus on the performance of $\overline{\mathrm{Vec}}(B_1)_{s,\text{2-stage}}$.
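Putting the pieces together, a minimal self-contained sketch of the simpler-form two-stage estimator, in the matrix form $\bar{B}_{1,s,\text{2-stage}} = (X_1^TX_1)^{-1}X_1^TY_1 + (X_1^TX_1)^{-1}X_1^TN_2Y_2(-\hat{D}\hat{V}_2^{-1})^T$ used in the simulations below, might look as follows (function name is ours):

```python
import numpy as np

def two_stage_B1(Y1, Y2, X1, X2):
    """Simpler-form two-stage estimator: plug Vhat from (3.1) into (2.4)."""
    n, q = Y1.shape
    Xt = np.hstack([X1, X2])
    P = Xt @ np.linalg.lstsq(Xt, np.eye(n), rcond=None)[0]   # proj. onto col(Xt)
    Y = np.hstack([Y1, Y2])
    Vhat = Y.T @ (np.eye(n) - P) @ Y / (n - np.linalg.matrix_rank(Xt))
    D_hat, V2_hat = Vhat[:q, q:], Vhat[q:, q:]
    M = np.linalg.solve(X1.T @ X1, X1.T)
    N2 = np.eye(n) - X2 @ np.linalg.solve(X2.T @ X2, X2.T)
    return M @ Y1 + M @ N2 @ Y2 @ (-D_hat @ np.linalg.inv(V2_hat)).T
```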

The matrix-variate normal distribution is a commonly used distribution in the class of matrix elliptically symmetric distributions. It plays an important role in the investigation of multivariate regression models such as the growth curve model (GCM). In what follows, in order to establish the unbiasedness of $\overline{\mathrm{Vec}}(B_1)_{s,\text{2-stage}}$, we first briefly present the definition of the matrix-variate normal distribution together with two related properties, and then make some assumptions on the distributions of the random error matrices $E_i$ ($i = 1, 2$).

Definition 1

A random matrix $Z$ of order $n \times q$ is said to follow a matrix-variate normal distribution if its probability density function is of the form

$$f(Z) = (2\pi)^{-\frac{nq}{2}}\left[\det(\Sigma)\right]^{-\frac{q}{2}}\left[\det(\Omega)\right]^{-\frac{n}{2}}\exp\left(-\frac{1}{2}\mathrm{trace}\left\{\Omega^{-1}\left[Z - M\right]^T\Sigma^{-1}\left[Z - M\right]\right\}\right),$$

where $M$, $\Sigma > 0$, and $\Omega > 0$ are $n \times q$, $n \times n$, and $q \times q$ matrices, respectively, and $\det(A)$ denotes the determinant of the square matrix $A$. In this case, we write $Z \sim N_{n,q}(M, \Sigma, \Omega)$.

The following two lemmas point out the relationship between the matrix-variate and vector-variate normal distributions and show that an affine transformation of a matrix-variate normal variable also follows a matrix-variate normal distribution. The reader is referred to the first chapter of Pan and Fang (2007) for more details.

Lemma 1

Let $Z$ be an $n \times q$ random matrix and $z = \mathrm{Vec}(Z)$. Then $Z \sim N_{n,q}(M, \Sigma, \Omega)$ if and only if $z \sim N_{nq}(\mathrm{Vec}(M), \Omega \otimes \Sigma)$.

Lemma 2

Suppose $Z \sim N_{n,q}(M, \Sigma, \Omega)$, and let $C$, $A_1 > 0$, and $A_2 > 0$ be given matrices of orders $n \times q$, $n \times n$, and $q \times q$, respectively. Then $A_1ZA_2 + C \sim N_{n,q}(A_1MA_2 + C,\, A_1\Sigma A_1^T,\, A_2\Omega A_2^T)$.

In the following, we assume that in the system (1.1) the random error matrices $E_i$ ($i = 1, 2$) follow the matrix-variate normal distribution $N_{n,q}(0, I_n, V_i)$, which indicates that the rows of $E_i$ are iid random vectors with common distribution $N_q(0, V_i)$ ($i = 1, 2$). Thus, the rows of $E = (E_1, E_2)$ are iid random vectors with common distribution $N_{2q}(0, V)$, i.e., $E \sim N_{n,2q}(0, I_n, V)$. Hence, by Lemmas 1 and 2, we know that

$$\mathrm{Vec}(Y) = \mathrm{Vec}(Y_1, Y_2) \sim N_{2nq}\left(\begin{bmatrix} I_q \otimes X_1 & 0\\ 0 & I_q \otimes X_2 \end{bmatrix}\mathrm{Vec}(B),\ V \otimes I_n\right).$$

Denote $Y_i = (y_1^{(i)}, y_2^{(i)}, \ldots, y_q^{(i)})$ ($i = 1, 2$). Then $\hat{D} = [n - R(\tilde{X})]^{-1}(\hat{d}_{ij})_{q \times q}$ with elements

$$\hat{d}_{ij} = \left(y_i^{(1)}\right)^T\left[I_n - P_{\tilde{X}}\right]y_j^{(2)} = \left(\mathrm{Vec}(Y)\right)^T\left[O_{i,q+j}(2q \times 2q) \otimes \left(I_n - P_{\tilde{X}}\right)\right]\mathrm{Vec}(Y),$$

where the matrix Oi,q+j(2q × 2q) with order 2q × 2q consists of all zeros only except the element in the ith row and the (q + j)th column is one. Similarly, the (i, j)th element of 2 is equal to

$$\left(\mathrm{Vec}(Y)\right)^T\left[O_{q+i,q+j}(2q \times 2q) \otimes \left(I_n - P_{\tilde{X}}\right)\right]\mathrm{Vec}(Y),$$

where the $2q \times 2q$ matrix $O_{q+i,q+j}(2q \times 2q)$ consists of zeros except that the element in the $(q+i)$th row and the $(q+j)$th column equals one.

Note that $\left[I_q \otimes (X_1^TX_1)^{-1}X_1^TN_2\right]\mathrm{Vec}(Y_2) = \left[0_{qp_1 \times nq},\ I_q \otimes (X_1^TX_1)^{-1}X_1^TN_2\right]\mathrm{Vec}(Y)$. Hence, using $X_1^TN_2\left[I_n - P_{\tilde{X}}\right] = 0$ and the criterion for independence of a linear function and a quadratic function of normal variables, together with the following easily verified facts:

$$\left[0_{qp_1 \times nq},\ I_q \otimes (X_1^TX_1)^{-1}X_1^TN_2\right]\left[V \otimes I_n\right]\left[O_{i,q+j}(2q \times 2q) \otimes \left(I_n - P_{\tilde{X}}\right)\right] = 0,$$

and

$$\left[0_{qp_1 \times nq},\ I_q \otimes (X_1^TX_1)^{-1}X_1^TN_2\right]\left[V \otimes I_n\right]\left[O_{q+i,q+j}(2q \times 2q) \otimes \left(I_n - P_{\tilde{X}}\right)\right] = 0,$$

we know that

$$\begin{aligned}
E\left[\overline{\mathrm{Vec}}(B_1)_{s,\text{2-stage}}\right] &= \left[I_q \otimes (X_1^TX_1)^{-1}X_1^T\right](I_q \otimes X_1)\mathrm{Vec}(B_1) + \left[E\left(-\hat{D}\hat{V}_2^{-1}\right) \otimes (X_1^TX_1)^{-1}X_1^TN_2\right](I_q \otimes X_2)\mathrm{Vec}(B_2)\\
&= \mathrm{Vec}(B_1).
\end{aligned}$$

Thus, we obtain the following theorem, which states the unbiasedness of the two-stage estimator.

Theorem 3

Under the assumptions that $E_i \sim N_{n,q}(0, I_n, V_i)$ ($i = 1, 2$), the two-stage estimator $\overline{\mathrm{Vec}}(B_1)_{s,\text{2-stage}}$ is unbiased, i.e., $E\left[\overline{\mathrm{Vec}}(B_1)_{s,\text{2-stage}}\right] = \mathrm{Vec}(B_1)$.
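Theorem 3 can also be checked by Monte Carlo; the following sketch, which reuses the `two_stage_B1` helper sketched above (illustrative sizes, ours), averages the estimator over repeated samples and compares the average with $B_1$:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p1, p2, q, rho = 30, 3, 2, 2, 0.7        # illustrative sizes
X1, X2 = rng.normal(size=(n, p1)), rng.normal(size=(n, p2))
B1, B2 = rng.normal(size=(p1, q)), rng.normal(size=(p2, q))
V = np.block([[np.eye(q), rho * np.eye(q)], [rho * np.eye(q), np.eye(q)]])

acc = np.zeros_like(B1)
reps = 2000
for _ in range(reps):
    E = rng.multivariate_normal(np.zeros(2 * q), V, size=n)
    Y1, Y2 = X1 @ B1 + E[:, :q], X2 @ B2 + E[:, q:]
    acc += two_stage_B1(Y1, Y2, X1, X2)     # helper sketched above
print(np.abs(acc / reps - B1).max())        # should be close to 0
```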

In the following, we refer to Grunfeld's data in Maddala (1977) and present two simulation studies to compare the performance of $\overline{\mathrm{Vec}}(B_1)_{s,\text{2-stage}}$ with that of $\widehat{\mathrm{Vec}}(B_1)$, first under the condition that there is a known relationship between the design matrices $X_1$ and $X_2$, and then under the condition that there is no relationship between $X_1$ and $X_2$.

(I) The case that $X_1 = (X_2, L)$

Here the system (1.1) is of the form $Y_i = X_iB_i + E_i$ ($i = 1, 2$) with $E = (E_1, E_2) \sim N_{n,4}(0, I_n, V)$, and

$$B_1 = \begin{pmatrix} 1 & 1\\ 1 & 2\\ 1 & 3 \end{pmatrix}, \qquad B_2 = \begin{pmatrix} 1 & 6\\ -3 & 2 \end{pmatrix}, \qquad V = \begin{pmatrix} 1 & 0 & \rho & 0\\ 0 & 1 & 0 & \rho\\ \rho & 0 & 1 & 0\\ 0 & \rho & 0 & 1 \end{pmatrix}.\tag{3.10}$$

Set $S(B_1) = (Y_1 - X_1B_1)^T(Y_1 - X_1B_1)$. Note that the estimator $\hat{B}_1 = (X_1^TX_1)^{-1}X_1^TY_1$ given by (1.3), which corresponds to the LSE $\widehat{\mathrm{Vec}}(B_1)$, simultaneously minimizes the residual sum of squares $S(B_1)$ (in the sense of nonnegative definiteness), the trace of $S(B_1)$, the determinant of $S(B_1)$, and the largest eigenvalue of $S(B_1)$ (Muirhead, 1982). Therefore, under these four criteria the LSEs of the regression coefficient matrix $B_1$ based only on the first equation $Y_1 = X_1B_1 + E_1$ are completely identical (Fang and Zhang, 1990). Thus, without loss of generality, we illustrate the superiority of $\overline{\mathrm{Vec}}(B_1)_{s,\text{2-stage}}$ by comparing $\mathrm{trace}(S(\hat{B}_1))$ with $\mathrm{trace}(S(\bar{B}_{1,s,\text{2-stage}}))$, where $\bar{B}_{1,s,\text{2-stage}} = (X_1^TX_1)^{-1}X_1^TY_1 + (X_1^TX_1)^{-1}X_1^TN_2Y_2(-\hat{D}\hat{V}_2^{-1})^T$ corresponds to $\overline{\mathrm{Vec}}(B_1)_{s,\text{2-stage}}$. We also present the values of $\mathrm{trace}(S(\bar{B}_{1,s}))$ for contrast, where $\bar{B}_{1,s} = (X_1^TX_1)^{-1}X_1^TY_1 + (X_1^TX_1)^{-1}X_1^TN_2Y_2(-DV_2^{-1})^T$ corresponds to (2.4).
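For completeness, the trace criterion of Tables 1 and 2 is straightforward to compute; a one-function sketch (name ours):

```python
import numpy as np

def trace_S(B, Y1, X1):
    """trace of S(B) = (Y1 - X1 B)'(Y1 - X1 B), the criterion of Tables 1-2."""
    R = Y1 - X1 @ B
    return float(np.trace(R.T @ R))
```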

In Table 1, based on different combinations of the correlation $\rho$ and the sample size $n$, we present some numerical comparisons of $\mathrm{trace}(S(\bar{B}_{1,s,\text{2-stage}}))$ with $\mathrm{trace}(S(\hat{B}_1))$ and $\mathrm{trace}(S(\bar{B}_{1,s}))$, which exhibit the performance of the simplified two-stage estimator $\overline{\mathrm{Vec}}(B_1)_{s,\text{2-stage}}$ for relatively small and moderate sample sizes. We find that the performance of the two-stage estimator tends to improve as the sample size increases, although it also depends on the correlation $\rho$; in particular, when $n \ge 20$ and $\rho \ge 0.5$ we see that $|\mathrm{trace}(S(\bar{B}_{1,s,\text{2-stage}})) - \mathrm{trace}(S(\bar{B}_{1,s}))| < |\mathrm{trace}(S(\hat{B}_1)) - \mathrm{trace}(S(\bar{B}_{1,s}))|$, which shows that the two-stage estimator $\overline{\mathrm{Vec}}(B_1)_{s,\text{2-stage}}$ is closer to $\overline{\mathrm{Vec}}(B_1)_s$.

(II) The case that there is no relationship between $X_1$ and $X_2$

In this case we assume that the system (1.1) has the same form as (3.10), but there is no relationship between $X_1$ and $X_2$. The simulation results are presented in Table 2, from which we see that $\mathrm{trace}(S(\bar{B}_{1,s,\text{2-stage}}))$ gets closer to $\mathrm{trace}(S(\bar{B}_{1,s}))$, which implies that the two-stage estimator $\overline{\mathrm{Vec}}(B_1)_{s,\text{2-stage}}$ becomes better than the LSE $\widehat{\mathrm{Vec}}(B_1)$ as the sample size grows ($n \ge 20$ or larger); this again also depends on the value of the correlation $\rho$ ($\ge 0.5$). The reason is that, from the viewpoint of covariance adjustment, the one-step covariance adjustment estimator $\widehat{\mathrm{Vec}}(B_1)^{(1)}$, which is exactly equal to $\overline{\mathrm{Vec}}(B_1)_s$, is superior to $\widehat{\mathrm{Vec}}(B_1)$ in the sense of having smaller covariance even when there is no relationship between $X_1$ and $X_2$. Hence, the simulation study discloses the tendency of $\overline{\mathrm{Vec}}(B_1)_{s,\text{2-stage}}$ to perform better, which is consistent with a two-stage estimator that incorporates more information.

4. An illustrating example

The GCM is a generalized multivariate analysis-of-variance model that is especially useful for investigating growth problems on short time series in economics, biology, and medical research (see Lee and Geisser, 1972; Pan and Fang, 2007). The seemingly unrelated GCMs are defined as

$$\begin{cases} Y_1 = X_1B_1Z_1 + E_1,\\ Y_2 = X_2B_2Z_2 + E_2, \end{cases}$$

where $Y_i$ are $n \times q$ observation matrices, $X_i$ and $Z_i$ are known design matrices of full column rank and full row rank, respectively, and the regression coefficient matrices $B_1$ and $B_2$ are unknown. The assumptions on $E_1$ and $E_2$ are the same as those in the system (1.1).

Therefore, without considering the interactions between the two equations, we obtain the LSE of B1 from the first equation as

$$\hat{B}_1 = (X_1^TX_1)^{-1}X_1^TY_1V_1^{-1}Z_1^T\left(Z_1V_1^{-1}Z_1^T\right)^{-1},$$

which is unbiased with covariance $\mathrm{Cov}(\hat{B}_1) = \mathrm{Cov}(\mathrm{Vec}(\hat{B}_1)) = (Z_1V_1^{-1}Z_1^T)^{-1} \otimes (X_1^TX_1)^{-1}$. However, combining the information of the second equation with the assumption $X_1^TX_2 = 0$, we obtain the system LSE of $B_1$ as

$$\bar{B}_1 = (X_1^TX_1)^{-1}X_1^T\left(Y_1V^{11} + Y_2V^{21}\right)Z_1^T\left(Z_1V^{11}Z_1^T\right)^{-1},$$

which is unbiased and has smaller covariance

$$\mathrm{Cov}(\bar{B}_1) = \mathrm{Cov}\left(\mathrm{Vec}(\bar{B}_1)\right) = \left(Z_1V^{11}Z_1^T\right)^{-1} \otimes (X_1^TX_1)^{-1},$$

which is less than $\mathrm{Cov}(\hat{B}_1)$, since $V_1^{-1} \le (V_1 - DV_2^{-1}D^T)^{-1} = V^{11}$ and correspondingly $(Z_1V_1^{-1}Z_1^T)^{-1} \ge (Z_1V^{11}Z_1^T)^{-1}$.

In the case that the covariance matrix $V$ is unknown, under the assumption that $E = (E_1, E_2) \sim N_{n,2q}(0, I_n, V)$, we use an estimator of the same form as (3.1) to estimate $V$, which is easily shown to be unbiased. Hence, a two-stage estimator of $B_1$ is defined as

$$\bar{B}_{1,\text{2-stage}} = (X_1^TX_1)^{-1}X_1^T\left(Y_1\hat{V}^{11} + Y_2\hat{V}^{21}\right)Z_1^T\left(Z_1\hat{V}^{11}Z_1^T\right)^{-1},$$

where $\hat{V}^{11} = (\hat{V}_1 - \hat{D}\hat{V}_2^{-1}\hat{D}^T)^{-1}$ and $\hat{V}^{21} = -\hat{V}_2^{-1}\hat{D}^T\hat{V}^{11}$. Analogous to the previous discussion, we can establish the unbiasedness of the estimator $\bar{B}_{1,\text{2-stage}}$.
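A sketch of $\bar{B}_{1,\text{2-stage}}$ for the seemingly unrelated GCMs follows; `Vhat` is any unbiased estimator of $V$ of the form (3.1), partitioned as in (1.2), and the function name is ours:

```python
import numpy as np

def gcm_two_stage_B1(Y1, Y2, X1, Z1, Vhat, q):
    """Two-stage GCM estimator B1_bar_2stage under X1'X2 = 0."""
    V1h, D_h, V2h = Vhat[:q, :q], Vhat[:q, q:], Vhat[q:, q:]
    V11h = np.linalg.inv(V1h - D_h @ np.linalg.inv(V2h) @ D_h.T)   # Vhat^{11}
    V21h = -np.linalg.inv(V2h) @ D_h.T @ V11h                      # Vhat^{21}
    M = np.linalg.solve(X1.T @ X1, X1.T)                           # (X1'X1)^{-1} X1'
    return M @ (Y1 @ V11h + Y2 @ V21h) @ Z1.T @ np.linalg.inv(Z1 @ V11h @ Z1.T)
```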

In the following, we present a simulation study to compare the performance of $\bar{B}_{1,\text{2-stage}}$ with that of $\hat{B}_1$ under the matrix 2-norm criterion, where the 2-norm of a matrix $A$ is given by $\|A\|_2 = \|\mathrm{Vec}(A)\|_2 = (\sum_i\sum_j a_{ij}^2)^{1/2}$. The performance of $\bar{B}_1$ is also presented as a contrast. In each simulation, a sample of $n$ observations is randomly generated from a $2q$-variate normal distribution with mean zero and covariance matrix $V$, which serves as the error matrix $E_{n \times 2q} = (E_1, E_2)$, and then $\hat{B}_1$, $\bar{B}_{1,\text{2-stage}}$, and $\bar{B}_1$ are calculated. The simulations are repeated 500 times, and the matrix 2-norms of the average values of $\hat{B}_1 - B_1$, $\bar{B}_{1,\text{2-stage}} - B_1$, and $\bar{B}_1 - B_1$ are given in Table 3.

Three cases are studied: the first corresponds to $n = 10$, the second to $n = 20$, and the third to $n = 50$. All cases adopt the same $V$ as in (3.10), but with the correlation $\rho$ taking a number of alternative values.

Simulations for case (i) use

$$X_1^T = \begin{pmatrix} 6 & 16 & -30 & 2 & 19 & 23 & 19 & 25 & 26 & 17\\ 1 & 16 & 14 & -30 & 22 & 7 & 15 & 20 & 26 & 6\\ 24 & 25 & 8 & 37 & -28 & 12 & 5 & -4 & -29 & 14 \end{pmatrix}, \qquad X_2^T = \begin{pmatrix} 1 & 4 & 2 & 5 & 6 & 3 & 5 & 2 & 5 & 8\\ 2 & 6 & 3 & 4 & 2 & 1 & 6 & 8 & 5 & 3 \end{pmatrix},$$
$$B_1 = \begin{pmatrix} 1 & 1\\ 1 & 2\\ 1 & 3 \end{pmatrix}, \qquad B_2 = \begin{pmatrix} 1 & 6\\ -3 & 2 \end{pmatrix}, \qquad Z_1 = \begin{pmatrix} 1 & 2\\ 3 & 4 \end{pmatrix}, \qquad Z_2 = \begin{pmatrix} 1 & 5\\ 6 & 0.5 \end{pmatrix}.$$

Simulations for case (ii) use $X_1^T$ given by

(-94815-26545253510304502051570657560906020-106930-34071580553585704025957551020906545100-176660-27131001070755065904045525855535308015)

and

X2T=(123456789101112131415161718192024168712484185394731162),

where $B_1$, $B_2$, $Z_1$, and $Z_2$ are the same as in case (i).

Simulations for case (iii) use $X_1 = [a_1, a_2, a_3]_{50 \times 3}$ randomly generated and $X_2 = [a_4, a_5]_{50 \times 2}$ obtained from the null space of $X_1^T$; in this case $B_1$, $B_2$, $Z_1$, and $Z_2$ remain the same as in case (i).

From Table 3, except for the situations $\rho = 0.2$ and $\rho = 0.5$ with $n = 10$, we find that $\mathrm{norm}(\bar{B}_{1,\text{2-stage}} - B_1)$ is uniformly smaller than $\mathrm{norm}(\hat{B}_1 - B_1)$, which shows that the two-stage estimator $\bar{B}_{1,\text{2-stage}}$ is closer to the true value $B_1$ than the LSE $\hat{B}_1$.

5. Concluding remarks

In summary, we have investigated the estimation of regression coefficients in a system of two multivariate SURs. Note that we focus on the estimation of $B_1$, since the positions of $B_1$ and $B_2$ are symmetric. In Section 1, we found that, by incorporating the information of the other equation, the estimator of the regression coefficients can be expressed as a matrix power series via the method of covariance adjustment. In Section 2, we further showed that the matrix series has exactly one simpler form, which is just the one-step covariance adjustment estimator of the regression coefficients. In Section 3, for the case that the covariance matrix of the system is unknown, we showed that the degenerate form of the two-stage estimator sequence is unique, proposed an unbiased two-stage estimator, and presented numerical simulations to verify its superiority. The results established in the present paper enrich the existing results, since they include Zellner's univariate SURs as a special case.

TABLES

Table 1

Comparisons between the two-stage estimator and the least squares estimator

$\rho$    $n$    $\mathrm{trace}(S(\hat{B}_1))$    $\mathrm{trace}(S(\bar{B}_{1,s,\text{2-stage}}))$    $\mathrm{trace}(S(\bar{B}_{1,s}))$
0.2      10     12.7118      12.8775      12.7718
         20     30.0456      30.2155      30.0873
         50     141.1215     141.9674     141.3323
0.5      10     15.3561      15.4368      15.5200
         20     35.3589      35.3747      35.3775
         50     108.2523     108.2895     108.2894
0.7      10     7.7018       8.0020       7.9504
         20     42.4375      45.8153      44.6528
         50     103.9155     104.3501     104.2045
0.9      10     18.6055      18.6758      18.6689
         20     35.6630      37.5563      37.2077
         50     119.9993     122.4540     122.2700

Table 2

Comparisons between the two-stage estimator and the least squares estimator

$\rho$    $n$    $\mathrm{trace}(S(\hat{B}_1))$    $\mathrm{trace}(S(\bar{B}_{1,s,\text{2-stage}}))$    $\mathrm{trace}(S(\bar{B}_{1,s}))$
0.2      10     16.4822      16.7363      16.5106
         20     45.1567      45.3013      45.1656
         50     90.6848      90.7475      91.0296
0.5      10     11.7975      12.4334      12.0509
         20     30.6931      31.4717      31.2451
         50     91.1112      91.1782      91.2101
0.7      10     15.0877      15.4275      15.2887
         20     34.0316      34.9262      34.6478
         50     78.8424      79.1979      79.4065
0.9      10     7.7416       8.8334       9.0277
         20     37.1639      39.6293      38.6530
         50     106.1452     106.7684     106.7741

Table 3

Comparisons between several estimators under the matrix 2-norm

$\rho$    $n$    $\mathrm{norm}(\hat{B}_1 - B_1)$    $\mathrm{norm}(\bar{B}_{1,\text{2-stage}} - B_1)$    $\mathrm{norm}(\bar{B}_1 - B_1)$
0.2      10     0.1931     0.2668     0.1879
         20     0.0183     0.0193     0.0182
         50     0.2259     0.2266     0.2205
0.5      10     0.2013     0.2283     0.1679
         20     0.0186     0.0177     0.0171
         50     0.2557     0.2271     0.2198
0.7      10     0.1896     0.1872     0.1383
         20     0.0184     0.0134     0.0127
         50     0.2532     0.1820     0.1763
0.9      10     0.1923     0.1138     0.0847
         20     0.0173     0.0085     0.0079
         50     0.2471     0.1133     0.1103

References
  1. Alkhamisi, MA (2010). Simulation study of new estimators combining the SUR ridge regression and the restricted least squares methodologies. Statistical Papers. 51, 651-672.
  2. Baksalary, JK (1991). Covariance adjustment in biased estimation. Computational Statistics & Data Analysis. 12, 221-230.
  3. Fang, KT, and Zhang, YT (1990). Generalized Multivariate Analysis. Berlin and Beijing: Springer-Verlag and Science Press.
  4. Gupta, AK, and Kabe, DG (1998). A note on a result for two SUR models. Statistical Papers. 39, 417-421.
  5. Lee, JC, and Geisser, S (1972). Growth curve prediction. Sankhya A. 34, 393-412.
  6. Liu, AY (2002). Efficient estimation of two seemingly unrelated regression equations. Journal of Multivariate Analysis. 82, 445-456.
  7. Liu, JS (2000). MSEM dominance of estimators in two seemingly unrelated regressions. Journal of Statistical Planning and Inference. 88, 255-266.
  8. Maddala, GS (1977). Econometrics. New York: McGraw-Hill.
  9. Muirhead, RJ (1982). Aspects of Multivariate Statistical Theory. New York: Wiley and Sons.
  10. Pan, JX, and Fang, KT (2007). Growth Curve Models and Statistical Diagnostics. Beijing: Science Press.
  11. Percy, DF (1992). Prediction for seemingly unrelated regressions. Journal of the Royal Statistical Society Series B (Methodological). 54, 243-252.
  12. Rao, CR (1967). Least square theory using an estimated dispersion matrix and its application to measurement of signal. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, LeCam, LM, and Neyman, J, ed, pp. 355-372.
  13. Revankar, NS (1974). Some finite sample results in the context of two seemingly unrelated regression equations. Journal of the American Statistical Association. 69, 187-190.
  14. Schmidt, P (1977). Estimation of seemingly unrelated regressions with unequal numbers of observations. Journal of Econometrics. 5, 365-377.
  15. Shukur, G, and Zeebari, Z (2012). Median regression for SUR models with the same explanatory variables in each equation. Journal of Applied Statistics. 39, 1765-1779.
  16. Srivastava, VK, and Giles, DEA (1987). Seemingly Unrelated Regression Equations Models. New York: Marcel Dekker.
  17. Velu, R, and Richards, J (2008). Seemingly unrelated reduced-rank regression model. Journal of Statistical Planning and Inference. 138, 2837-2846.
  18. Wang, SG (1989). A new estimate of regression coefficients in seemingly unrelated regression system. Science in China Series A. 32, 808-816.
  19. Zellner, A (1962). An efficient method of estimating seemingly unrelated regressions and tests for aggregation bias. Journal of the American Statistical Association. 57, 348-368.
  20. Zellner, A (1963). Estimators of seemingly unrelated regression equations: some exact finite sample results. Journal of the American Statistical Association. 58, 977-992.
  21. Zhou, B, Xu, Q, and You, J (2011). Efficient estimation for error component seemingly unrelated nonparametric regression models. Metrika. 73, 121-138.