Application of covariance adjustment to seemingly unrelated multivariate regressions

Lichun Wang^{1,a}, Lawrence Pettit^{b}

^{a}Department of Mathematics, Beijing Jiaotong University, China;
^{b}School of Mathematical Sciences, Queen Mary University of London, UK
Correspondence to: ^{1}Department of Mathematics, Beijing Jiaotong University, No. 3 Shangyuancun, Haidian District, Beijing 100044, China. E-mail: lchwang@bjtu.edu.cn
Received February 27, 2018; Revised August 24, 2018; Accepted September 21, 2018.
Abstract

Employing the covariance adjustment technique, we show that in the system of two seemingly unrelated multivariate regressions the estimator of the regression coefficients can be expressed as a matrix power series, and conclude that the matrix series has only one simpler form. In the case that the covariance matrix of the system is unknown, we define a two-stage estimator for the regression coefficients, which is shown to be unique and unbiased. Numerical simulations are presented to illustrate its superiority over the ordinary least squares estimator. As an example, we also apply our results to seemingly unrelated growth curve models.

Keywords : seemingly unrelated regressions, matrix power series, two-stage estimator
1. Introduction

The system of seemingly unrelated regressions (SUR) has been investigated by many authors since the pioneering works of Zellner (1962, 1963); it can be used to model subtle interactions among individual statistical relationships. For more details, the reader is referred to Revankar (1974), Schmidt (1977), Wang (1989), Percy (1992), Liu (2000), and Liu (2002). Among these, the cases of orthogonal regressors (Zellner, 1963), triangular SUR models (Revankar, 1974), and SUR with unequal numbers of observations (Schmidt, 1977) are particularly notable. Examples in the econometrics literature (Srivastava and Giles, 1987) suggest that the SUR model is appropriate and useful for a wide range of applications. Further, Velu and Richards (2008) focus on applications of the reduced-rank model in the context of SUR. Alkhamisi (2010) proposes two SUR-type estimators that combine SUR ridge regression with the restricted least squares method, and evaluates their performance under several designated criteria. Zhou et al. (2011) employ seemingly unrelated nonparametric regression models to fit multivariate panel data. Shukur and Zeebari (2012) consider median regression for SUR models with the same explanatory variables and obtain an interesting feature of the generalized least absolute deviations method. In this paper, we establish some further facts about the SUR system by employing the covariance adjustment technique. We start from the system of two seemingly unrelated multivariate regressions (Gupta and Kabe, 1998), namely

$$\begin{cases} Y_1 = X_1 B_1 + E_1, \\ Y_2 = X_2 B_2 + E_2, \end{cases} \tag{1.1}$$

where $Y_i$ $(i = 1, 2)$ are $n \times q$ observation matrices; $X_i$ $(i = 1, 2)$ are $n \times p_i$ matrices with full column rank; $B_i$ $(i = 1, 2)$ are $p_i \times q$ matrices of unknown regression coefficients; $E_1$ and $E_2$ are random error matrices, and each row of $(E_1, E_2)$ follows a common unspecified multivariate distribution with mean zero and covariance matrix $V$, where $V$ is a $2 \times 2$ non-diagonal partitioned matrix given by

$$V = \begin{pmatrix} V_1 & D \\ D^T & V_2 \end{pmatrix}, \tag{1.2}$$

where $V_i$ is the variance-covariance matrix of each row of $E_i$ $(i = 1, 2)$ and $D$ denotes the covariance matrix between a row of $E_1$ and the corresponding row of $E_2$. Different rows of $(E_1, E_2)$ are assumed to be uncorrelated. The multivariate SUR setting is common in the biological sciences. For instance, if the $i$th row of $Y_1$ is the vector of weights of the $i$th rabbit observed at $q$ time points, the $i$th row of $Y_2$ is the vector of lengths of the same rabbit at the same $q$ time points, and observations on different rabbits are uncorrelated, then the multivariate SUR (1.1) reasonably models the interactions among the weight and length observations of the $n$ rabbits.

If one neglects the correlation between $Y_1$ and $Y_2$, i.e., takes $D$ to be zero, then using only the first equation of the system (1.1), one obtains the least squares estimator (LSE) of $\operatorname{Vec}(B_1)$ as

$$\widehat{\operatorname{Vec}}(B_1) = \left(I_q \otimes (X_1^T X_1)^{-1} X_1^T\right) \operatorname{Vec}(Y_1), \tag{1.3}$$

and correspondingly the LSE of the coefficient matrix $B_1$ is $\hat{B}_1 = (X_1^T X_1)^{-1} X_1^T Y_1$, where $\operatorname{Vec}(A)$ denotes the vector obtained by stacking the columns of the matrix $A$, and $\otimes$ and $I_q$ denote the Kronecker product and the identity matrix of order $q$, respectively.
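To make the two equivalent forms of the LSE concrete, here is a minimal NumPy sketch; the dimensions, random design, and seed are illustrative assumptions, not taken from the paper. It checks that the Kronecker form (1.3) and the matrix form $\hat{B}_1 = (X_1^TX_1)^{-1}X_1^TY_1$ coincide.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p1, q = 10, 3, 2                               # illustrative sizes
X1 = rng.standard_normal((n, p1))
Y1 = rng.standard_normal((n, q))

vec = lambda A: A.reshape(-1, order="F")          # Vec stacks columns

# Kronecker form (1.3): Vec-hat(B1) = (I_q kron (X1'X1)^{-1}X1') Vec(Y1).
H1 = np.linalg.solve(X1.T @ X1, X1.T)             # (X1'X1)^{-1} X1'
vec_B1 = np.kron(np.eye(q), H1) @ vec(Y1)

# Matrix form: B1-hat = (X1'X1)^{-1} X1' Y1.
B1_hat = H1 @ Y1

assert np.allclose(vec_B1, vec(B1_hat))           # the two forms coincide
```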

However, if we denote Y = (Y1, Y2), B = (B1, B2), and E = (E1, E2), then the system (1.1) can also be represented as:

$$\operatorname{Vec}(Y) = \begin{pmatrix} I_q \otimes X_1 & 0 \\ 0 & I_q \otimes X_2 \end{pmatrix} \operatorname{Vec}(B) + \operatorname{Vec}(E). \tag{1.4}$$

Hence, from (1.4), one can obtain the LSE of $\operatorname{Vec}(B)$, say $\overline{\operatorname{Vec}}(B)$, and accordingly another estimator of $\operatorname{Vec}(B_1)$, denoted by $\overline{\operatorname{Vec}}(B_1)$, can be proposed, since $\overline{\operatorname{Vec}}(B)^T = (\overline{\operatorname{Vec}}(B_1)^T, \overline{\operatorname{Vec}}(B_2)^T)$. It stands to reason that $\overline{\operatorname{Vec}}(B_1)$ and its two-stage version $\overline{\operatorname{Vec}}(B_1)_{2\text{-stage}}$ (for the case of unknown $V$) should outperform $\widehat{\operatorname{Vec}}(B_1)$ in (1.3), since they take the information on $B_1$ contained in the other equation into account.

The covariance adjustment technique is usually employed to obtain an optimal unbiased estimator of a vector parameter θ via linearly combining an unbiased estimator of θ, say T1, and an unbiased estimator of a zero vector, say T2 (Rao, 1967; Baksalary, 1991).

Applying the covariance adjustment technique to the estimator $\widehat{\operatorname{Vec}}(B_1)$, which uses only the first equation's information on $\operatorname{Vec}(B_1)$, we first use $(I_q \otimes N_2)\operatorname{Vec}(Y_2)$ to improve $\widehat{\operatorname{Vec}}(B_1)$, noting that $E[(I_q \otimes N_2)\operatorname{Vec}(Y_2)] = (I_q \otimes N_2)(I_q \otimes X_2)\operatorname{Vec}(B_2) = 0$, and obtain $\widehat{\operatorname{Vec}}(B_1)^{(1)}$; we then improve $\widehat{\operatorname{Vec}}(B_1)^{(1)}$ by $(I_q \otimes N_1)\operatorname{Vec}(Y_1)$, since $E[(I_q \otimes N_1)\operatorname{Vec}(Y_1)] = 0$, and get $\widehat{\operatorname{Vec}}(B_1)^{(2)}$. Repeating this process, we obtain the following estimator sequence $(k \ge 1)$ for $\operatorname{Vec}(B_1)$:

$$\widehat{\operatorname{Vec}}(B_1)^{(2k-1)} = \widehat{\operatorname{Vec}}(B_1)^{(2k-2)} - \operatorname{Cov}\!\left(\widehat{\operatorname{Vec}}(B_1)^{(2k-2)}, (I_q \otimes N_2)\operatorname{Vec}(Y_2)\right)\left[\operatorname{Cov}\!\left((I_q \otimes N_2)\operatorname{Vec}(Y_2)\right)\right]^{-}(I_q \otimes N_2)\operatorname{Vec}(Y_2), \tag{1.5}$$
$$\widehat{\operatorname{Vec}}(B_1)^{(2k)} = \widehat{\operatorname{Vec}}(B_1)^{(2k-1)} - \operatorname{Cov}\!\left(\widehat{\operatorname{Vec}}(B_1)^{(2k-1)}, (I_q \otimes N_1)\operatorname{Vec}(Y_1)\right)\left[\operatorname{Cov}\!\left((I_q \otimes N_1)\operatorname{Vec}(Y_1)\right)\right]^{-}(I_q \otimes N_1)\operatorname{Vec}(Y_1), \tag{1.6}$$

where $\widehat{\operatorname{Vec}}(B_1)^{(0)} = \widehat{\operatorname{Vec}}(B_1)$, $N_i = I_n - X_i(X_i^T X_i)^{-1} X_i^T$ $(i = 1, 2)$, and $A^{-}$ denotes any generalized inverse of $A$.

Note that $\operatorname{Cov}(\operatorname{Vec}(Y_i)) = V_i \otimes I_n$ $(i = 1, 2)$ and $\operatorname{Cov}(\operatorname{Vec}(Y_1), \operatorname{Vec}(Y_2)) = D \otimes I_n$. After some algebra, we obtain that for $k \ge 1$

$$\widehat{\operatorname{Vec}}(B_1)^{(2k-1)} = \left[I_q \otimes (X_1^T X_1)^{-1} X_1^T\right] \sum_{i=0}^{k-1} \left(D V_2^{-1} D^T V_1^{-1} \otimes N_2 N_1\right)^i \left[\operatorname{Vec}(Y_1) - \left(D V_2^{-1} \otimes N_2\right)\operatorname{Vec}(Y_2)\right], \tag{1.7}$$
$$\widehat{\operatorname{Vec}}(B_1)^{(2k)} = \left[I_q \otimes (X_1^T X_1)^{-1} X_1^T\right] \sum_{i=0}^{k} \left(D V_2^{-1} D^T V_1^{-1} \otimes N_2 N_1\right)^i \operatorname{Vec}(Y_1) - \left[I_q \otimes (X_1^T X_1)^{-1} X_1^T\right] \sum_{i=0}^{k-1} \left(D V_2^{-1} D^T V_1^{-1} \otimes N_2 N_1\right)^i \left(D V_2^{-1} \otimes N_2\right) \operatorname{Vec}(Y_2). \tag{1.8}$$

Denote $V^{-1} = \begin{pmatrix} V^{11} & V^{12} \\ V^{21} & V^{22} \end{pmatrix}$ and $Q = (V^{11})^{-1} V^{12} (V^{22})^{-1} V^{21}$. By (1.2) and the formula for the inverse of a partitioned matrix, we have

$$\begin{aligned} Q &= (V^{11})^{-1} V^{12} (V^{22})^{-1} V^{21} \\ &= \left[V_1 - D V_2^{-1} D^T\right] \cdot \left[V_1 - D V_2^{-1} D^T\right]^{-1} D V_2^{-1} \cdot \left(V_2 - D^T V_1^{-1} D\right) \cdot V_2^{-1} D^T \left[V_1 - D V_2^{-1} D^T\right]^{-1} \\ &= D V_2^{-1} D^T \left[\left(V_1 - D V_2^{-1} D^T\right)^{-1} - V_1^{-1} D V_2^{-1} D^T \left(V_1 - D V_2^{-1} D^T\right)^{-1}\right] \\ &= D V_2^{-1} D^T V_1^{-1}, \end{aligned} \tag{1.9}$$

and $(V^{11})^{-1} V^{12} = -D V_2^{-1}$. Thus we have

$$\begin{aligned} \overline{\operatorname{Vec}}(B_1) &= \widehat{\operatorname{Vec}}(B_1)^{(\infty)} = \lim_{k\to\infty}\widehat{\operatorname{Vec}}(B_1)^{(2k-1)} = \lim_{k\to\infty}\widehat{\operatorname{Vec}}(B_1)^{(2k)} \\ &= \left[I_q \otimes (X_1^T X_1)^{-1} X_1^T\right] \sum_{i=0}^{\infty} (Q \otimes N_2 N_1)^i \left[\operatorname{Vec}(Y_1) + \left((V^{11})^{-1} V^{12} \otimes N_2\right)\operatorname{Vec}(Y_2)\right] \\ &= \left[I_q \otimes (X_1^T X_1)^{-1} X_1^T\right] \left\{I_{qn} - \sum_{i=0}^{\infty} Q^{i+1} \otimes (P_2 P_1)^i P_2 N_1\right\} \left\{\operatorname{Vec}(Y_1) + \left[(V^{11})^{-1} V^{12} \otimes N_2\right] \operatorname{Vec}(Y_2)\right\}, \end{aligned} \tag{1.10}$$

where $P_i = I_n - N_i = X_i(X_i^T X_i)^{-1} X_i^T$ and we use the facts that $(Q \otimes P_2 P_1)^0 = I_q \otimes I_n = I_{qn}$, $(V^{11})^{-1}(Q^T)^k V^{11} = Q^k$ for $k \ge 0$, and $X_1^T (N_2 N_1)^k = -X_1^T (P_2 P_1)^{k-1} P_2 N_1$ for $k \ge 1$.
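The following NumPy sketch (with hypothetical dimensions and an arbitrary positive definite $V$) numerically checks the block-inverse identities behind (1.9) and illustrates why the power series in (1.10) converges: the driving matrix $Q \otimes N_2 N_1$ has spectral radius below one.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p1, p2, q = 12, 3, 2, 2                        # illustrative sizes
X1 = rng.standard_normal((n, p1))
X2 = rng.standard_normal((n, p2))

A = rng.standard_normal((2 * q, 2 * q))           # arbitrary positive definite V
V = A @ A.T + 2 * q * np.eye(2 * q)
V1, D, V2 = V[:q, :q], V[:q, q:], V[q:, q:]

# Block-inverse identities used in (1.9):
# Q = (V^11)^{-1} V^12 (V^22)^{-1} V^21 = D V2^{-1} D' V1^{-1},
# and (V^11)^{-1} V^12 = -D V2^{-1}.
Vinv = np.linalg.inv(V)
V11, V12, V21, V22 = Vinv[:q, :q], Vinv[:q, q:], Vinv[q:, :q], Vinv[q:, q:]
Q = np.linalg.inv(V11) @ V12 @ np.linalg.inv(V22) @ V21
assert np.allclose(Q, D @ np.linalg.inv(V2) @ D.T @ np.linalg.inv(V1))
assert np.allclose(np.linalg.inv(V11) @ V12, -D @ np.linalg.inv(V2))

# The series in (1.10) is driven by powers of Q kron N2 N1; for positive
# definite V its spectral radius is below one, so the series converges.
P1 = X1 @ np.linalg.solve(X1.T @ X1, X1.T)
P2 = X2 @ np.linalg.solve(X2.T @ X2, X2.T)
N1, N2 = np.eye(n) - P1, np.eye(n) - P2
T = np.kron(Q, N2 @ N1)
print(max(abs(np.linalg.eigvals(T))))             # < 1 here
```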

We summarize the above conclusions in the following theorem, which gives the limit of the covariance adjustment sequence and the covariance of $\overline{\operatorname{Vec}}(B_1)$.

### Theorem 1

For the system (1.1), the limit of the covariance adjustment sequence of $\operatorname{Vec}(B_1)$ equals $\overline{\operatorname{Vec}}(B_1)$, i.e., $\lim_{k\to\infty}\widehat{\operatorname{Vec}}(B_1)^{(k)} = \overline{\operatorname{Vec}}(B_1)$, and

$$\operatorname{Cov}\!\left(\overline{\operatorname{Vec}}(B_1)\right) = V_1 \otimes (X_1^T X_1)^{-1} - \left[I_q \otimes (X_1^T X_1)^{-1} X_1^T\right] \cdot G \cdot \left[I_q \otimes X_1 (X_1^T X_1)^{-1}\right],$$

where $G = \sum_{i=0}^{\infty}\left[Q^i D V_2^{-1} D^T\right] \otimes \left[(P_1 P_2 P_1)^i - (P_1 P_2 P_1)^{i+1}\right]$.

### Proof

The first conclusion follows from the above discussion. Write $\overline{\operatorname{Vec}}(B_1) = M(Q)\left\{\operatorname{Vec}(Y_1) + \left[(V^{11})^{-1} V^{12} \otimes N_2\right]\operatorname{Vec}(Y_2)\right\}$ with $M(Q) = \left[I_q \otimes (X_1^T X_1)^{-1} X_1^T\right]\left\{I_{qn} - \sum_{i=0}^{\infty} Q^{i+1} \otimes (P_2 P_1)^i P_2 N_1\right\}$. Then we have

$$\begin{aligned} \operatorname{Cov}\!\left(\overline{\operatorname{Vec}}(B_1)\right) &= M(Q)\left[V_1 \otimes I_n + (V^{11})^{-1} V^{12} V_2 V^{21} (V^{11})^{-1} \otimes N_2 + D V^{21} (V^{11})^{-1} \otimes N_2 + (V^{11})^{-1} V^{12} D^T \otimes N_2\right] M^T(Q) \\ &= M(Q)\left[V_1 \otimes I_n - Q V_1 \otimes N_2\right] M^T(Q), \end{aligned}$$

where we use the following fact

$$(V^{11})^{-1} V^{12} V_2 V^{21} (V^{11})^{-1} + D V^{21} (V^{11})^{-1} + (V^{11})^{-1} V^{12} D^T = -D V_2^{-1} D^T.$$

Together with the expression of M(Q), we have

$$\begin{aligned}
\operatorname{Cov}\!\left(\overline{\operatorname{Vec}}(B_1)\right)
={}& \left\{I_q \otimes (X_1^TX_1)^{-1}X_1^T - \sum_{i=0}^{\infty}\left[Q^{i+1} \otimes (X_1^TX_1)^{-1}X_1^T(P_2P_1)^iP_2N_1\right]\right\} \\
& \times \left\{V_1 \otimes X_1(X_1^TX_1)^{-1} - \sum_{i=0}^{\infty}\left[V_1(Q^T)^{i+1} \otimes N_1P_2(P_1P_2)^iX_1(X_1^TX_1)^{-1}\right]\right\} \\
& - \left\{I_q \otimes (X_1^TX_1)^{-1}X_1^T - \sum_{i=0}^{\infty}\left[Q^{i+1} \otimes (X_1^TX_1)^{-1}X_1^T(P_2P_1)^iP_2N_1\right]\right\} \\
& \times \left\{QV_1 \otimes N_2X_1(X_1^TX_1)^{-1} - \sum_{i=0}^{\infty}\left[QV_1(Q^T)^{i+1} \otimes N_2N_1P_2(P_1P_2)^iX_1(X_1^TX_1)^{-1}\right]\right\} \\
={}& V_1 \otimes (X_1^TX_1)^{-1} - \left[I_q \otimes (X_1^TX_1)^{-1}X_1^T\right]\left[QV_1 \otimes N_2\right]\left[I_q \otimes X_1(X_1^TX_1)^{-1}\right] \\
& + \left[I_q \otimes (X_1^TX_1)^{-1}X_1^T\right]\left[\sum_{i=0}^{\infty} Q^{i+2}V_1 \otimes (P_2P_1)^iP_2N_1N_2\right]\left[I_q \otimes X_1(X_1^TX_1)^{-1}\right] \\
& + \left[I_q \otimes (X_1^TX_1)^{-1}X_1^T\right]\left[\sum_{i=0}^{\infty} QV_1(Q^T)^{i+1} \otimes N_2N_1P_2(P_1P_2)^i\right]\left[I_q \otimes X_1(X_1^TX_1)^{-1}\right] \\
& + \left[I_q \otimes (X_1^TX_1)^{-1}X_1^T\right]\left[\sum_{i=0}^{\infty} Q^{i+1} \otimes (P_2P_1)^iP_2N_1 \sum_{i=0}^{\infty} V_1(Q^T)^{i+1} \otimes N_1P_2(P_1P_2)^i\right]\left[I_q \otimes X_1(X_1^TX_1)^{-1}\right] \\
& - \left[I_q \otimes (X_1^TX_1)^{-1}X_1^T\right]\left[\sum_{i=0}^{\infty} Q^{i+1} \otimes (P_2P_1)^iP_2N_1 \sum_{i=0}^{\infty} QV_1(Q^T)^{i+1} \otimes N_2N_1P_2(P_1P_2)^i\right]\left[I_q \otimes X_1(X_1^TX_1)^{-1}\right] \\
={}& V_1 \otimes (X_1^TX_1)^{-1} - \left[I_q \otimes (X_1^TX_1)^{-1}X_1^T\right]\left[QV_1 \otimes N_2\right]\left[I_q \otimes X_1(X_1^TX_1)^{-1}\right] \\
& + \left[I_q \otimes (X_1^TX_1)^{-1}X_1^T\right]\left[\sum_{i=0}^{\infty} QV_1(Q^T)^{i+1} \otimes P_2(P_1P_2)^{i+1} - \sum_{i=0}^{\infty} Q^{i+2}V_1 \otimes (P_2P_1)^{i+1}\right]\left[I_q \otimes X_1(X_1^TX_1)^{-1}\right].
\end{aligned}$$

Using $Q V_1 = D V_2^{-1} D^T$, $X_1^T N_2 X_1 = X_1^T (I_n - P_1 P_2 P_1) X_1$, and $X_1^T P_1 = X_1^T$, we have

$$\begin{aligned}
\operatorname{Cov}\!\left(\overline{\operatorname{Vec}}(B_1)\right)
={}& V_1 \otimes (X_1^TX_1)^{-1} - \left[I_q \otimes (X_1^TX_1)^{-1}X_1^T\right]\left[DV_2^{-1}D^T \otimes (I_n - P_1P_2P_1)\right]\left[I_q \otimes X_1(X_1^TX_1)^{-1}\right] \\
& - \left[I_q \otimes (X_1^TX_1)^{-1}X_1^T\right]\left\{\sum_{i=1}^{\infty} Q^iDV_2^{-1}D^T \otimes (P_1P_2P_1)^i - \sum_{i=1}^{\infty} DV_2^{-1}D^T(Q^T)^i \otimes (P_1P_2P_1)^{i+1}\right\}\left[I_q \otimes X_1(X_1^TX_1)^{-1}\right] \\
={}& V_1 \otimes (X_1^TX_1)^{-1} - \left[I_q \otimes (X_1^TX_1)^{-1}X_1^T\right]\left\{\sum_{i=0}^{\infty} Q^iDV_2^{-1}D^T \otimes (P_1P_2P_1)^i - \sum_{i=0}^{\infty} DV_2^{-1}D^T(Q^T)^i \otimes (P_1P_2P_1)^{i+1}\right\}\left[I_q \otimes X_1(X_1^TX_1)^{-1}\right] \\
={}& V_1 \otimes (X_1^TX_1)^{-1} - \left[I_q \otimes (X_1^TX_1)^{-1}X_1^T\right]\left\{\sum_{i=0}^{\infty}\left[Q^iDV_2^{-1}D^T\right] \otimes \left[(P_1P_2P_1)^i - (P_1P_2P_1)^{i+1}\right]\right\}\left[I_q \otimes X_1(X_1^TX_1)^{-1}\right],
\end{aligned}$$

where the last step uses the facts that $(P_1 P_2 P_1)^0 = I_n$ and $Q^i D V_2^{-1} D^T = D V_2^{-1} D^T (Q^T)^i$ for $i \ge 0$.

The proof of Theorem 1 is finished.

Note that $Q^0 D V_2^{-1} D^T = D V_2^{-1} D^T \ge 0$, $I_n - P_1 P_2 P_1 \ge 0$, and for $i \ge 1$

$$Q^i D V_2^{-1} D^T = D V_2^{-1} D^T (Q^T)^i = \begin{cases} D V_2^{-1} D^T \left(V_1^{-1} D V_2^{-1} D^T\right)^{k-1} V_1^{-1} \left(D V_2^{-1} D^T V_1^{-1}\right)^{k-1} D V_2^{-1} D^T \ge 0, & i = 2k-1, \\ D V_2^{-1} D^T \left(V_1^{-1} D V_2^{-1} D^T\right)^{k-1} V_1^{-1} D V_2^{-1} D^T V_1^{-1} \left(D V_2^{-1} D^T V_1^{-1}\right)^{k-1} D V_2^{-1} D^T \ge 0, & i = 2k, \end{cases} \qquad k = 1, 2, \ldots,$$

and $(P_1 P_2 P_1)^i - (P_1 P_2 P_1)^{i+1} \ge 0$. Hence

$$G = \sum_{i=0}^{\infty}\left[Q^i D V_2^{-1} D^T\right] \otimes \left[(P_1 P_2 P_1)^i - (P_1 P_2 P_1)^{i+1}\right] \ge 0.$$

Further, since $\operatorname{Cov}(\widehat{\operatorname{Vec}}(B_1)) = V_1 \otimes (X_1^T X_1)^{-1}$, we have

$$\operatorname{Cov}\!\left(\overline{\operatorname{Vec}}(B_1)\right) = \operatorname{Cov}\!\left(\widehat{\operatorname{Vec}}(B_1)\right) - \left[I_q \otimes (X_1^T X_1)^{-1} X_1^T\right] \cdot G \cdot \left[I_q \otimes X_1 (X_1^T X_1)^{-1}\right] \le \operatorname{Cov}\!\left(\widehat{\operatorname{Vec}}(B_1)\right),$$

which means that $\overline{\operatorname{Vec}}(B_1)$ is superior to $\widehat{\operatorname{Vec}}(B_1)$ in the sense of having a smaller covariance matrix. This is consistent with the fact that $\widehat{\operatorname{Vec}}(B_1)$ uses only the first regression equation's information on $\operatorname{Vec}(B_1)$, whereas $\overline{\operatorname{Vec}}(B_1)$ combines the second regression equation with the first via covariance adjustment.
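As a numerical sanity check of Theorem 1, the sketch below (hypothetical sizes and covariance) truncates the convergent series defining $G$ and verifies that it is positive semidefinite, which is what yields $\operatorname{Cov}(\overline{\operatorname{Vec}}(B_1)) \le \operatorname{Cov}(\widehat{\operatorname{Vec}}(B_1))$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p1, p2, q = 12, 3, 2, 2                        # illustrative sizes
X1 = rng.standard_normal((n, p1))
X2 = rng.standard_normal((n, p2))
A = rng.standard_normal((2 * q, 2 * q))
V = A @ A.T + 2 * q * np.eye(2 * q)
V1, D, V2 = V[:q, :q], V[:q, q:], V[q:, q:]

Q = D @ np.linalg.inv(V2) @ D.T @ np.linalg.inv(V1)
P1 = X1 @ np.linalg.solve(X1.T @ X1, X1.T)
P2 = X2 @ np.linalg.solve(X2.T @ X2, X2.T)

# Truncate the convergent series defining G in Theorem 1; every term is a
# Kronecker product of nonnegative definite factors, so G >= 0 follows.
DV2D = D @ np.linalg.inv(V2) @ D.T
P121 = P1 @ P2 @ P1
G = np.zeros((q * n, q * n))
Qi, Pi = np.eye(q), np.eye(n)
for _ in range(60):
    G += np.kron(Qi @ DV2D, Pi - Pi @ P121)
    Qi, Pi = Qi @ Q, Pi @ P121
print(np.linalg.eigvalsh((G + G.T) / 2).min())    # >= 0 up to rounding error
```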

2. The characteristics of the matrix series

Note that for i = 1, 2, . . . ,

$$X_1^T (P_2 P_1)^{i-1} P_2 N_1 = 0 \iff X_1^T (P_2 P_1)^{i-1} P_2 N_1 N_2 = 0. \tag{2.1}$$

We only need to prove that the right-hand equality implies the left-hand one. Note that $X_1^T(P_2P_1)^{i-1}P_2N_1N_2 = 0$ implies $X_1^T(P_2P_1)^{i-1}N_2N_1N_2 = 0$, hence $X_1^T(P_2P_1)^{i-1}N_2N_1N_2(P_1P_2)^{i-1}X_1 = 0$, and thus $X_1^T(P_2P_1)^{i-1}N_2N_1 = 0$, where we use $N_1^2 = N_1$. Further, replacing $N_2$ by $I_n - P_2$ and noting that $X_1^TN_1 = 0$ and $P_1N_1 = 0$, we obtain $X_1^T(P_2P_1)^{i-1}P_2N_1 = 0$.

Therefore, (2.1) implies that for i = 1, 2, . . . ,

$$(X_1^T X_1)^{-1} X_1^T (P_2 P_1)^{i-1} P_2 N_1 = 0 \iff (X_1^T X_1)^{-1} X_1^T (P_2 P_1)^{i-1} P_2 N_1 N_2 = 0, \tag{2.2}$$

which further shows that for i = 1, 2, . . . ,

$$Q^i \otimes (X_1^T X_1)^{-1} X_1^T (P_2 P_1)^{i-1} P_2 N_1 = 0 \iff Q^i (V^{11})^{-1} V^{12} \otimes (X_1^T X_1)^{-1} X_1^T (P_2 P_1)^{i-1} P_2 N_1 N_2 = 0, \tag{2.3}$$

where we note that $Q = D V_2^{-1} D^T V_1^{-1}$ and $Q^i (V^{11})^{-1} V^{12} = -Q^i D V_2^{-1}$, with $D$ the covariance matrix between $E_1$ and $E_2$, and that both $Q$ and $Q^i (V^{11})^{-1} V^{12}$ are invertible.

Set

$$\overline{\operatorname{Vec}}(B_1)_s = \left[I_q \otimes (X_1^T X_1)^{-1} X_1^T\right] \operatorname{Vec}(Y_1) + \left[(V^{11})^{-1} V^{12} \otimes (X_1^T X_1)^{-1} X_1^T N_2\right] \operatorname{Vec}(Y_2). \tag{2.4}$$

The following theorem shows that the matrix series (1.10) has only one degenerate form, namely $\overline{\operatorname{Vec}}(B_1)_s$.
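In matrix form, the simpler estimator (2.4) reads $\bar{B}_{1,s} = (X_1^TX_1)^{-1}X_1^TY_1 + (X_1^TX_1)^{-1}X_1^TN_2Y_2(-DV_2^{-1})^T$ (this matrix version also appears in Section 3). A short NumPy sketch with illustrative dimensions confirms the equivalence of the two forms:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p1, p2, q = 12, 3, 2, 2                        # illustrative sizes
X1 = rng.standard_normal((n, p1))
X2 = rng.standard_normal((n, p2))
A = rng.standard_normal((2 * q, 2 * q))
V = A @ A.T + 2 * q * np.eye(2 * q)
D, V2 = V[:q, q:], V[q:, q:]
Y1 = rng.standard_normal((n, q))
Y2 = rng.standard_normal((n, q))

vec = lambda M: M.reshape(-1, order="F")
H1 = np.linalg.solve(X1.T @ X1, X1.T)             # (X1'X1)^{-1} X1'
N2 = np.eye(n) - X2 @ np.linalg.solve(X2.T @ X2, X2.T)

# Kronecker form (2.4), using (V^11)^{-1} V^12 = -D V2^{-1}.
C = -D @ np.linalg.inv(V2)
vec_s = np.kron(np.eye(q), H1) @ vec(Y1) + np.kron(C, H1 @ N2) @ vec(Y2)

# Equivalent matrix form: B1_s = (X1'X1)^{-1}X1'Y1 + (X1'X1)^{-1}X1'N2 Y2 C'.
B1_s = H1 @ Y1 + H1 @ N2 @ Y2 @ C.T
assert np.allclose(vec_s, vec(B1_s))
```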

### Theorem 2

$\overline{\operatorname{Vec}}(B_1)_s$ is the unique simpler form of $\widehat{\operatorname{Vec}}(B_1)^{(\infty)}$.

### Proof

Note that for any fixed $i$ $(i \ge 1)$: if $X_1^T (P_2 P_1)^{i-1} P_2 N_1 = 0$, then $X_1^T (P_2 P_1)^i P_2 N_1 = X_1^T (P_2 P_1)^{i-1} P_2 (I_n - N_1) P_2 N_1 = 0$. Step by step, we arrive at

$$X_1^T (P_2 P_1)^{k-1} P_2 N_1 = 0, \quad k = i+1, i+2, \ldots. \tag{2.5}$$

Thus, we find

$$Q^i \otimes (X_1^T X_1)^{-1} X_1^T (P_2 P_1)^{i-1} P_2 N_1 = 0 \ \text{for some fixed } i \ (i \ge 1) \implies Q^k \otimes (X_1^T X_1)^{-1} X_1^T (P_2 P_1)^{k-1} P_2 N_1 = 0, \quad k = i+1, i+2, \ldots. \tag{2.6}$$

On the other hand, if for some fixed $i$ $(i \ge 2)$ one has $X_1^T (P_2 P_1)^{i-1} P_2 N_1 = 0$, then it is easy to see that

$$\begin{aligned} X_1^T (P_2 P_1)^{i-1} (I_n - N_2) N_1 = 0 &\implies X_1^T (P_2 P_1)^{i-1} N_2 N_1 = 0 \\ &\implies X_1^T (P_2 P_1)^{i-2} P_2 (I_n - N_1) N_2 N_1 = 0 \\ &\implies X_1^T (P_2 P_1)^{i-2} P_2 N_1 N_2 N_1 = 0 \\ &\implies X_1^T (P_2 P_1)^{i-2} P_2 N_1 N_2 N_1 P_2 (P_1 P_2)^{i-2} X_1 = 0 \\ &\implies X_1^T (P_2 P_1)^{i-2} P_2 N_1 N_2 = 0 \\ &\implies X_1^T (P_2 P_1)^{i-2} P_2 N_1 = 0, \end{aligned} \tag{2.7}$$

where the last step follows from (2.1). Thus, step by step, we conclude that

$$Q^i \otimes (X_1^T X_1)^{-1} X_1^T (P_2 P_1)^{i-1} P_2 N_1 = 0 \ \text{for some fixed } i \ (i \ge 2) \implies Q^k \otimes (X_1^T X_1)^{-1} X_1^T (P_2 P_1)^{k-1} P_2 N_1 = 0, \quad k = 1, 2, \ldots, i-1. \tag{2.8}$$

Combining (2.6) with (2.8), we know that for any fixed $i$ $(i \ge 1)$, if

$$Q^i \otimes (X_1^T X_1)^{-1} X_1^T (P_2 P_1)^{i-1} P_2 N_1 = 0, \tag{2.9}$$

then the infinite series

$$\sum_{i=1}^{\infty}\left[Q^i \otimes (X_1^T X_1)^{-1} X_1^T (P_2 P_1)^{i-1} P_2 N_1\right] = 0, \tag{2.10}$$

and, by (2.3), we simultaneously conclude that the infinite series

$$\sum_{i=1}^{\infty}\left[Q^i (V^{11})^{-1} V^{12} \otimes (X_1^T X_1)^{-1} X_1^T (P_2 P_1)^{i-1} P_2 N_1 N_2\right] = 0. \tag{2.11}$$

Hence, $\widehat{\operatorname{Vec}}(B_1)^{(\infty)}$ has the unique simpler form $\overline{\operatorname{Vec}}(B_1)_s$, in the sense that if any single term in (2.10) or (2.11) is zero, then both infinite sums vanish.

The proof of Theorem 2 is finished.

3. The properties of two-stage estimator

If the covariance matrix $V$ is unknown, then neither $\widehat{\operatorname{Vec}}(B_1)^{(\infty)}$ nor the simpler form $\overline{\operatorname{Vec}}(B_1)_s$ is available. Setting $\tilde{X} = (X_1, X_2)$, we estimate $V$ by

$$\hat{V} = \frac{1}{n - R(\tilde{X})} \begin{pmatrix} Y_1^T \\ Y_2^T \end{pmatrix} \left(I_n - P_{\tilde{X}}\right)(Y_1, Y_2), \tag{3.1}$$

where $R(\tilde{X})$ is the rank of $\tilde{X}$ and $P_{\tilde{X}} = \tilde{X}(\tilde{X}^T\tilde{X})^{-}\tilde{X}^T$.

Using $E(a^T A b) = \operatorname{trace}[A \operatorname{Cov}(b, a)] + (Ea)^T A (Eb)$, where $a$ and $b$ denote two random vectors, together with $(I_n - P_{\tilde{X}}) X_i = 0$ $(i = 1, 2)$, we have $E[Y_i^T (I_n - P_{\tilde{X}}) Y_i] = V_i [n - R(\tilde{X})]$ $(i = 1, 2)$ and $E[Y_1^T (I_n - P_{\tilde{X}}) Y_2] = D [n - R(\tilde{X})]$, which show that

$$E\hat{V} = \begin{pmatrix} V_1 & D \\ D^T & V_2 \end{pmatrix} = V. \tag{3.2}$$
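A minimal sketch of the estimator (3.1), with hypothetical dimensions and seed; the Monte Carlo average illustrates the unbiasedness (3.2):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p1, p2, q = 30, 3, 2, 2                        # illustrative sizes
X1 = rng.standard_normal((n, p1))
X2 = rng.standard_normal((n, p2))
A = rng.standard_normal((2 * q, 2 * q))
V = A @ A.T + 2 * q * np.eye(2 * q)               # the "true" covariance

Xt = np.hstack([X1, X2])                          # X-tilde = (X1, X2)
r = np.linalg.matrix_rank(Xt)
M = np.eye(n) - Xt @ np.linalg.pinv(Xt)           # I_n - P_{X-tilde}

# Since (I_n - P_{X-tilde}) X_i = 0, the residual part of (Y1, Y2) equals
# that of E, so V-hat in (3.1) can be computed from the errors directly.
avg, reps = np.zeros_like(V), 2000
for _ in range(reps):
    E = rng.multivariate_normal(np.zeros(2 * q), V, size=n)
    avg += E.T @ M @ E / (n - r) / reps
print(np.abs(avg - V).max())                      # near zero: E[V-hat] = V
```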

Substituting $\hat{V}$ for $V$ in the expressions of $\widehat{\operatorname{Vec}}(B_1)^{(\infty)}$ and $\overline{\operatorname{Vec}}(B_1)_s$, we obtain the following two two-stage estimators:

$$\widehat{\operatorname{Vec}}(B_1)^{(\infty)}_{2\text{-stage}} = M(\hat{Q})\left\{\operatorname{Vec}(Y_1) + \left[-\hat{D}\hat{V}_2^{-1} \otimes N_2\right]\operatorname{Vec}(Y_2)\right\} \tag{3.3}$$

with $M(\hat{Q}) = \left[I_q \otimes (X_1^TX_1)^{-1}X_1^T\right]\left\{I_{qn} - \sum_{i=0}^{\infty} \hat{Q}^{i+1} \otimes (P_2P_1)^iP_2N_1\right\}$ and $\hat{Q} = \hat{D}\hat{V}_2^{-1}\hat{D}^T\hat{V}_1^{-1}$, and

$$\overline{\operatorname{Vec}}(B_1)_{s,2\text{-stage}} = \left[I_q \otimes (X_1^TX_1)^{-1}X_1^T\right]\operatorname{Vec}(Y_1) + \left[-\hat{D}\hat{V}_2^{-1} \otimes (X_1^TX_1)^{-1}X_1^TN_2\right]\operatorname{Vec}(Y_2). \tag{3.4}$$

Similar to Theorem 2, $\overline{\operatorname{Vec}}(B_1)_{s,2\text{-stage}}$ is the unique simpler form of $\widehat{\operatorname{Vec}}(B_1)^{(\infty)}_{2\text{-stage}}$. Hence, we focus on the performance of $\overline{\operatorname{Vec}}(B_1)_{s,2\text{-stage}}$.
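Putting the two stages together, the following sketch (illustrative sizes, seed, and true coefficients) computes $\overline{\operatorname{Vec}}(B_1)_{s,2\text{-stage}}$ in its matrix form:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p1, p2, q = 30, 3, 2, 2                        # illustrative sizes
X1 = rng.standard_normal((n, p1))
X2 = rng.standard_normal((n, p2))
B1 = rng.standard_normal((p1, q))                 # hypothetical true coefficients
B2 = rng.standard_normal((p2, q))
A = rng.standard_normal((2 * q, 2 * q))
V = A @ A.T + 2 * q * np.eye(2 * q)

E = rng.multivariate_normal(np.zeros(2 * q), V, size=n)
Y1, Y2 = X1 @ B1 + E[:, :q], X2 @ B2 + E[:, q:]

# Stage 1: estimate V by (3.1).
Xt = np.hstack([X1, X2])
M = np.eye(n) - Xt @ np.linalg.pinv(Xt)
Vh = np.hstack([Y1, Y2]).T @ M @ np.hstack([Y1, Y2]) / (n - np.linalg.matrix_rank(Xt))
Dh, V2h = Vh[:q, q:], Vh[q:, q:]

# Stage 2: plug D-hat and V2-hat into the simpler form, giving (3.4).
H1 = np.linalg.solve(X1.T @ X1, X1.T)
N2 = np.eye(n) - X2 @ np.linalg.solve(X2.T @ X2, X2.T)
B1_2stage = H1 @ Y1 + H1 @ N2 @ Y2 @ (-Dh @ np.linalg.inv(V2h)).T
print(np.linalg.norm(B1_2stage - B1), np.linalg.norm(H1 @ Y1 - B1))
```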

The matrix-variate normal distribution is a commonly used member of the class of matrix elliptically symmetric distributions. It plays an important role in the investigation of multivariate regression models such as the growth curve model (GCM). In what follows, in order to establish the unbiasedness of $\overline{\operatorname{Vec}}(B_1)_{s,2\text{-stage}}$, we first briefly present the definition of the matrix-variate normal distribution together with two related properties, and then make some assumptions on the distributions of the random error matrices $E_i$ $(i = 1, 2)$.

### Definition 1

A random matrix $Z$ of order $n \times q$ is said to follow a matrix-variate normal distribution if its probability density function is of the form

$$f(Z) = (2\pi)^{-\frac{nq}{2}}\left[\det(\Sigma)\right]^{-\frac{q}{2}}\left[\det(\Omega)\right]^{-\frac{n}{2}}\exp\left(-\frac{1}{2}\operatorname{trace}\left\{\Omega^{-1}[Z - M]^T\Sigma^{-1}[Z - M]\right\}\right),$$

where $M$, $\Sigma > 0$, and $\Omega > 0$ are $n \times q$, $n \times n$, and $q \times q$ matrices, respectively, and $\det(A)$ denotes the determinant of the square matrix $A$. In this case we write $Z \sim N_{n,q}(M, \Sigma, \Omega)$.

The following two lemmas point out the relationship between the matrix-variate and vector-variate normal distributions and show that an affine transformation of a matrix-variate normal variable also follows a matrix-variate normal distribution. The reader is referred to the first chapter of Pan and Fang (2007) for more details.

### Lemma 1

Let $Z$ be an $n \times q$ random matrix and $z = \operatorname{Vec}(Z)$. Then $Z \sim N_{n,q}(M, \Sigma, \Omega)$ if and only if $z \sim N_{nq}(\operatorname{Vec}(M), \Omega \otimes \Sigma)$.

### Lemma 2

Suppose $Z \sim N_{n,q}(M, \Sigma, \Omega)$, and let $C$, $A_1 > 0$, and $A_2 > 0$ be given matrices of orders $n \times q$, $n \times n$, and $q \times q$, respectively. Then $A_1 Z A_2 + C \sim N_{n,q}(A_1 M A_2 + C, A_1 \Sigma A_1^T, A_2 \Omega A_2^T)$.

In the following, we assume that in the system (1.1) the random error matrices $E_i$ $(i = 1, 2)$ follow the matrix-variate normal distribution $N_{n,q}(0, I_n, V_i)$, which indicates that the rows of $E_i$ are iid random vectors with common distribution $N_q(0, V_i)$ $(i = 1, 2)$. Thus, the rows of $E = (E_1, E_2)$ are iid random vectors with common distribution $N_{2q}(0, V)$, i.e., $E \sim N_{n,2q}(0, I_n, V)$. Hence, by Lemmas 1 and 2 we know that

$$\operatorname{Vec}(Y) = \operatorname{Vec}(Y_1, Y_2) \sim N_{2nq}\left(\begin{pmatrix} I_q \otimes X_1 & 0 \\ 0 & I_q \otimes X_2 \end{pmatrix}\operatorname{Vec}(B), \ V \otimes I_n\right). \tag{3.5}$$
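Since the rows of $E$ are iid $N_{2q}(0, V)$, sampling $E$ row by row realizes $E \sim N_{n,2q}(0, I_n, V)$. The sketch below (small illustrative sizes) checks empirically that $\operatorname{Cov}(\operatorname{Vec}(E)) \approx V \otimes I_n$, as Lemma 1 asserts:

```python
import numpy as np

rng = np.random.default_rng(6)
n, q = 4, 2                                       # small sizes keep V kron I_n manageable
A = rng.standard_normal((2 * q, 2 * q))
V = A @ A.T + 2 * q * np.eye(2 * q)

vec = lambda M: M.reshape(-1, order="F")

# Rows of E iid N_{2q}(0, V) realize E ~ N_{n,2q}(0, I_n, V); by Lemma 1,
# Cov(Vec(E)) should then equal V kron I_n.
reps = 5000
S = np.zeros((2 * q * n, 2 * q * n))
for _ in range(reps):
    E = rng.multivariate_normal(np.zeros(2 * q), V, size=n)
    v = vec(E)
    S += np.outer(v, v) / reps
print(np.abs(S - np.kron(V, np.eye(n))).max())    # shrinks as reps grows
```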

Denote $Y_i = (y_1^{(i)}, y_2^{(i)}, \ldots, y_q^{(i)})$ $(i = 1, 2)$. Then the matrix $\hat{D} = [n - R(\tilde{X})]^{-1}(\hat{d}_{ij})_{q \times q}$ with elements

$$\hat{d}_{ij} = \left(y_i^{(1)}\right)^T\left[I_n - P_{\tilde{X}}\right]y_j^{(2)} = (\operatorname{Vec}(Y))^T\left[O_{i,q+j}(2q \times 2q) \otimes (I_n - P_{\tilde{X}})\right]\operatorname{Vec}(Y), \tag{3.6}$$

where the $2q \times 2q$ matrix $O_{i,q+j}(2q \times 2q)$ consists of all zeros except for a one in the $i$th row and the $(q+j)$th column. Similarly, the $(i, j)$th element of $\hat{V}_2$ is equal to

$$(\operatorname{Vec}(Y))^T\left[O_{q+i,q+j}(2q \times 2q) \otimes (I_n - P_{\tilde{X}})\right]\operatorname{Vec}(Y), \tag{3.7}$$

where the $2q \times 2q$ matrix $O_{q+i,q+j}(2q \times 2q)$ consists of all zeros except for a one in the $(q+i)$th row and the $(q+j)$th column.

Note that $\left[I_q \otimes (X_1^TX_1)^{-1}X_1^TN_2\right]\operatorname{Vec}(Y_2) = \left[0_{qp_1 \times nq}, I_q \otimes (X_1^TX_1)^{-1}X_1^TN_2\right]\operatorname{Vec}(Y)$. Hence, using $X_1^TN_2[I_n - P_{\tilde{X}}] = 0$, the criterion for independence between a linear form and a quadratic form in normal variables, and the following easily verified facts:

$$\left[0_{qp_1 \times nq}, I_q \otimes (X_1^TX_1)^{-1}X_1^TN_2\right][V \otimes I_n]\left[O_{i,q+j}(2q \times 2q) \otimes (I_n - P_{\tilde{X}})\right] = 0 \tag{3.8}$$

and

$$\left[0_{qp_1 \times nq}, I_q \otimes (X_1^TX_1)^{-1}X_1^TN_2\right][V \otimes I_n]\left[O_{q+i,q+j}(2q \times 2q) \otimes (I_n - P_{\tilde{X}})\right] = 0, \tag{3.9}$$

we conclude that

$$E\left[\overline{\operatorname{Vec}}(B_1)_{s,2\text{-stage}}\right] = \left[I_q \otimes (X_1^TX_1)^{-1}X_1^T\right](I_q \otimes X_1)\operatorname{Vec}(B_1) + \left[E\left(-\hat{D}\hat{V}_2^{-1}\right) \otimes (X_1^TX_1)^{-1}X_1^TN_2\right](I_q \otimes X_2)\operatorname{Vec}(B_2) = \operatorname{Vec}(B_1).$$

Thus, we obtain the following theorem, which states the unbiasedness of the two-stage estimator.

### Theorem 3

Under the assumption that $E_i \sim N_{n,q}(0, I_n, V_i)$ $(i = 1, 2)$, the two-stage estimator $\overline{\operatorname{Vec}}(B_1)_{s,2\text{-stage}}$ is unbiased, i.e., $E\left[\overline{\operatorname{Vec}}(B_1)_{s,2\text{-stage}}\right] = \operatorname{Vec}(B_1)$.

In the following, we refer to Grunfeld's data in Maddala (1977) and present two simulation studies comparing the performance of $\overline{\operatorname{Vec}}(B_1)_{s,2\text{-stage}}$ with that of $\widehat{\operatorname{Vec}}(B_1)$, first for the case of a known relationship between the design matrices $X_1$ and $X_2$, and then for the case of no relationship between $X_1$ and $X_2$.

(I) The case that $X_1 = (X_2, L)$

Here the system (1.1) is of the form $Y_i = X_i B_i + E_i$ $(i = 1, 2)$ with $E = (E_1, E_2) \sim N_{n,4}(0, I_n, V)$, and

$$B_1 = \begin{pmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{pmatrix}, \quad B_2 = \begin{pmatrix} 1 & 6 \\ -3 & 2 \end{pmatrix}, \quad V = \begin{pmatrix} 1 & 0 & \rho & 0 \\ 0 & 1 & 0 & \rho \\ \rho & 0 & 1 & 0 \\ 0 & \rho & 0 & 1 \end{pmatrix}. \tag{3.10}$$

Set $S(B_1) = (Y_1 - X_1 B_1)^T(Y_1 - X_1 B_1)$. Note that the estimator $\hat{B}_1 = (X_1^TX_1)^{-1}X_1^TY_1$ given by (1.3), which corresponds to the LSE $\widehat{\operatorname{Vec}}(B_1)$, minimizes the residual sum of squares $S(B_1)$ (in the sense of nonnegative definiteness), and hence also the trace, the determinant, and the largest eigenvalue of $S(B_1)$ (Muirhead, 1982). Therefore, under these four criteria, the LSEs of the regression coefficient $B_1$ based only on the first equation $Y_1 = X_1 B_1 + E_1$ are identical (Fang and Zhang, 1990). Thus, without loss of generality, we illustrate the superiority of $\overline{\operatorname{Vec}}(B_1)_{s,2\text{-stage}}$ by comparing $\operatorname{trace}(S(\hat{B}_1))$ with $\operatorname{trace}(S(\bar{B}_{1,s,2\text{-stage}}))$, where $\bar{B}_{1,s,2\text{-stage}} = (X_1^TX_1)^{-1}X_1^TY_1 + (X_1^TX_1)^{-1}X_1^TN_2Y_2(-\hat{D}\hat{V}_2^{-1})^T$ corresponds to $\overline{\operatorname{Vec}}(B_1)_{s,2\text{-stage}}$. We also present the values of $\operatorname{trace}(S(\bar{B}_{1,s}))$ for contrast, where $\bar{B}_{1,s} = (X_1^TX_1)^{-1}X_1^TY_1 + (X_1^TX_1)^{-1}X_1^TN_2Y_2(-DV_2^{-1})^T$ corresponds to (2.4).
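A sketch of one replication of study (I): the design matrices, seed, and replication count below are illustrative stand-ins (the paper's study is based on Grunfeld's data), but the estimators and the trace criterion are those defined above:

```python
import numpy as np

rng = np.random.default_rng(7)
n, p1, p2, q, rho = 20, 3, 2, 2, 0.7              # illustrative choices
X2 = rng.standard_normal((n, p2))
L = rng.standard_normal((n, p1 - p2))
X1 = np.hstack([X2, L])                           # case (I): X1 = (X2, L)
B1 = np.array([[1., 1.], [1., 2.], [1., 3.]])     # coefficients of (3.10)
B2 = np.array([[1., 6.], [-3., 2.]])
V = np.block([[np.eye(q), rho * np.eye(q)],       # V of (3.10)
              [rho * np.eye(q), np.eye(q)]])

H1 = np.linalg.solve(X1.T @ X1, X1.T)
N2 = np.eye(n) - X2 @ np.linalg.solve(X2.T @ X2, X2.T)
Xt = np.hstack([X1, X2])
M = np.eye(n) - Xt @ np.linalg.pinv(Xt)
df = n - np.linalg.matrix_rank(Xt)

def trace_S(B, Y1):                               # trace of S(B) = (Y1 - X1 B)'(Y1 - X1 B)
    R = Y1 - X1 @ B
    return np.trace(R.T @ R)

for _ in range(3):                                # a few replications
    E = rng.multivariate_normal(np.zeros(2 * q), V, size=n)
    Y1, Y2 = X1 @ B1 + E[:, :q], X2 @ B2 + E[:, q:]
    Vh = np.hstack([Y1, Y2]).T @ M @ np.hstack([Y1, Y2]) / df
    Dh, V2h = Vh[:q, q:], Vh[q:, q:]
    B_lse = H1 @ Y1                               # B1-hat
    B_2st = B_lse + H1 @ N2 @ Y2 @ (-Dh @ np.linalg.inv(V2h)).T
    B_s = B_lse + H1 @ N2 @ Y2 @ (-rho * np.eye(q)).T   # known D V2^{-1} = rho I_q
    print(trace_S(B_lse, Y1), trace_S(B_2st, Y1), trace_S(B_s, Y1))
```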

In Table 1, based on different combinations of the correlation $\rho$ and the sample size, we present numerical comparisons of $\operatorname{trace}(S(\bar{B}_{1,s,2\text{-stage}}))$ with $\operatorname{trace}(S(\hat{B}_1))$ and $\operatorname{trace}(S(\bar{B}_{1,s}))$, which exhibit the performance of the simplified two-stage estimator $\overline{\operatorname{Vec}}(B_1)_{s,2\text{-stage}}$ for small and moderate sample sizes. We find that the performance of the two-stage estimator tends to improve as the sample size increases. It also depends on the correlation $\rho$: in particular, when $n \ge 20$ and $\rho \ge 0.5$ we see that $|\operatorname{trace}(S(\bar{B}_{1,s,2\text{-stage}})) - \operatorname{trace}(S(\bar{B}_{1,s}))| < |\operatorname{trace}(S(\hat{B}_1)) - \operatorname{trace}(S(\bar{B}_{1,s}))|$, which shows that the two-stage estimator $\overline{\operatorname{Vec}}(B_1)_{s,2\text{-stage}}$ is closer to $\overline{\operatorname{Vec}}(B_1)_s$.

(II) The case that there is no relationship between $X_1$ and $X_2$

In this case we assume that the system (1.1) has the same form as (3.10) but that there is no relationship between $X_1$ and $X_2$. The simulations are presented below. From Table 2, we see that $\operatorname{trace}(S(\bar{B}_{1,s,2\text{-stage}}))$ gets closer to $\operatorname{trace}(S(\bar{B}_{1,s}))$, which implies that the two-stage estimator $\overline{\operatorname{Vec}}(B_1)_{s,2\text{-stage}}$ outperforms the LSE $\widehat{\operatorname{Vec}}(B_1)$ as the sample size grows ($n \ge 20$ or larger); again this depends on the value of the correlation $\rho$ ($\ge 0.5$). The reason is that, from the viewpoint of covariance adjustment, the one-step covariance adjustment estimator $\widehat{\operatorname{Vec}}(B_1)^{(1)}$, which is exactly equal to $\overline{\operatorname{Vec}}(B_1)_s$, has a smaller covariance than $\widehat{\operatorname{Vec}}(B_1)$ even when there is no relationship between $X_1$ and $X_2$. Hence, the simulation study discloses the tendency of $\overline{\operatorname{Vec}}(B_1)_{s,2\text{-stage}}$ to perform better, consistent with a two-stage estimator that incorporates more information.

4. An illustrative example

The GCM is a generalized multivariate analysis-of-variance model, which is especially useful for investigating growth problems on short time series in economics, biology, and medical research (see Lee and Geisser, 1972; Pan and Fang, 2007). The seemingly unrelated GCMs are defined as

$$\begin{cases} Y_1 = X_1 B_1 Z_1 + E_1, \\ Y_2 = X_2 B_2 Z_2 + E_2, \end{cases} \tag{4.1}$$

where Yi are n × q observation matrices, Xi and Zi are known design matrices of full column rank and full row rank, respectively, and the regression parameters B1 and B2 are unknown. The assumptions on E1 and E2 are the same as those in the system (1.1).

Without considering the interactions between the two equations, the LSE of $B_1$ obtained from the first equation is

$$\hat{B}_1 = (X_1^TX_1)^{-1}X_1^TY_1V_1^{-1}Z_1^T\left(Z_1V_1^{-1}Z_1^T\right)^{-1}, \tag{4.2}$$

which is unbiased with covariance $\operatorname{Cov}(\hat{B}_1) = \operatorname{Cov}(\operatorname{Vec}(\hat{B}_1)) = (Z_1V_1^{-1}Z_1^T)^{-1} \otimes (X_1^TX_1)^{-1}$. However, combining the information in the second equation with the assumption $X_1^TX_2 = 0$, we obtain the system LSE of $B_1$ as

$$\bar{B}_1 = (X_1^TX_1)^{-1}X_1^T\left(Y_1V^{11} + Y_2V^{21}\right)Z_1^T\left(Z_1V^{11}Z_1^T\right)^{-1}, \tag{4.3}$$

which is also unbiased, with covariance

$$\operatorname{Cov}(\bar{B}_1) = \operatorname{Cov}(\operatorname{Vec}(\bar{B}_1)) = \left(Z_1V^{11}Z_1^T\right)^{-1} \otimes (X_1^TX_1)^{-1}, \tag{4.4}$$

which is less than $\operatorname{Cov}(\hat{B}_1)$ since $V_1^{-1} \le (V_1 - DV_2^{-1}D^T)^{-1} = V^{11}$ and correspondingly $(Z_1V_1^{-1}Z_1^T)^{-1} \ge (Z_1V^{11}Z_1^T)^{-1}$.

In the case that the covariance matrix $V$ is unknown, under the assumption that $E = (E_1, E_2) \sim N_{n,2q}(0, I_n, V)$, we use an estimator of the same form as (3.1) to estimate $V$; this estimator is easily shown to be unbiased. Hence, a two-stage estimator for $B_1$ is defined as

$$\bar{B}_{1,2\text{-stage}} = (X_1^TX_1)^{-1}X_1^T\left(Y_1\hat{V}^{11} + Y_2\hat{V}^{21}\right)Z_1^T\left(Z_1\hat{V}^{11}Z_1^T\right)^{-1}, \tag{4.5}$$

where $\hat{V}^{11} = (\hat{V}_1 - \hat{D}\hat{V}_2^{-1}\hat{D}^T)^{-1}$ and $\hat{V}^{21} = -\hat{V}_2^{-1}\hat{D}^T\hat{V}^{11}$. Analogous to the previous discussion, the unbiasedness of the estimator $\bar{B}_{1,2\text{-stage}}$ can be established.
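A sketch of the GCM two-stage estimator (4.5), with $X_1^TX_2 = 0$ enforced by taking $X_2$ from the null space of $X_1^T$ (as in case (iii) below); the dimensions, $\rho$, and seed are illustrative:

```python
import numpy as np

rng = np.random.default_rng(8)
n, p1, p2, q = 20, 3, 2, 2                        # illustrative sizes
rho = 0.7
X1 = rng.standard_normal((n, p1))
X2 = np.linalg.svd(X1.T)[2][p1:p1 + p2].T         # columns in null(X1'), so X1'X2 = 0
B1 = np.array([[1., 1.], [1., 2.], [1., 3.]])     # as in case (i)
B2 = np.array([[1., 6.], [-3., 2.]])
Z1 = np.array([[1., 2.], [3., 4.]])
Z2 = np.array([[1., 5.], [6., 0.5]])
V = np.block([[np.eye(q), rho * np.eye(q)],
              [rho * np.eye(q), np.eye(q)]])

E = rng.multivariate_normal(np.zeros(2 * q), V, size=n)
Y1 = X1 @ B1 @ Z1 + E[:, :q]
Y2 = X2 @ B2 @ Z2 + E[:, q:]

# Stage 1: V-hat of the same form as (3.1).
Xt = np.hstack([X1, X2])
M = np.eye(n) - Xt @ np.linalg.pinv(Xt)
Vh = np.hstack([Y1, Y2]).T @ M @ np.hstack([Y1, Y2]) / (n - np.linalg.matrix_rank(Xt))
V1h, Dh, V2h = Vh[:q, :q], Vh[:q, q:], Vh[q:, q:]

# Stage 2: the plug-in version (4.5) of the system estimator (4.3).
V11h = np.linalg.inv(V1h - Dh @ np.linalg.inv(V2h) @ Dh.T)
V21h = -np.linalg.inv(V2h) @ Dh.T @ V11h
H1 = np.linalg.solve(X1.T @ X1, X1.T)
B1_2stage = H1 @ (Y1 @ V11h + Y2 @ V21h) @ Z1.T @ np.linalg.inv(Z1 @ V11h @ Z1.T)
print(np.linalg.norm(B1_2stage - B1))             # Frobenius (the paper's 2-norm) distance
```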

In the following, we present a simulation study comparing the performance of $\bar{B}_{1,2\text{-stage}}$ with that of $\hat{B}_1$ under the matrix 2-norm criterion, where the 2-norm of a matrix $A$ is given by $\|A\|_2 = \|\operatorname{Vec}(A)\|_2 = (\sum_i\sum_j a_{ij}^2)^{1/2}$. The performance of $\bar{B}_1$ is also presented as a contrast. In each simulation, a sample of $n$ observations is randomly generated from a $2q$-variate normal distribution with mean zero and covariance matrix $V$; this serves as the error matrix $E_{n \times 2q} = (E_1, E_2)$. Then $\hat{B}_1$, $\bar{B}_{1,2\text{-stage}}$, and $\bar{B}_1$ are calculated. Simulations are repeated 500 times, and the matrix 2-norms of the average values of $\hat{B}_1 - B_1$, $\bar{B}_{1,2\text{-stage}} - B_1$, and $\bar{B}_1 - B_1$ are given in Table 3.

Three cases are studied: the first corresponds to $n = 10$, the second to $n = 20$, and the third to $n = 50$. All cases adopt the same $V$ as in (3.10), with the correlation $\rho$ taking several alternative values.

Simulations for the case (i) are run with

$$X_1^T = \begin{pmatrix} 6 & 16 & -30 & 2 & 19 & 23 & 19 & 25 & 26 & 17 \\ 1 & 16 & 14 & -30 & 22 & 7 & 15 & 20 & 26 & 6 \\ 24 & 25 & 8 & 37 & -28 & 12 & 5 & -4 & -29 & 14 \end{pmatrix}, \qquad X_2^T = \begin{pmatrix} 1 & 4 & 2 & 5 & 6 & 3 & 5 & 2 & 5 & 8 \\ 2 & 6 & 3 & 4 & 2 & 1 & 6 & 8 & 5 & 3 \end{pmatrix},$$
$$B_1 = \begin{pmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{pmatrix}, \quad B_2 = \begin{pmatrix} 1 & 6 \\ -3 & 2 \end{pmatrix}, \quad Z_1 = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \quad Z_2 = \begin{pmatrix} 1 & 5 \\ 6 & 0.5 \end{pmatrix}.$$

Simulations for the case (ii) are run with $X_1^T$ being

$(-94815-26545253510304502051570657560906020-106930-34071580553585704025957551020906545100-176660-27131001070755065904045525855535308015)$

and

$X2T=(123456789101112131415161718192024168712484185394731162),$

where $B_1$, $B_2$, $Z_1$, and $Z_2$ are the same as in case (i).

Simulations for the case (iii) are run with $X_1 = [a_1, a_2, a_3]_{50 \times 3}$ randomly generated and $X_2 = [a_4, a_5]_{50 \times 2}$ obtained from the null space of $X_1^T$; in this case $B_1$, $B_2$, $Z_1$, and $Z_2$ remain the same as in case (i).

From Table 3, except for the situations $\rho = 0.2$ and $\rho = 0.5$ with $n = 10$, we find that $\operatorname{norm}(\bar{B}_{1,2\text{-stage}} - B_1)$ is uniformly smaller than $\operatorname{norm}(\hat{B}_1 - B_1)$, which shows that the two-stage estimator $\bar{B}_{1,2\text{-stage}}$ is closer to the true value $B_1$ than the LSE $\hat{B}_1$.

5. Concluding remarks

In summary, we have investigated the estimation of regression coefficients in the system of two multivariate SURs. Note that we focus on the estimation of $B_1$ since the positions of $B_1$ and $B_2$ are interchangeable. In Section 1, we found that, using the information in the other equation, the estimator of the regression coefficients can be presented as a matrix power series via the method of covariance adjustment. In Section 2, we further showed that the matrix series has exactly one simpler form, which is just the one-step covariance adjustment estimator of the regression coefficients. In Section 3, for the case that the covariance matrix of the system is unknown, we showed that the degenerate form of the two-stage estimator sequence is unique, proposed an unbiased two-stage estimator, and presented numerical simulations to verify its superiority. The results established in this paper enrich the existing literature since they include Zellner's univariate SURs as a special case.


### Table 1

Comparisons between the two-stage estimator and the least squares estimator

| $\rho$ | $n$ | $\operatorname{trace}(S(\hat{B}_1))$ | $\operatorname{trace}(S(\bar{B}_{1,s,2\text{-stage}}))$ | $\operatorname{trace}(S(\bar{B}_{1,s}))$ |
|-----|----|----------|----------|----------|
| 0.2 | 10 | 12.7118 | 12.8775 | 12.7718 |
| 0.2 | 20 | 30.0456 | 30.2155 | 30.0873 |
| 0.2 | 50 | 141.1215 | 141.9674 | 141.3323 |
| 0.5 | 10 | 15.3561 | 15.4368 | 15.5200 |
| 0.5 | 20 | 35.3589 | 35.3747 | 35.3775 |
| 0.5 | 50 | 108.2523 | 108.2895 | 108.2894 |
| 0.7 | 10 | 7.7018 | 8.0020 | 7.9504 |
| 0.7 | 20 | 42.4375 | 45.8153 | 44.6528 |
| 0.7 | 50 | 103.9155 | 104.3501 | 104.2045 |
| 0.9 | 10 | 18.6055 | 18.6758 | 18.6689 |
| 0.9 | 20 | 35.6630 | 37.5563 | 37.2077 |
| 0.9 | 50 | 119.9993 | 122.4540 | 122.2700 |

### Table 2

Comparisons between the two-stage estimator and the least squares estimator

| $\rho$ | $n$ | $\operatorname{trace}(S(\hat{B}_1))$ | $\operatorname{trace}(S(\bar{B}_{1,s,2\text{-stage}}))$ | $\operatorname{trace}(S(\bar{B}_{1,s}))$ |
|-----|----|----------|----------|----------|
| 0.2 | 10 | 16.4822 | 16.7363 | 16.5106 |
| 0.2 | 20 | 45.1567 | 45.3013 | 45.1656 |
| 0.2 | 50 | 90.6848 | 90.7475 | 91.0296 |
| 0.5 | 10 | 11.7975 | 12.4334 | 12.0509 |
| 0.5 | 20 | 30.6931 | 31.4717 | 31.2451 |
| 0.5 | 50 | 91.1112 | 91.1782 | 91.2101 |
| 0.7 | 10 | 15.0877 | 15.4275 | 15.2887 |
| 0.7 | 20 | 34.0316 | 34.9262 | 34.6478 |
| 0.7 | 50 | 78.8424 | 79.1979 | 79.4065 |
| 0.9 | 10 | 7.7416 | 8.8334 | 9.0277 |
| 0.9 | 20 | 37.1639 | 39.6293 | 38.6530 |
| 0.9 | 50 | 106.1452 | 106.7684 | 106.7741 |

### Table 3

Comparisons between several estimators under the matrix 2-norm

| $\rho$ | $n$ | $\operatorname{norm}(\hat{B}_1 - B_1)$ | $\operatorname{norm}(\bar{B}_{1,2\text{-stage}} - B_1)$ | $\operatorname{norm}(\bar{B}_1 - B_1)$ |
|-----|----|--------|--------|--------|
| 0.2 | 10 | 0.1931 | 0.2668 | 0.1879 |
| 0.2 | 20 | 0.0183 | 0.0193 | 0.0182 |
| 0.2 | 50 | 0.2259 | 0.2266 | 0.2205 |
| 0.5 | 10 | 0.2013 | 0.2283 | 0.1679 |
| 0.5 | 20 | 0.0186 | 0.0177 | 0.0171 |
| 0.5 | 50 | 0.2557 | 0.2271 | 0.2198 |
| 0.7 | 10 | 0.1896 | 0.1872 | 0.1383 |
| 0.7 | 20 | 0.0184 | 0.0134 | 0.0127 |
| 0.7 | 50 | 0.2532 | 0.1820 | 0.1763 |
| 0.9 | 10 | 0.1923 | 0.1138 | 0.0847 |
| 0.9 | 20 | 0.0173 | 0.0085 | 0.0079 |
| 0.9 | 50 | 0.2471 | 0.1133 | 0.1103 |

References
1. Alkhamisi, MA (2010). Simulation study of new estimators combining the SUR ridge regression and the restricted least squares methodologies. Statistical Papers. 51, 651-672.
2. Baksalary, JK (1991). Covariance adjustment in biased estimation. Computational Statistics & Data Analysis. 12, 221-230.
3. Fang, KT, and Zhang, YT (1990). Generalized Multivariate Analysis. Berlin and Beijing: Springer-Verlag and Science Press
4. Gupta, AK, and Kabe, DG (1998). A note on a result for two SUR models. Statistical Papers. 39, 417-421.
5. Lee, JC, and Geisser, S (1972). Growth curve prediction. Sankhya A. 34, 393-412.
6. Liu, AY (2002). Efficient estimation of two seemingly unrelated regression equations. Journal of Multivariate Analysis. 82, 445-456.
7. Liu, JS (2000). MSEM dominance of estimators in two seemingly unrelated regressions. Journal of Statistical Planning and Inference. 88, 255-266.
8. Maddala, GS (1977). Econometrics. New York: McGraw-Hill
9. Muirhead, RJ (1982). Aspects of Multivariate Statistical Theory. New York: Wiley and Sons
10. Pan, JX, and Fang, KT (2007). Growth Curve Models and Statistical Diagnostics. Beijing: Science Press
11. Percy, DF (1992). Prediction for seemingly unrelated regressions. Journal of the Royal Statistical Society Series B (Methodological). 54, 243-252.
12. Rao, CR (1967). Least squares theory using an estimated dispersion matrix and its application to measurement of signals. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, LeCam, LM, and Neyman, J, ed, pp. 355-372
13. Revankar, NS (1974). Some finite sample results in the context of two seemingly unrelated regression equations. Journal of the American Statistical Association. 69, 187-190.
14. Schmidt, P (1977). Estimation of seemingly unrelated regressions with unequal numbers of observations. Journal of Econometrics. 5, 365-377.
15. Shukur, G, and Zeebari, Z (2012). Median regression for SUR models with the same explanatory variables in each equation. Journal of Applied Statistics. 39, 1765-1779.
16. Srivastava, VK, and Giles, DEA (1987). Seemingly Unrelated Regression Equations Models. New York: Marcel Dekker
17. Velu, R, and Richards, J (2008). Seemingly unrelated reduced-rank regression model. Journal of Statistical Planning and Inference. 138, 2837-2846.
18. Wang, SG (1989). A new estimate of regression coefficients in seemingly unrelated regression system. Science in China Series A. 32, 808-816.
19. Zellner, A (1962). An efficient method of estimating seemingly unrelated regressions and tests for aggregation bias. Journal of the American Statistical Association. 57, 348-368.
20. Zellner, A (1963). Estimators of seemingly unrelated regression equations: some exact finite sample results. Journal of the American Statistical Association. 58, 977-992.
21. Zhou, B, Xu, Q, and You, J (2011). Efficient estimation for error component seemingly unrelated nonparametric regression models. Metrika. 73, 121-138.