Note on response dimension reduction for multivariate regression

Jae Keun Yoo1,a

aDepartment of Statistics, Ewha Womans University, Korea
Correspondence to: 1Department of Statistics, Ewha Womans University, 52, Ewhayeodae-gil, Seodaemun-gu, Seoul 03760, Korea. E-mail: peter.yoo@ewha.ac.kr
Received July 19, 2019; Revised August 14, 2019; Accepted August 30, 2019.
Abstract
Response dimension reduction in the sufficient dimension reduction (SDR) context was widely ignored until Yoo and Cook (Computational Statistics and Data Analysis, 53, 334–343, 2008) founded a theory for it and developed an estimation approach. Recent research in SDR shows that a semi-parametric approach can outperform conventional non-parametric SDR methods. Yoo (Statistics: A Journal of Theoretical and Applied Statistics, 52, 409–425, 2018) developed semi-parametric approaches to response reduction in the context of Yoo and Cook (2008), and Yoo (Journal of the Korean Statistical Society, 2019) completed the semi-parametric approach by proposing an unstructured method. This paper theoretically discusses the three versions of the semi-parametric approach and provides insightful remarks that can be useful for statistical practitioners. It also shows that numerical instability can be avoided by presenting results for an orthogonal transformation of the response variables.
Keywords : conditional mean, multivariate regression, response dimension reduction, semi-parametric model, sufficient dimension reduction
1. Introduction

Multivariate regression of $Y∈Rr|X∈Rp$, with r ≥ 2 and p ≥ 2, is popular in many scientific fields for analyzing repeated measures, longitudinal data, and time series data. In regression, sufficient dimension reduction (SDR) replaces the p original predictors with lower-dimensional linearly transformed predictors without loss of information about selected aspects of Y|X. Accordingly, SDR methods focus on the reduction of X, not Y. However, a proper dimension reduction of the responses can also facilitate statistical analysis by avoiding the curse of dimensionality in multivariate regression. If the multi-dimensional responses need to be reduced, it is natural to follow the notion of SDR: replace the responses with lower-dimensional linearly transformed ones without loss of information.

Yoo and Cook (2008) provide the theoretical foundation and an inference procedure for response dimension reduction in the SDR context. In that seminal work, two types of response dimension reduction subspaces are defined so that no information on E(Y|X) is lost, and an approach to estimate them is proposed. The estimation approach is non-parametric and does not assume any specific regression model.

Cook (2007) showed that a semi-parametric approach in SDR can outperform non-parametric methods. Following this idea, Yoo (2018) proposed two semi-parametric response dimension reduction approaches, called "principal response reduction (PRR)" and "principal fitted response reduction (PFRR)", in the context of Yoo and Cook (2008), and confirmed that the two approaches have potential advantages in response dimension reduction over Yoo and Cook (2008). Next, Yoo (2019) developed "unstructured PFRR (UPFRR)", which does not assume the structure of the covariance matrix of the random error vector imposed in Yoo (2018). The advantage of Yoo (2019) is the possibility of an equivariant or invariant full-rank transformation; Yoo (2019) also provides good guidelines for choosing between PRR and PFRR. The semi-parametric response reduction is therefore complete once the unstructured fitted response reduction is included.

This paper provides insightful remarks on the three semi-parametric approaches in order to clearly distinguish them. In addition, theoretical results on an orthogonal transformation of the response variables are derived for the three semi-parametric approaches as well as for Yoo and Cook (2008). Normally, a full-rank transformation of the response variables is incompatible with the response reduction subspaces, unlike the case for the predictors. However, for an orthogonal transformation, results similar to the predictor case can hold.

The organization of the paper is as follows. Section 2 briefly reviews Yoo and Cook (2008) and the three semi-parametric response reduction approaches. Section 3 is devoted to two remarks on the three approaches. Section 4 presents the results on the orthogonal transformation. Section 5 summarizes the work.

For notational convenience, we define $\mathcal{S}(B)$ as the subspace spanned by the columns of $B∈\mathbb{R}^{p×r}$ and $\Sigma_x$ as the covariance matrix of a random vector $X∈\mathbb{R}^{p}$. All proofs are given in the Appendix so as not to interrupt the reading flow.

2. Review of response reduction

### 2.1. Non-parametric response reduction

For a multivariate regression of $Y∈\mathbb{R}^{r}|X∈\mathbb{R}^{p}$, suppose that there exists an r × q matrix L with the smallest possible rank among all matrices satisfying

$E(Y \mid X) = E\{P_{L(\Sigma_y)}^{T} Y \mid X\}, \qquad (2.1)$

where $PL(∑y)=L(LT∑yL)-1LT∑y$ is the orthogonal projection operator relative to the inner product $< v 1 , v 2 > Σ y = v 1 T Σ y v 2$.

If equation (2.1) holds, then the predictors X influence the components of the conditional mean E(Y|X) only through $P_{L(\Sigma_y)}^{T}Y$. This directly implies that a lower-dimensional linear projection onto $\mathcal{S}(L)$ can replace the original r-dimensional response Y without loss of information on E(Y|X). In Yoo and Cook (YC) (2008), this type of response dimension reduction is defined as linear response reduction for E(Y|X).

Next, it is assumed that there exists an r × k matrix K satisfying

$E(Y \mid X) = E\{E(Y \mid X, K^{T}Y) \mid X\} = E\{E(Y \mid K^{T}Y) \mid X\}, \qquad (2.2)$

where k ≤ r and K is not equal to the identity matrix.

$E(Y|K^{T}Y)$ in the last equality of (2.2) is a function of $K^{T}Y$; therefore, E(Y|X) can equivalently be expressed as $E\{g(K^{T}Y)|X\}$ for some function g(·). Another dimension reduction of Y is then accomplished if k < r, and this response reduction is called a conditional response reduction for E(Y|X). The subspaces spanned by the columns of L and K are then called 'response dimension reduction subspaces'.

Yoo and Cook (2008) show that response dimension reduction subspaces exist for L in (2.1) and K in (2.2), and that the two subspaces are equal if the following condition is satisfied: A1. $E(Y|K^{T}Y = a)$ is linear in a. Condition A1 is called the linearity condition. The condition is satisfied if Y is elliptically distributed and, according to Hall and Li (1993), it is expected to hold to a reasonable approximation. If condition A1 fails, Y can be transformed toward normality. Under condition A1, Yoo and Cook (2008) propose $\Sigma_y^{-1} \mathrm{cov}(Y, X) \Sigma_x^{-1}$ to recover L and K.
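The moment-based estimator above is easy to sketch in numpy. The following is a minimal illustration, not the authors' code: the function name is ours, and using the left singular vectors of the sample kernel matrix to extract a k-dimensional basis is our choice of implementation.

```python
import numpy as np

def yoo_cook_response_reduction(Y, X, k):
    """Sketch of the Yoo-Cook (2008) moment-based response reduction.

    Builds the sample analogue of Sigma_y^{-1} cov(Y, X) Sigma_x^{-1}
    and returns k left singular vectors as a basis estimate.
    Y: (n, r) responses, X: (n, p) predictors, k: target dimension.
    """
    Yc = Y - Y.mean(axis=0)
    Xc = X - X.mean(axis=0)
    n = Y.shape[0]
    Sy = Yc.T @ Yc / n                 # moment estimator of Sigma_y
    Sx = Xc.T @ Xc / n                 # moment estimator of Sigma_x
    Cyx = Yc.T @ Xc / n                # sample cov(Y, X)
    M = np.linalg.solve(Sy, Cyx) @ np.linalg.inv(Sx)   # r x p kernel matrix
    U, _, _ = np.linalg.svd(M)         # left singular vectors span the estimate
    return U[:, :k]                    # (r, k) orthonormal basis estimate
```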

### 2.2. Semi-parametric response reduction

### 2.2.1. Principal response reduction

A semi-parametric response reduction approach starts with the following multivariate regression model, assuming E(Y) = 0 and E(X) = 0 without loss of generality:

$Y = \Gamma \nu_x + \varepsilon, \qquad (2.3)$

where $\Gamma∈\mathbb{R}^{r×d}$ with $\Gamma^{T}\Gamma = I_d$ and d ≤ r, $\varepsilon \sim N(0, \Sigma)$, and $\mathrm{cov}(\nu_x, \varepsilon) = 0$. In addition, $\nu_x$ is a d-dimensional random function of X with a positive definite sample covariance and $\sum_{x} \nu_x = 0$. Supposing that $\nu_x = X$, model (2.3) reduces to a multivariate linear regression.

One important assumption required for the response reduction is that $\mathcal{S}(\Gamma)$ is an invariant and reducing subspace of $\Sigma$, which guarantees that $\Sigma = \Gamma \Omega \Gamma^{T} + \Gamma_0 \Omega_0 \Gamma_0^{T}$, where $\Gamma_0∈\mathbb{R}^{r×(r-d)}$ with $\Gamma_0^{T}\Gamma_0 = I_{r-d}$ and $\Gamma_0^{T}\Gamma = 0$, $\Omega = \Gamma^{T}\Sigma\Gamma$, and $\Omega_0 = \Gamma_0^{T}\Sigma\Gamma_0$.

Under model (2.3), Yoo (2018) shows that $E(Y \mid X) = E(P_{\Gamma(\Sigma_y)}^{T} Y \mid X)$, where $P_{\Gamma(\Sigma_y)} = \Gamma(\Gamma^{T}\Sigma_y\Gamma)^{-1}\Gamma^{T}\Sigma_y$ is the orthogonal projection operator relative to the inner product $\langle v_1, v_2 \rangle_{\Sigma_y} = v_1^{T}\Sigma_y v_2$. The original response Y can therefore be reduced through Γ without loss of information on E(Y|X).

The primary interest is then the estimation of Γ in model (2.3). Maximum likelihood estimation is a natural choice because the normal distribution of ɛ is assumed. Letting $\hat{\Sigma}_y$ be the usual moment estimator of $\Sigma_y$, Yoo (2018) shows that the maximum likelihood estimator (MLE) of Γ is the set of eigenvectors corresponding to the first d largest eigenvalues of $\hat{\Sigma}_y$. This dimension reduction under model (2.3) is called PRR.
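Because the MLE under PRR is simply the leading eigenvectors of the sample covariance of the responses, the estimator can be sketched in a few lines. A minimal numpy illustration follows (function name ours, not from the paper):

```python
import numpy as np

def prr_basis(Y, d):
    """PRR sketch: the MLE of Gamma is the set of eigenvectors of the
    d largest eigenvalues of the moment estimator of Sigma_y (Yoo, 2018)."""
    Yc = Y - Y.mean(axis=0)
    Sy = Yc.T @ Yc / Y.shape[0]        # moment estimator Sigma_y-hat
    evals, evecs = np.linalg.eigh(Sy)  # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1]    # reorder to descending
    return evecs[:, order[:d]]         # (r, d) orthonormal basis estimate
```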

### 2.2.2. Principal fitted response reduction

The PRR utilizes only the marginal information on Y without incorporating X. This might seem somewhat strange, because the mean function E(Y|X) is a function of X, not Y. To overcome this issue, we set $\nu_x = \psi f_x$ in model (2.3):

$Y = \Gamma \psi f_x + \varepsilon, \qquad (2.4)$

where ψ is an unknown d × q matrix and $f_x∈\mathbb{R}^{q}$ is a known q-dimensional vector-valued function of the predictors with $\sum_x f_x = 0$. The following notation is defined for convenience.

• $\mathbb{Y}$: the n × r data matrix for the responses

• $\mathbb{X}$: the n × p data matrix for the predictors

• $\mathbb{F}$: the q × n matrix whose ith column is $f_{x_i}$

• $\hat{\Sigma}_{\mathrm{fit}} = n^{-1}\mathbb{Y}^{T}\mathbb{F}^{T}(\mathbb{F}\mathbb{F}^{T})^{-1}\mathbb{F}\mathbb{Y}$ and $\hat{\Sigma}_{\mathrm{res}} = \hat{\Sigma}_y - \hat{\Sigma}_{\mathrm{fit}}$

Yoo (2018) uses X, X², exp(X), their combinations, and the cluster indicators of X acquired from the K-means clustering algorithm as candidates for $f_x$. If $f_x = X$, the fitted component coincides with the ordinary least squares fit, and hence $\hat{\Sigma}_{\mathrm{fit}}$ is the regression sums of squares and cross-products matrix.
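The decomposition of $\hat{\Sigma}_y$ into fitted and residual parts can be sketched directly from the definitions above. This is our illustration, assuming $\hat{\Sigma}_{\mathrm{fit}}$ is the sample covariance of the values fitted by projecting the centered responses onto the span of the $f_x$ observations:

```python
import numpy as np

def fit_res_covariances(Y, Fx):
    """Sketch: split Sigma_y-hat into Sigma_fit-hat and Sigma_res-hat.
    Y: (n, r) responses; Fx: (n, q) centered f_x values per observation.
    With f_x = X this is the ordinary least squares fit, so Sigma_fit-hat
    is the regression sums-of-squares-and-cross-products matrix over n."""
    n = Y.shape[0]
    Yc = Y - Y.mean(axis=0)
    # projection onto the column span of the observed f_x's
    P = Fx @ np.linalg.solve(Fx.T @ Fx, Fx.T)
    Sy = Yc.T @ Yc / n
    Sfit = Yc.T @ P @ Yc / n
    return Sy, Sfit, Sy - Sfit          # Sigma_res = Sigma_y - Sigma_fit
```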

Under model (2.4), the MLE of Γ does not have a closed form. The log-likelihood over Γ is as follows.

$L(\Gamma, \Gamma_0) = -\frac{n}{2} \log \left| \Gamma_0^{T} \hat{\Sigma}_y \Gamma_0 \right| - \frac{n}{2} \log \left| \Gamma^{T} \hat{\Sigma}_{\mathrm{res}} \Gamma \right|.$

Therefore, the MLE of Γ depends on both $\hat{\Sigma}_y$ and $\hat{\Sigma}_{\mathrm{res}}$. Yoo (2018) recommends a sequential selection algorithm over the set of all eigenvectors of $\hat{\Sigma}_y$, $\hat{\Sigma}_{\mathrm{fit}}$, and $\hat{\Sigma}_{\mathrm{res}}$, following the suggestion in Cook (2007, Section 6.2). This approach to estimating Γ is called PFRR.
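The selection step can be illustrated by evaluating the likelihood above at candidate bases built from the pooled eigenvectors. Yoo (2018) recommends a sequential algorithm; for clarity this sketch instead searches all d-subsets exhaustively, which is only feasible for small r and q, and all function names are ours:

```python
import numpy as np
from itertools import combinations

def pfrr_loglik(G, Sy, Sres, n):
    """Log-likelihood of model (2.4) at an orthonormal candidate basis G:
    L = -(n/2) log|G0' Sy G0| - (n/2) log|G' Sres G|,
    where G0 is an orthonormal complement of G."""
    r, d = G.shape
    Q, _ = np.linalg.qr(G, mode="complete")   # full orthonormal frame
    G0 = Q[:, d:]                             # complement of span(G)
    return (-n / 2) * np.linalg.slogdet(G0.T @ Sy @ G0)[1] \
         + (-n / 2) * np.linalg.slogdet(G.T @ Sres @ G)[1]

def pfrr_basis(Sy, Sfit, Sres, n, d):
    """Pick the d-subset of the eigenvectors of Sy, Sfit, Sres that
    maximizes the PFRR likelihood (exhaustive-search sketch)."""
    cands = np.hstack([np.linalg.eigh(S)[1] for S in (Sy, Sfit, Sres)])
    best_G, best_val = None, -np.inf
    for idx in combinations(range(cands.shape[1]), d):
        G, _ = np.linalg.qr(cands[:, list(idx)])   # orthonormalize candidates
        val = pfrr_loglik(G, Sy, Sres, n)
        if val > best_val:
            best_G, best_val = G, val
    return best_G
```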

### 2.2.3. Unstructured principal fitted response reduction

In model (2.3), we now assume that $\varepsilon \sim N(0, \Sigma)$ with Σ > 0 and $\mathrm{cov}(\nu_x, \varepsilon) = 0$:

$Y = \Gamma \nu_x + \varepsilon. \qquad (2.5)$

The difference between models (2.3) and (2.5) lies in the structure of Σ along with Γ. In model (2.5), the structure $\Sigma = \Gamma \Omega \Gamma^{T} + \Gamma_0 \Omega_0 \Gamma_0^{T}$ is no longer assumed.

Yoo (2019) presents the relationship between Σ and $\Sigma_y$ for the invariance condition: $\mathcal{S}(\Sigma\Gamma) \subseteq \mathcal{S}(\Gamma)$ if and only if $\mathcal{S}(\Sigma_y\Gamma) \subseteq \mathcal{S}(\Gamma)$. That is, the invariance condition for Σ is equivalent to that for $\Sigma_y$. Suppose that one estimates Γ through PRR or PFRR. The result then allows us to investigate the invariance condition of Γ through the usual moment estimator of $\Sigma_y$. Yoo (2019) then shows that $E(Y \mid X) = E(P_{\Gamma(\Sigma_y)}^{T} Y \mid X)$ for model (2.5), as long as the invariance of $\mathcal{S}(\Gamma)$ for $\Sigma_y$ holds. So, from now on, the invariance condition of Γ will be placed on $\Sigma_y$ in model (2.5).

To utilize the information in the predictors for the estimation of Γ, its fitted component model is constructed as:

$Y = \Gamma \psi f_x + \varepsilon. \qquad (2.6)$

Let $E_d$ and $\mathcal{S}_d(E)$ stand for the matrix of the eigenvectors corresponding to the first d largest eigenvalues of a matrix E and the subspace spanned by the columns of $E_d$, respectively. Define $B = \hat{\Sigma}^{-1/2} \hat{\Sigma}_{\mathrm{fit}} \hat{\Sigma}^{-1/2}$, $B_{\mathrm{res}} = \hat{\Sigma}_{\mathrm{res}}^{-1/2} \hat{\Sigma}_{\mathrm{fit}} \hat{\Sigma}_{\mathrm{res}}^{-1/2}$, and $B_y = \hat{\Sigma}_y^{-1/2} \hat{\Sigma}_{\mathrm{fit}} \hat{\Sigma}_y^{-1/2}$. Also, let $\hat{\Lambda} = (\hat{\lambda}_1, \ldots, \hat{\lambda}_q)$ and $\hat{V} = (\hat{\gamma}_1, \ldots, \hat{\gamma}_q)$ be the ordered eigenvalues and corresponding eigenvectors of $B_{\mathrm{res}}$, and let $\hat{K}_d = \mathrm{diag}(0, \ldots, 0, \hat{\lambda}_{d+1}, \ldots, \hat{\lambda}_q)$. Then, under model (2.6), Yoo (2019) derives the following results:

• $\hat{\mathcal{S}}(\Gamma) = \hat{\Sigma}^{1/2} \mathcal{S}_d(B)$ or $\hat{\Gamma} = \hat{\Sigma}^{1/2} B_d$.

• $\hat{\Sigma} = \hat{\Sigma}_{\mathrm{res}} + \hat{\Sigma}_{\mathrm{res}}^{1/2} \hat{V} \hat{K}_d \hat{V}^{T} \hat{\Sigma}_{\mathrm{res}}^{1/2} = \hat{\Sigma}_{\mathrm{res}}^{1/2} (I_r + \hat{V} \hat{K}_d \hat{V}^{T}) \hat{\Sigma}_{\mathrm{res}}^{1/2}$.

• $L_{\mathrm{UPFRR}}^{d} = -\frac{n}{2} \log |\hat{\Sigma}_{\mathrm{res}}| + \frac{n}{2} \sum_{i=d+1}^{q} \log(1 + \hat{\lambda}_i)$.

• $\hat{\mathcal{S}}(\Gamma) = \hat{\Sigma}^{1/2} \mathcal{S}_d(B) = \hat{\Sigma}_{\mathrm{res}}^{1/2} \mathcal{S}_d(B_{\mathrm{res}}) = \hat{\Sigma}_y^{1/2} \mathcal{S}_d(B_y)$.

The response reduction through model (2.6) will be called "unstructured PFRR (UPFRR)".
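Because UPFRR has a closed form, it is straightforward to sketch using the last equivalence above, $\hat{\mathcal{S}}(\Gamma) = \hat{\Sigma}_y^{1/2}\mathcal{S}_d(B_y)$. The following numpy illustration is ours (helper and function names are assumptions, not from the paper):

```python
import numpy as np

def sqrtm_sym(S):
    """Square root of a symmetric positive definite matrix via eigh."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(np.sqrt(w)) @ V.T

def upfrr_basis(Sy, Sfit, d):
    """Closed-form UPFRR sketch (Yoo, 2019) using the equivalent form
    S(Gamma)-hat = Sigma_y^{1/2} S_d(B_y), B_y = Sy^{-1/2} Sfit Sy^{-1/2}."""
    Sy_half = sqrtm_sym(Sy)
    Sy_half_inv = np.linalg.inv(Sy_half)
    By = Sy_half_inv @ Sfit @ Sy_half_inv
    w, V = np.linalg.eigh(By)
    Bd = V[:, np.argsort(w)[::-1][:d]]   # eigenvectors of the d largest eigenvalues
    return Sy_half @ Bd                  # basis; not orthonormal in general
```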

3. Two remarks on PRR, PFRR, and UPFRR

### 3.1. First remark on PRR and PFRR

Under PRR and PFRR, recall that $\Sigma = \Gamma \Omega \Gamma^{T} + \Gamma_0 \Omega_0 \Gamma_0^{T}$, where $\Gamma_0∈\mathbb{R}^{r×(r-d)}$ with $\Gamma_0^{T}\Gamma_0 = I_{r-d}$ and $\Gamma_0^{T}\Gamma = 0$, $\Omega = \Gamma^{T}\Sigma\Gamma$, and $\Omega_0 = \Gamma_0^{T}\Sigma\Gamma_0$. For PRR, the covariance matrix Σ cannot be restored because Ω is not estimable. In PFRR, Ω and Ω0 can be estimated by $\hat{\Gamma}^{T}\hat{\Sigma}_{\mathrm{res}}\hat{\Gamma}$ and $\hat{\Gamma}_0^{T}\hat{\Sigma}_y\hat{\Gamma}_0$, respectively. Therefore, a sample version of Σ can be constructed as $\hat{\Gamma}\hat{\Gamma}^{T}\hat{\Sigma}_{\mathrm{res}}\hat{\Gamma}\hat{\Gamma}^{T} + \hat{\Gamma}_0\hat{\Gamma}_0^{T}\hat{\Sigma}_y\hat{\Gamma}_0\hat{\Gamma}_0^{T}$. It should be noted that different sample quantities, $\hat{\Sigma}_{\mathrm{res}}$ and $\hat{\Sigma}_y$, are used for Γ and Γ0, so this construction clearly does not coincide with the population structure.

The structural dimension d of Γ is assumed to be known in the estimation of Γ for PRR and PFRR. Since it is normally unknown, it should be estimated through hypothesis tests of H0 : d = m versus H1 : d = min(q, r) for m = 0, 1, . . . , min(q, r) − 1. Since both PRR and PFRR use likelihood functions, a likelihood ratio test (LRT) is a natural choice. For PFRR, the dimension estimation by the LRT can be done with a $\chi^2_{q(r-m)}$ reference distribution, while it is not plausible in PRR because Σ is not estimable.
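The sequential testing scheme can be sketched generically. In the snippet below, `lrt_stat(m)` is a hypothetical user-supplied function returning the likelihood-ratio statistic for H0 : d = m (the paper does not give its closed form here); only the stopping rule and the $\chi^2_{q(r-m)}$ reference distribution come from the text:

```python
import numpy as np
from scipy.stats import chi2

def estimate_d(lrt_stat, q, r, alpha=0.05):
    """Sequentially test H0: d = m versus H1: d = min(q, r) for
    m = 0, 1, ..., min(q, r) - 1, comparing the LRT statistic with the
    chi-square quantile on q(r - m) degrees of freedom.
    `lrt_stat(m)` is a hypothetical callable supplied by the user."""
    for m in range(min(q, r)):
        if lrt_stat(m) <= chi2.ppf(1 - alpha, q * (r - m)):
            return m               # first hypothesis not rejected
    return min(q, r)               # all hypotheses rejected
```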

### 3.2. Second remark on PFRR and UPFRR

Under UPFRR, Γ spans an invariant subspace of Σ if and only if Γ spans an invariant subspace of $\Sigma_y$. This equivalence also holds for PRR and PFRR, which is summarized in the following proposition.

### Proposition 1

Assume that model (2.3) holds. Then, $\mathcal{S}(\Sigma\Gamma) \subseteq \mathcal{S}(\Gamma)$ if and only if $\mathcal{S}(\Sigma_y\Gamma) \subseteq \mathcal{S}(\Gamma)$.

According to Cook et al. (2007, Section 2.2), for a symmetric matrix A, any invariant subspace of A is also a reducing subspace. Therefore, an invariant subspace of Σ becomes a reducing subspace. This implies that PFRR and UPFRR are the same model; the difference between the two is whether the structure $\Sigma = \Gamma \Omega \Gamma^{T} + \Gamma_0 \Omega_0 \Gamma_0^{T}$ is kept in the estimation of Γ. The benefit of keeping the structure, as in PFRR, is fewer parameters in the model than in UPFRR. Table 1 summarizes the numbers of parameters of PFRR and UPFRR along with the difference (r − u)u. More parameters are a drawback of UPFRR, but this is not a concern when the dimension of the response reduction subspace is not high. For example, suppose that r = 4 and u = 1 or u = 2; then the difference is 3 for u = 1 and 4 for u = 2. The advantages of UPFRR, on the other hand, are that it has a closed-form estimator of Γ and that it enjoys equivariant transformation results for the responses.

4. Orthogonal transformation

The next proposition summarizes results of an orthogonal transformation of Y for response dimension reduction.

### Proposition 2

Consider an orthogonal transformation of Y such that $W = O^{T}Y$ for an orthogonal matrix $O∈\mathbb{R}^{r×r}$, and let $\Gamma_w = O^{T}\Gamma$.

Assume that $E(Y \mid X) = P_{\Gamma(\Sigma_y)}^{T} E(Y \mid X)$. Then the following statement holds.

(a) It holds that $E(W \mid X) = P_{\Gamma_w(\Sigma_w)}^{T} E(W \mid X)$.

Assume the multivariate regression models in (2.3), (2.4), and (2.6).

(b) $\mathcal{S}(\Sigma_w \Gamma_w) \subseteq \mathcal{S}(\Gamma_w)$.

(c) The MLE of $\Gamma_w$ is equal to $O^{T}\hat{\Gamma}$ for PRR, PFRR, and UPFRR.

Proposition 2(a) indicates that $\Gamma_w$ is a basis matrix of a response dimension reduction subspace for the regression of W|X. $\Gamma_w$ can be re-written as $O^{-1}\Gamma$ since the inverse of O is $O^{T}$; this coincides with the corresponding result in SDR for X (Cook, 1998, Proposition 6.3). Proposition 2(a) is not guaranteed for an arbitrary non-singular transformation $A^{T}Y$ of Y, because the structure of E(Y|X) changes with A. One good choice for O is the matrix of eigenvectors of cov(Y). Let Ω be the matrix of all eigenvectors of cov(Y). Then the covariance matrix of $\Omega^{T}Y$ becomes a diagonal matrix, not the identity matrix, so numerical instability can be avoided if necessary. Suppose that a response dimension reduction is done for the regression of $\Omega^{T}Y|X$, yielding the estimate $\hat{\Gamma}_\omega$. Then the basis estimate for the response reduction of Y|X is directly computed as $\Omega\hat{\Gamma}_\omega$.
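The device described above is mechanical and easy to sketch: eigendecompose cov(Y), reduce the transformed regression, and map the basis back. In this illustration `fit_basis` is a hypothetical placeholder for any of the response-reduction estimators (its signature is our assumption):

```python
import numpy as np

def reduce_via_orthogonal(Y, X, fit_basis, d):
    """Sketch of the Section 4 device: transform the responses by the
    eigenvector matrix O of cov(Y) (so cov(O'Y) is diagonal), estimate
    the reduction for O'Y | X, then map the basis back as O @ Gamma-hat."""
    Yc = Y - Y.mean(axis=0)
    _, O = np.linalg.eigh(Yc.T @ Yc / Y.shape[0])  # orthogonal eigenvector matrix
    W = Y @ O                                      # row-wise W_i = O' Y_i
    G_w = fit_basis(W, X, d)                       # reduction for W | X
    return O @ G_w                                 # basis estimate for Y | X
```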

Proposition 2(b) implies that $\mathcal{S}(\Gamma_w)$ is an invariant subspace of $\Sigma_w$, and hence also an invariant subspace of the covariance matrix of $O^{T}\varepsilon$. This directly indicates that $\Gamma_w$ is a basis matrix of the response reduction.

According to Proposition 2(c), the basis matrix of the response reduction after the orthogonal transformation is estimated by the pre-transformation estimate pre-multiplied by the orthogonal matrix, consistent with the result for the Yoo-Cook response reduction in Proposition 2(a).

5. Discussion

SDR has been successful in high-dimensional data analysis, which often involves multi-dimensional responses whose dimension reduction can facilitate the analysis and induce undiscovered scientific results. Following the notion of SDR, response dimension reduction was founded in Yoo and Cook (2008) along with a proposed non-parametric approach. Two semi-parametric approaches were developed in Yoo (2018), where it was shown that they have potential advantages in the estimation of the response reduction subspace over Yoo and Cook (2008). Yoo (2019) completed the semi-parametric method by proposing an unstructured approach. In this paper, the three versions of the semi-parametric approach are discussed theoretically, with insightful remarks that should help statistical practitioners employ them. The paper also presents results on an orthogonal transformation of the response variables for the seminal work of Yoo and Cook (2008) and the three semi-parametric approaches, showing that numerical instability in the estimation of a basis of the response reduction subspace can be avoided in practice.

Acknowledgements

For Jae Keun Yoo, this work was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Korean Ministry of Education (NRF-2019R1F1A1050715).

Appendix: Proofs

### Proof of Proposition 1

(⇒) By the construction of model (2.3), we have $\Sigma_y = \Gamma \mathrm{cov}(\nu_x) \Gamma^{T} + \Sigma$, and so $\Sigma_y = \Gamma \mathrm{cov}(\nu_x) \Gamma^{T} + \Gamma \Omega \Gamma^{T} + \Gamma_0 \Omega_0 \Gamma_0^{T}$. Hence $\Sigma_y \Gamma = \Gamma \mathrm{cov}(\nu_x) + \Gamma \Omega = \Gamma \{\mathrm{cov}(\nu_x) + \Omega\}$. This directly implies that $\Sigma_y \Gamma$ is expressed as a linear combination of Γ, so we have $\mathcal{S}(\Sigma_y\Gamma) \subseteq \mathcal{S}(\Gamma)$. (⇐) Assume that $\mathcal{S}(\Sigma_y\Gamma) \subseteq \mathcal{S}(\Gamma)$. The assumption indicates that there is a d × d matrix $\eta^{*}$ such that $\Sigma_y \Gamma = \Gamma \eta^{*}$. Then we have $\Sigma = \Sigma_y - \Gamma \mathrm{cov}(\nu_x) \Gamma^{T}$, so $\Sigma \Gamma = \Sigma_y \Gamma - \Gamma \mathrm{cov}(\nu_x) = \Gamma \{\eta^{*} - \mathrm{cov}(\nu_x)\}$. This indicates that ΣΓ is expressed as a linear combination of Γ, so we have $\mathcal{S}(\Sigma\Gamma) \subseteq \mathcal{S}(\Gamma)$. This completes the proof.

### Proof of Proposition 2

Recall that $OO^{T} = O^{T}O = I_r$ and $W = O^{T}Y$. It is easily noted that $\Sigma_w = \mathrm{cov}(W) = O^{T}\Sigma_y O$.

• Proof of part (a): By the assumption, we have the following chain of equalities. $E(Y \mid X) = P_{\Gamma(\Sigma_y)}^{T} E(Y \mid X) = \Sigma_y \Gamma (\Gamma^{T} \Sigma_y \Gamma)^{-1} \Gamma^{T} E(Y \mid X) = O O^{T} \Sigma_y O O^{T} \Gamma (\Gamma^{T} O O^{T} \Sigma_y O O^{T} \Gamma)^{-1} \Gamma^{T} O O^{T} E(Y \mid X) = O \Sigma_w \Gamma_w (\Gamma_w^{T} \Sigma_w \Gamma_w)^{-1} \Gamma_w^{T} E(O^{T} Y \mid X) = O P_{\Gamma_w(\Sigma_w)}^{T} E(W \mid X).$

By pre-multiplying both sides of the last equality by $O^{T}$, we have $E(W \mid X) = P_{\Gamma_w(\Sigma_w)}^{T} E(W \mid X)$. This completes the proof.

• Proof of part (b): By the assumption of PRR, PFRR, and UPFRR given in (2.3), (2.4), and (2.6), respectively, we have the following equivalences. $S ( Σ y Γ ) ⊆ S ( Γ ) ⇔ S ( O O T Σ y O O T Γ ) ⊆ S ( O O T Γ ) ⇔ O S ( Σ w Γ w ) ⊆ O S ( Γ w ) .$

By pre-multiplying both sides of the last equivalence by $O^{T}$, we have $\mathcal{S}(\Sigma_w \Gamma_w) \subseteq \mathcal{S}(\Gamma_w)$, and this completes the proof.

• Proof of part (c): We have $O^{T}Y = O^{T}\Gamma \nu_x + O^{T}\varepsilon$, equivalently $W = \Gamma_w \nu_x + \varepsilon_w$ with $\varepsilon_w = O^{T}\varepsilon$. Under PRR and PFRR in (2.3) and (2.4), $\Sigma_O = \mathrm{cov}(\varepsilon_w) = O^{T}(\Gamma \Omega \Gamma^{T})O + O^{T}(\Gamma_0 \Omega_0 \Gamma_0^{T})O$. It is easily noted that $\Gamma_w^{T}(O^{T}\Gamma_0) = 0$, so $O^{T}\Gamma_0$ spans the orthogonal complement of $\mathcal{S}(\Gamma_w)$. Letting $O^{T}\Gamma_0 = \Gamma_{w,0}$, we have $\Sigma_O = \Gamma_w \Omega \Gamma_w^{T} + \Gamma_{w,0} \Omega_0 \Gamma_{w,0}^{T}$. Therefore, the conditions required in PRR and PFRR hold for W|X. It is also noted that $\hat{\Sigma}_w = O^{T}\hat{\Sigma}_y O$, $\hat{\Sigma}_{\mathrm{fit},w} = O^{T}\hat{\Sigma}_{\mathrm{fit}} O$, and $\hat{\Sigma}_{\mathrm{res},w} = O^{T}\hat{\Sigma}_{\mathrm{res}} O$. This implies that the largest eigenvectors of $\hat{\Sigma}_w$, $\hat{\Sigma}_{\mathrm{fit},w}$, and $\hat{\Sigma}_{\mathrm{res},w}$ are the same as the largest eigenvectors of $\hat{\Sigma}_y$, $\hat{\Sigma}_{\mathrm{fit}}$, and $\hat{\Sigma}_{\mathrm{res}}$ pre-multiplied by $O^{T}$, respectively.

By this relation, for PRR, we directly have that $Γ^w=OTΓ^$.

For PFRR, we have the following likelihood function: $L ( Γ w , Γ w 0 ) = - n 2 log | Γ w 0 T Σ ^ w Γ w 0 | - n 2 log | Γ w T Σ ^ res , w Γ w | = - n 2 log | ( O Γ w 0 ) T Σ ^ y ( O Γ w 0 ) | - n 2 log | ( O Γ w ) T Σ ^ res ( O Γ w ) | .$

Therefore, the likelihood is maximized at $O\hat{\Gamma}_w = \hat{\Gamma}$, which directly implies that $\hat{\Gamma}_w = O^{T}\hat{\Gamma}$.

For UPFRR in the regression of W|X, we have $B_w = \hat{\Sigma}_w^{-1/2} \hat{\Sigma}_{\mathrm{fit},w} \hat{\Sigma}_w^{-1/2} = O^{T} \hat{\Sigma}_y^{-1/2} \hat{\Sigma}_{\mathrm{fit}} \hat{\Sigma}_y^{-1/2} O = O^{T} B_y O.$

Therefore, the largest eigenvectors of $B_w$ are the same as those of $B_y$ pre-multiplied by $O^{T}$. Let $B_{w,d}$ denote the eigenvectors of the first d largest eigenvalues of $B_w$. Then $\hat{\Gamma}_w = \hat{\Sigma}_w^{1/2} B_{w,d} = O^{T} \hat{\Sigma}_y^{1/2} O O^{T} B_{y,d} = O^{T} \hat{\Sigma}_y^{1/2} B_{y,d} = O^{T} \hat{\Gamma}$. This completes the proof for part (c).

TABLES

### Table 1

Parameters and their dimensions in PFRR and UPFRR

|       | Γ        | ψ  | Σ                                  | Total                    |
|-------|----------|----|------------------------------------|--------------------------|
| PFRR  | (r − u)u | qu | u(u + 1)/2 + (r − u)(r − u + 1)/2  | qu + r(r + 1)/2          |
| UPFRR | (r − u)u | qu | r(r + 1)/2                         | (r − u)u + qu + r(r + 1)/2 |

PFRR = principal fitted response reduction; UPFRR = unstructured PFRR.

References
1. Cook RD (1998). Regression Graphics: Ideas for Studying Regressions Through Graphics, Wiley, New York.
2. Cook RD (2007). Fisher lecture: dimension reduction in regression, Statistical Science, 22, 1-26.
3. Cook RD, Li B, and Chiaromonte F (2007). Dimension reduction in regression without matrix inversion, Biometrika, 94, 569-584.
4. Hall P and Li KC (1993). On almost linearity of low-dimensional projections from high-dimensional data, Annals of Statistics, 21, 867-889.
5. Yoo JK (2018). Response dimension reduction: model-based approach, Statistics: A Journal of Theoretical and Applied Statistics, 52, 409-425.
6. Yoo JK (2019). Unstructured principal fitted response reduction in multivariate regression, Journal of the Korean Statistical Society, (in Press) https://doi.org/10.1016/j.jkss.2019.02.001.
7. Yoo JK and Cook RD (2008). Response dimension reduction for the conditional mean in multivariate regression, Computational Statistics and Data Analysis, 53, 334-343.