With the advent of the so-called big data era, high-dimensional data analysis has become unavoidable, and meaningful reduction of the data is essential to its analysis.
In a regression of $Y \in \mathbb{R}^r$ on $X \in \mathbb{R}^p$, sufficient dimension reduction (SDR) seeks subspaces $\mathcal{S} \subseteq \mathbb{R}^p$ such that $Y \perp\!\!\!\perp X \mid P_{\mathcal{S}}X$, where $\perp\!\!\!\perp$ stands for statistical independence and $P_{\mathcal{S}}$ is the orthogonal projection onto $\mathcal{S}$. Such a subspace is called a dimension reduction subspace, and the intersection of all dimension reduction subspaces, when it is itself a dimension reduction subspace, is called the central subspace $\mathcal{S}_{Y|X}$.
Among the many SDR methodologies, sliced inverse regression (SIR; Li, 1991), sliced average variance estimation (SAVE; Cook and Weisberg, 1991) and directional regression (DR; Li and Wang, 2007) are among the most popular. SIR considers the first inverse moment $E(X \mid Y)$, while SAVE is based on the second inverse moment $\mathrm{cov}(X \mid Y)$.
Both SIR and SAVE have been extensively extended to cover many types of regression, such as multivariate regression and large-$p$ regression.
The organization of the paper is as follows. Directional regression is reviewed in Section 2. Section 3 is devoted to extending DR to multivariate regression, and its extension to large-$p$ regression is discussed next.
DR (Li and Wang, 2007) generalizes the ideas of inverse regression by considering conditional moments of the empirical directions $Z - \tilde{Z}$, where $Z = \Sigma^{-1/2}\{X - E(X)\}$ is the standardized predictor and $(\tilde{Z}, \tilde{Y})$ is an independent copy of $(Z, Y)$; each pair of observations is used to represent an empirical direction. Contour regression (CR; Li, Zha and Chiaromonte, 2005) first exploited such directions, and DR considers

$A(Y, \tilde{Y}) = E\{(Z - \tilde{Z})(Z - \tilde{Z})^{\top} \mid Y, \tilde{Y}\},$

which can be explained as regressing $(Z - \tilde{Z})(Z - \tilde{Z})^{\top}$ on $(Y, \tilde{Y})$, and uses

$G = E[\{2I_p - A(Y, \tilde{Y})\}^{2}]$

as a candidate matrix. Here, similar to CR, the candidate matrix of DR involves the pair $(Y, \tilde{Y})$, and both of them require handling the responses pairwise in estimation. The candidate matrix can be rewritten as

$G = 2E\{E^{2}(ZZ^{\top} - I_p \mid Y)\} + 2E^{2}\{E(Z \mid Y)E(Z^{\top} \mid Y)\} + 2E\{E(Z^{\top} \mid Y)E(Z \mid Y)\}\,E\{E(Z \mid Y)E(Z^{\top} \mid Y)\}.$

This alternative version of the candidate matrix implies that DR is a combination of SIR and SAVE, as it is built from the first and second inverse moments $E(Z \mid Y)$ and $E(ZZ^{\top} \mid Y)$. Under the linearity and constant covariance conditions, DR exhaustively estimates the central subspace, and this can be written as $\mathcal{S}(G) = \mathcal{S}_{Y|Z}$.
The following algorithm computes the sample version of the candidate matrix of DR.

Step 1. Standardize the predictors: $\hat{Z}_i = \hat{\Sigma}^{-1/2}(X_i - \bar{X})$, $i = 1, \ldots, n$.

Step 2. Let $S_1, \ldots, S_h$ be $h$ slices of the observed responses, and let $\hat{p}_s$ be the sample proportion of observations falling in slice $S_s$.

Step 3. Let $\hat{\mu}_s = \hat{E}(Z \mid Y \in S_s)$ and $\hat{M}_s = \hat{E}(ZZ^{\top} \mid Y \in S_s)$ be the within-slice sample moments.

Step 4. Finally, compute

$\hat{G} = 2\sum_{s=1}^{h}\hat{p}_s(\hat{M}_s - I_p)^{2} + 2\Big(\sum_{s=1}^{h}\hat{p}_s\hat{\mu}_s\hat{\mu}_s^{\top}\Big)^{2} + 2\Big(\sum_{s=1}^{h}\hat{p}_s\hat{\mu}_s^{\top}\hat{\mu}_s\Big)\Big(\sum_{s=1}^{h}\hat{p}_s\hat{\mu}_s\hat{\mu}_s^{\top}\Big).$
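The sample computation above can be sketched in code as follows. This is a minimal numpy sketch assuming the SIR/SAVE combination form of the DR kernel; the function name and the toy model are illustrative, not from the paper.

```python
import numpy as np

def dr_candidate_matrix(X, y, n_slices=5):
    """Slicing-based sample DR candidate matrix (sketch)."""
    n, p = X.shape
    # Step 1: standardize the predictors
    Sigma = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(Sigma)
    Sig_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - X.mean(0)) @ Sig_inv_sqrt
    # Step 2: slice the response into roughly equal-size slices
    slices = np.array_split(np.argsort(y), n_slices)
    I = np.eye(p)
    term1 = np.zeros((p, p))   # accumulates p_s (M_s - I)^2
    mu_outer = np.zeros((p, p))  # accumulates p_s mu_s mu_s^T
    mu_sq = 0.0                 # accumulates p_s mu_s^T mu_s
    for idx in slices:
        ps = len(idx) / n
        Zs = Z[idx]
        ms = Zs.mean(0)                  # within-slice mean of Z
        Ms = Zs.T @ Zs / len(idx)        # within-slice mean of ZZ^T
        term1 += ps * (Ms - I) @ (Ms - I)
        mu_outer += ps * np.outer(ms, ms)
        mu_sq += ps * ms @ ms
    # Step 4: combine the three terms of the DR expansion
    G = 2 * term1 + 2 * mu_outer @ mu_outer + 2 * mu_sq * mu_outer
    return G, Sig_inv_sqrt

# toy model: the central subspace is spanned by the first coordinate
rng = np.random.default_rng(0)
n, p = 2000, 5
X = rng.standard_normal((n, p))
y = X[:, 0] + 0.3 * rng.standard_normal(n)
G, Sinv = dr_candidate_matrix(X, y)
w, V = np.linalg.eigh(G)
bhat = Sinv @ V[:, -1]          # leading direction, back on the X-scale
bhat /= np.linalg.norm(bhat)
```

The leading eigenvector of $\hat{G}$, transformed back by $\hat{\Sigma}^{-1/2}$, estimates a basis direction of the central subspace.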
Directional regression (DR) requires a slicing procedure for its implementation in practice. Therefore, even with a multi-dimensional response, the theoretical foundation of DR is unchanged; rather, the slicing scheme must be modified in the implementation to accommodate the multivariate responses.
One simple solution to this is hierarchical slicing: slice the observed values of the first response coordinate, and then slice each of the resulting slices further according to the next coordinate, and so on. For example, with four-dimensional responses, slicing each coordinate into three slices already yields $3^4 = 81$ slices.
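As an illustration, hierarchical slicing for bivariate responses can be sketched as follows; this is a minimal numpy sketch and the helper name is ours, not from the paper.

```python
import numpy as np

def hierarchical_slices(Y, h1=3, h2=3):
    """Assign each observation a slice label by slicing the first
    response coordinate into h1 slices and then slicing each of
    those further by the second coordinate into h2 sub-slices."""
    n = Y.shape[0]
    labels = np.empty(n, dtype=int)
    order1 = np.argsort(Y[:, 0])             # sort by first coordinate
    for i, blk in enumerate(np.array_split(order1, h1)):
        sub = blk[np.argsort(Y[blk, 1])]     # within-slice sort by second
        for j, blk2 in enumerate(np.array_split(sub, h2)):
            labels[blk2] = i * h2 + j        # joint slice label
    return labels

rng = np.random.default_rng(1)
Y = rng.standard_normal((90, 2))
lab = hierarchical_slices(Y)   # 9 slices of 10 observations each
```

The joint labels can then be used in place of the univariate slices in the DR algorithm of Section 2.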
In slicing-based approaches, it should be noted that slicing is required for easy computation of the inverse moments. Therefore, slicing is done under the philosophy that similar response values fall within the same slice.
A good option to follow this philosophy for multivariate responses is clustering. The first attempt to cluster the responses for inverse regression is Setodji and Cook (2004), in which $k$-means clustering of the responses replaces the usual slicing.
Although hierarchical-slicing or clustering methods are effective and efficient for multivariate responses, it remains problematic that some slices can have small sample sizes.
One way to overcome this problem is to pool the results of applying DR to each coordinate regression. This approach was initially proposed in Yoo
$\mathcal{S}_{Y_k|X} \subseteq \mathcal{S}_{Y|X}$, where $\mathcal{S}_{Y_k|X}$ is the central subspace of the coordinate regression of $Y_k$ on $X$, $k = 1, \ldots, r$. It directly indicates that pooling the central subspaces of the coordinate regressions recovers part of $\mathcal{S}_{Y|X}$. One deficit of this pooling approach is that the exhaustiveness of DR for the central subspace of $Y \mid X$ is no longer guaranteed, because $\sum_{k=1}^{r}\mathcal{S}_{Y_k|X}$ can be a proper subset of $\mathcal{S}_{Y|X}$.
Let $\hat{G}_k$ denote the sample DR candidate matrix computed from the coordinate regression of $Y_k$ on $X$.

Step 1. Compute $\hat{G}_k$ for $k = 1, \ldots, r$.

Step 2. Construct the pooled matrix $\hat{G}_{\mathrm{pool}} = \sum_{k=1}^{r}\hat{G}_k$.

Step 3. Spectral-decompose $\hat{G}_{\mathrm{pool}}$.

Step 4. Let $\hat{\Gamma}$ be the matrix of the eigenvectors corresponding to the $d$ largest eigenvalues of $\hat{G}_{\mathrm{pool}}$; then $\hat{\Sigma}^{-1/2}\hat{\Gamma}$ estimates a basis on the original predictor scale.

This approach to estimating $\mathcal{S}_{Y|X}$ is called pooled DR.
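The pooled construction can be sketched as follows. For brevity this sketch pools a SIR-type kernel as a stand-in for the DR candidate matrix of Section 2; all function names and the toy model are illustrative assumptions.

```python
import numpy as np

def sir_kernel(Z, y, h=5):
    """SIR-type candidate matrix; a stand-in for the DR kernel."""
    n, p = Z.shape
    G = np.zeros((p, p))
    for idx in np.array_split(np.argsort(y), h):
        ms = Z[idx].mean(0)                 # within-slice mean
        G += (len(idx) / n) * np.outer(ms, ms)
    return G

def pooled_estimate(X, Y, d, kernel=sir_kernel):
    """Pool coordinate-wise candidate matrices with equal weights
    and return the leading-d basis estimate on the X-scale."""
    Sigma = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(Sigma)
    S_inv_half = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - X.mean(0)) @ S_inv_half
    G = sum(kernel(Z, Y[:, k]) for k in range(Y.shape[1]))
    w, V = np.linalg.eigh(G)
    return S_inv_half @ V[:, -d:]

# toy bivariate regression: coordinates depend on X1 and X2 respectively
rng = np.random.default_rng(2)
n, p = 1500, 6
X = rng.standard_normal((n, p))
Y = np.column_stack([X[:, 0] + 0.2 * rng.standard_normal(n),
                     np.exp(X[:, 1]) + 0.2 * rng.standard_normal(n)])
B = pooled_estimate(X, Y, d=2)
```

In the actual pooled DR, `sir_kernel` would be replaced by the sample DR candidate matrix of each coordinate regression.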
The so-called large-$p$ regression arises when the number of predictors $p$ is large relative to the sample size $n$, so that the sample covariance matrix $\hat{\Sigma}$ is singular or numerically unstable to invert.
In such cases, employing the seeded dimension reduction approach for DR would be a possible and realistic route, since all quantities in the population DR kernel matrices can then be computed without inverting $\Sigma$. However, the direct application is not recommended, because the dimension of the seed matrix is too high; the seed matrix is a high-dimensional object, which makes the subsequent computations burdensome. The extension of DR to large-$p$ regression therefore requires a more careful construction.
Lastly, one can develop a regularized DR embracing the $L_1$ or $L_2$ penalty; this is definitely a new road, and we leave it as open work.
For all numerical studies, the sample sizes were 100 with
For all simulation models, $\mathcal{S}_{Y|X}$ is spanned by the two columns of
To measure how well the dimension reduction methods under consideration for multivariate regression estimate $\mathcal{S}_{Y|X}$, we computed the trace correlation distance $1 - \gamma$, where $\gamma = \{\mathrm{trace}(P_{B}P_{\hat{B}})/d\}^{1/2}$, $P_{B}$ denotes the orthogonal projection onto the column space of $B$, and $d$ is the true structural dimension. Smaller values of $1 - \gamma$ indicate better estimation.
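The trace correlation distance can be computed with a small numpy helper like the following; the function name is ours.

```python
import numpy as np

def trace_corr_distance(B, Bhat):
    """Trace correlation distance 1 - gamma between span(B) and span(Bhat):
    0 for identical subspaces, 1 for orthogonal ones."""
    Q, _ = np.linalg.qr(B)       # orthonormal basis of the true subspace
    Qh, _ = np.linalg.qr(Bhat)   # orthonormal basis of the estimate
    d = Q.shape[1]
    gamma = np.sqrt(np.trace(Q @ Q.T @ Qh @ Qh.T) / d)
    return 1.0 - gamma

# identical spans give distance ~0; orthogonal spans give distance ~1
B_true = np.eye(4)[:, :2]
B_same = B_true @ np.array([[1.0, 1.0], [1.0, -1.0]])  # same span, new basis
B_orth = np.eye(4)[:, 2:]
```

Note that the distance depends only on the spans, not on the particular basis matrices, because each basis is orthonormalized first.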
All predictors
To compare the estimation performances among hierarchical-slicing, hierarchical-clustering and pooled DR, the following four bivariate regressions were constructed: M2-1:
Next, to investigate how well hierarchical-slicing, hierarchical-clustering and pooled DR and the corresponding versions of SIR and SAVE estimate $\mathcal{S}_{Y|X}$, the following four-dimensional response regressions were considered: M4-1:
Numerical studies for M2-1–M2-4 and M4-1–M4-3 are summarized by boxplots of the trace correlation distances.
According to Figure 1, the three versions of DR for multivariate regression show quite similar performances, although bivariate slicing would be recommended.
In the case of four-dimensional response regression, the following common aspects are observed. DR and SIR are robust to the choice between hierarchical clustering and the pooled approach. SAVE is very sensitive to this choice, and the pooled approach is much better than the hierarchical-clustering one. Also, with larger
Sufficient dimension reduction methodologies for multivariate regression and large-$p$ regression have so far been developed mainly around SIR and SAVE.
In this paper, we discuss how DR can be extended to embrace multivariate regression and large-$p$ regression.
The extension of DR to large-$p$ regression is considered through the seeded dimension reduction approach.
Numerical studies show that the multivariate versions of DR do not outperform the corresponding versions of SIR and SAVE. This may be disappointing, but it is confirmed that DR is robust to the number of clusters and to the choice between hierarchical-clustering and pooled DR.
The regularized version of DR is left as open work.