The goal of sufficient dimension reduction (SDR) is to replace original
Consider a regression of
where
Statement (
Sliced inverse regression (SIR) (Li, 1991) is one of the most popular SDR methods because of its wide applicability and simple implementation in practice. The key step of the SIR application is a categorization of a response variable
Inference on the central subspace requires two steps. First, the structural dimension
This paper investigates the robustness of the dimension estimation of the central subspace in FSIR by employing a permutation approach. If the structural dimension determined by FSIR is sensitive to the number of slices, then the robustness of the basis estimation is affected accordingly. This study can therefore support the potential advantages of FSIR over SIR in robust and possibly better estimation of the central subspace.
The paper is organized as follows. In Section 2, we briefly review FSIR and propose a permutation dimension determination. Section 3 is devoted to numerical studies and a real data application. Our work is summarized in Section 4. The following notation is used frequently throughout the rest of the paper. A subspace stands for a subspace spanned by the columns of
The method of SIR (Li, 1991) will be explained through the normalized predictor
To guarantee the exhaustive estimation of , it is usually assumed that . An estimation of through
Since any specific models of
The sample SIR algorithm is as follows:
Construct
Construct
Compute the sample version
Spectral-decompose
The structural dimension
The eigenvectors
to have the sample basis estimate for
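The steps above can be sketched in code. This is a minimal illustration of the standard sample SIR procedure of Li (1991), not the authors' implementation; the function names (`standardize`, `sir_kernel`, `sir`), the use of slices with roughly equal numbers of observations, and the NumPy-based layout are assumptions made for this example.

```python
import numpy as np

def standardize(x):
    """Return Z = (X - mean) Sigma^{-1/2} together with Sigma^{-1/2}."""
    xc = x - x.mean(axis=0)
    sigma = np.cov(xc, rowvar=False)
    vals, vecs = np.linalg.eigh(sigma)
    inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T
    return xc @ inv_sqrt, inv_sqrt

def sir_kernel(z, y, n_slices):
    """Weighted covariance of the within-slice means of z (assumed slicing:
    sort by y and split into n_slices groups of nearly equal size)."""
    n, p = z.shape
    order = np.argsort(y, kind="stable")
    kernel = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):
        if idx.size == 0:
            continue
        slice_mean = z[idx].mean(axis=0)
        kernel += (idx.size / n) * np.outer(slice_mean, slice_mean)
    return kernel

def sir(x, y, n_slices, d):
    """Return the eigenvalues (descending) and a p x d basis estimate
    on the original predictor scale."""
    z, inv_sqrt = standardize(x)
    vals, vecs = np.linalg.eigh(sir_kernel(z, y, n_slices))
    order = np.argsort(vals)[::-1]
    basis = inv_sqrt @ vecs[:, order[:d]]  # back-transform to the X scale
    return vals[order], basis
```

A typical call is `eigenvalues, basis = sir(x, y, n_slices=5, d=1)`, where `x` is an n × p array and `y` a length-n array. Rerunning with different `n_slices` values is a simple way to see how the slicing choice changes the estimates.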
In the sample implementation of SIR, the construction of
To overcome the sensitivity of SIR to the number of slices, Cook and Zhang (2014) developed a fused approach to combine the
where
Since
Furthermore, the assumed equivalence of
This directly implies that the columns of
The sample version
The inference procedure of via
Cook and Zhang (2014) show that FSIR is more robust than SIR to the number of slices in the basis estimation of the central subspace, and they estimate
For this, in the next section, a permutation approach is suggested to estimate the structural dimension
The related hypotheses must be defined before the true structural dimension is estimated. For this, the following sequence of hypotheses (Rao, 1965) is tested. Starting with
where
To avoid the difficulty in deriving the asymptotics of Λ̂
where
The permutation dimension determination algorithm is:
Construct
Compute two sets predictors of
Randomly permute the index
Compute the test statistic
Repeat steps (3)–(4)
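The permutation procedure can be sketched as follows. This is a hedged illustration rather than the authors' code. It assumes that the fused kernel is formed by placing the SIR kernels for several slice numbers side by side, that the test statistic is n times the sum of the smallest p − m eigenvalues of the kernel times its transpose, and that the predictors `z` are already standardized. All function names and the `n_perm` and `seed` parameters are hypothetical.

```python
import numpy as np

def sir_kernel(z, y, n_slices):
    """Weighted covariance of the within-slice means of z."""
    n, p = z.shape
    order = np.argsort(y, kind="stable")
    kernel = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):
        if idx.size == 0:
            continue
        slice_mean = z[idx].mean(axis=0)
        kernel += (idx.size / n) * np.outer(slice_mean, slice_mean)
    return kernel

def fsir_kernel(z, y, slice_numbers):
    """Assumed fusing rule: column-wise concatenation of SIR kernels."""
    return np.hstack([sir_kernel(z, y, h) for h in slice_numbers])

def statistic(kernel, m, n):
    """n times the sum of the p - m smallest eigenvalues of kernel kernel^T,
    plus the eigenvectors ordered by decreasing eigenvalue."""
    vals, vecs = np.linalg.eigh(kernel @ kernel.T)
    order = np.argsort(vals)[::-1]
    return n * vals[order][m:].sum(), vecs[:, order]

def permutation_pvalue(z, y, m, slice_numbers, n_perm=200, seed=0):
    """Permutation p-value for the hypothesis that the dimension equals m."""
    rng = np.random.default_rng(seed)
    n = z.shape[0]
    observed, vecs = statistic(fsir_kernel(z, y, slice_numbers), m, n)
    z1 = z @ vecs[:, :m]   # components along the first m directions
    z2 = z @ vecs[:, m:]   # remaining components, permuted below
    exceed = 0
    for _ in range(n_perm):
        z_perm = np.hstack([z1, z2[rng.permutation(n)]])
        stat, _ = statistic(fsir_kernel(z_perm, y, slice_numbers), m, n)
        exceed += int(stat >= observed)
    return exceed / n_perm
```

Calling `permutation_pvalue(z, y, m, slice_numbers=(3, 5))` for m = 0, 1, 2, and so on, and stopping at the first m whose p-value exceeds the chosen level, gives the dimension estimate. Passing a single slice number reduces the sketch to a permutation test for ordinary SIR.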
In this permutation determination procedure, it should be noted that the
We considered the following two regression models:
Model 1:
Model 2:
For both models, the predictors of
For Model 1, the structural dimension of is equal to one, and is spanned by
The sample sizes of
As a summary of the numerical studies, the percentages of the correct dimension determination for each model were initially computed. Furthermore, to investigate the robustness in the dimension estimation, for each iteration of the model, the concordant decisions with its true dimension were investigated with two or more slice choices. Let
2 pairs | |||
3 pairs | |||
All |
For Models 1 and 2, the dimension
Figure 1 shows characteristic behaviors in the dimension estimation. In most numerical studies, FSIR shows better dimension determination results than SIR. With smaller sample sizes, FSIR is more robust to the number of slices than SIR. With complex regression models, FSIR is affected by the number of slices, but less so than SIR. FSIR also behaves more consistently than SIR across the distributions of the variables. As discussed in Cook and Zhang (2014), FSIR yields basis estimates that are better and more robust to the number of slices than those of SIR. This carries over directly to the dimension estimation, so FSIR yields more robust results than SIR. These numerical studies confirm that FSIR has a potential advantage over SIR in both dimension and basis estimation of the central subspace.
For illustration purposes, primary biliary cirrhosis (PBC) data in Tibshirani (1997) and Yoo (2017) is analyzed. This data was collected at the Mayo Clinic between 1974 and 1986 and contains the following 19 variables with 276 observations:
We also considered the PBC data because they are often used in survival regression. According to Cook (2003), the SIR application in survival regression requires bivariate slicing of the observed survival time and the censoring status. In this analysis, the observed survival time was first sliced into 2, 3, 4, and 5 categories, and then the observations within each category were partitioned into two groups depending on the censoring status. Therefore, the numbers of slices under consideration were 4, 6, 8, and 10 for SIR. The same slicing scheme was applied for FSIR; in addition, we considered three fusing cases of (4, 6), (4, 6, 8), and (4, 6, 8, 10). Table 1 reports the
According to Table 1, with level 5%, the SIR application to PBC data determines that
According to Yoo (2017), for the same data, FSIR shows more robust basis estimation than SIR for each case of
This paper investigates the robustness of the dimension determination by FSIR (Cook and Zhang, 2014) compared with SIR (Li, 1991). A permutation approach is employed to avoid the difficulty in deriving the asymptotics of the related test statistics.
Numerical studies show that the dimension estimation of FSIR is more robust to the number of slices than that of SIR, and confirm that FSIR has a potential advantage over SIR in inference on the central subspace.
Using the asymptotic distribution of the test statistics in FSIR would remove the need for the permutation test. It would also reduce the computing time for the related
The authors are grateful to two reviewers and the Associate Editor for insightful comments to improve the paper. For Jae Keun Yoo, this work was supported by Basic Science Research Program of the National Research Foundation of Korea (NRF) funded by the Korean Ministry of Education (NRF- 2017R1A2B1004909 and 2009-0093827). For YuNa Cho, this work was supported by the BK21 Plus Project of National Research Foundation of Korea (NRF) funded by the Korean Ministry of Education (22A20130011003).
Null hypothesis | SIR4 | SIR6 | SIR8 | SIR10 | FSIR6 | FSIR8 | FSIR10 |
---|---|---|---|---|---|---|---|
H0: | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
H0: | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
H0: | 0.454 | 0.020 | 0.009 | 0.062 | 0.032 | 0.008 | 0.010 |
H0: | N/A | 0.725 | 0.286 | 0.955 | 0.610 | 0.331 | 0.631 |
SIR = sliced inverse regression; FSIR = fused SIR.
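
The sequential decision rule can be applied to the p-values in Table 1 with a short sketch. It assumes that the four rows correspond, in order, to the hypotheses that the dimension equals 0, 1, 2, and 3, which the extracted table does not state explicitly. The names `estimate_dimension` and `table1` are hypothetical, and the N/A entry for SIR4 is simply omitted.

```python
def estimate_dimension(p_values, level=0.05):
    """Return the first m whose p-value exceeds the level; if every
    hypothesis is rejected, return the number of hypotheses tested."""
    for m, p_value in enumerate(p_values):
        if p_value > level:
            return m
    return len(p_values)

# p-values read column by column from Table 1
# (rows assumed to be d = 0, 1, 2, 3)
table1 = {
    "SIR4": [0.000, 0.000, 0.454],
    "SIR6": [0.000, 0.000, 0.020, 0.725],
    "SIR8": [0.000, 0.000, 0.009, 0.286],
    "SIR10": [0.000, 0.000, 0.062, 0.955],
    "FSIR6": [0.000, 0.000, 0.032, 0.610],
    "FSIR8": [0.000, 0.000, 0.008, 0.331],
    "FSIR10": [0.000, 0.000, 0.010, 0.631],
}

estimates = {name: estimate_dimension(p) for name, p in table1.items()}
print(estimates)
```

Under the stated row assumption and a 5% level, the SIR estimates vary with the number of slices (2 for SIR4 and SIR10, 3 for SIR6 and SIR8), while the three FSIR cases all give 3.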