Sufficient dimension reduction (SDR) replaces original
Sufficient dimension reduction (SDR) in regression of
where ⫫ stands for independence, and
Statement (
One of the most popular SDR methods is sliced inverse regression (SIR) (Li, 1991). Implementation of SIR requires a categorization of a response variable
This paper studies FSIR in survival regression (which has not been done to date) and compares it with SIR under various slicing schemes. Slicing a two-dimensional response is more complex than slicing a one-dimensional response, and survival regression takes both the observed survival time and the censoring indicator as response variables. Therefore, FSIR is expected to be more robust than SIR in the basis estimation of the central subspace.
The paper is organized as follows. In Section 2, we discuss fused sliced inverse regression and its applicability to survival regression, together with some issues that can arise in slicing under survival regression. Section 3 is devoted to numerical studies and a real data application. We summarize our work in Section 4. We define the following notation, which is used throughout the rest of the paper. A subspace (
Before explaining SIR (Li, 1991), the predictor
For the exhaustive estimation of the central subspace, it is typically assumed that (Σ^{−1}
At the population level, the construction of
Construct
Compute
Construct
Perform the spectral decomposition on
Determine the structural dimension
A set of the eigenvectors corresponding to the first
Back-transform to obtain the sample basis estimate for
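The sample SIR steps above can be sketched in Python with NumPy. This is a minimal sketch, not the authors' code: the function name `sir_directions`, the equal-count slicing by rank, and the use of an eigendecomposition of the sample covariance to form Σ̂^{−1/2} are illustrative assumptions, because the details of each step are truncated in this text. The structural dimension `d` is taken as given rather than estimated.

```python
import numpy as np

def sir_directions(X, y, n_slices=10, d=1):
    """Hypothetical sample-SIR sketch: returns a p x d basis estimate and eigenvalues of M."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    # Standardize the predictors: Z = Sigma_hat^{-1/2} (X - X_bar).
    Xc = X - X.mean(axis=0)
    sigma = Xc.T @ Xc / n
    vals, vecs = np.linalg.eigh(sigma)
    root_inv = vecs @ np.diag(vals ** -0.5) @ vecs.T
    Z = Xc @ root_inv
    # Slice y into n_slices groups of (nearly) equal size by rank (assumed scheme).
    slices = np.array_split(np.argsort(y, kind="stable"), n_slices)
    # Kernel M = sum_s (n_s / n) * zbar_s zbar_s^T built from the slice means of Z.
    M = np.zeros((p, p))
    for idx in slices:
        zbar = Z[idx].mean(axis=0)
        M += (idx.size / n) * np.outer(zbar, zbar)
    # Spectral decomposition; eigh returns ascending order, so reverse it.
    evals, evecs = np.linalg.eigh(M)
    evals, evecs = evals[::-1], evecs[:, ::-1]
    # Back-transform the leading d eigenvectors to the original X scale.
    beta = root_inv @ evecs[:, :d]
    return beta, evals
```

For example, with `y = X[:, 0] + noise`, the single returned column should point (up to sign and scale) close to the first coordinate axis, and the first eigenvalue in `evals` should dominate the rest.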
In the implementation of SIR in practice, the construction of
To overcome SIR's sensitivity to the number of slices, Cook and Zhang (2014) propose a fused approach that combines the
where
Since
The assumption of
Therefore,
The sample version
The inference of through
where
It should be noted that any choice of
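A fused kernel can be sketched in the same way. Following the description in Section 4 that FSIR fuses SIR kernel matrices from various slicing schemes, this sketch assumes the fused kernel is the plain sum of SIR kernel matrices computed with several numbers of equal-count slices (2 through 6 by default). The exact form and any weighting used by Cook and Zhang (2014) are not recoverable from this text, and all function names here are hypothetical.

```python
import numpy as np

def standardize(X):
    """Return Z = Sigma_hat^{-1/2} (X - X_bar) and Sigma_hat^{-1/2}."""
    Xc = X - X.mean(axis=0)
    sigma = Xc.T @ Xc / X.shape[0]
    vals, vecs = np.linalg.eigh(sigma)
    root_inv = vecs @ np.diag(vals ** -0.5) @ vecs.T
    return Xc @ root_inv, root_inv

def sir_kernel(Z, labels):
    """SIR kernel sum_s p_s zbar_s zbar_s^T for arbitrary slice labels."""
    p = Z.shape[1]
    M = np.zeros((p, p))
    for lab in np.unique(labels):
        mask = labels == lab
        zbar = Z[mask].mean(axis=0)
        M += mask.mean() * np.outer(zbar, zbar)
    return M

def equal_count_slices(y, h):
    """Slice labels 0..h-1 with (nearly) equal counts, assigned by rank of y."""
    ranks = np.argsort(np.argsort(y, kind="stable"), kind="stable")
    return (ranks * h) // len(y)

def fsir_directions(X, y, slice_numbers=(2, 3, 4, 5, 6), d=1):
    """Assumed fused kernel: sum of SIR kernels over several slice numbers."""
    Z, root_inv = standardize(np.asarray(X, dtype=float))
    M_fused = sum(sir_kernel(Z, equal_count_slices(y, h)) for h in slice_numbers)
    evals, evecs = np.linalg.eigh(M_fused)
    # Leading d eigenvectors (eigh is ascending), back-transformed to the X scale.
    return root_inv @ evecs[:, ::-1][:, :d], M_fused
```

Because each summand is positive semidefinite, a direction picked up under any one slicing scheme contributes to the fused kernel, which is the intuition behind the reduced sensitivity to the number of slices.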
Survival regression is a study of the conditional distribution of survival time
Since
The bivariate slicing for
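One common way to slice a bivariate response (observed time, censoring indicator) is to split on the indicator first and then slice the observed time within each group. The sketch below assumes that scheme purely for illustration; `bivariate_slices`, `h_event`, and `h_censored` are hypothetical names, and the specific slicing schemes used in the paper are not described in this text.

```python
import numpy as np

def bivariate_slices(time, delta, h_event, h_censored):
    """Hypothetical double slicing: split on delta, then slice time within each group."""
    time = np.asarray(time, dtype=float)
    delta = np.asarray(delta).astype(bool)
    labels = np.empty(time.shape[0], dtype=int)
    offset = 0
    for group, h in ((delta, h_event), (~delta, h_censored)):
        idx = np.flatnonzero(group)
        if idx.size == 0:
            continue
        h = min(h, idx.size)  # avoid empty slices in small groups
        order = idx[np.argsort(time[idx], kind="stable")]
        # Equal-count slices of observed time; labels continue after the previous group.
        for k, chunk in enumerate(np.array_split(order, h)):
            labels[chunk] = offset + k
        offset += h
    return labels
```

The resulting labels can be passed to any routine that builds a SIR kernel from slice labels, so the total number of slices is `h_event + h_censored` when both groups are large enough.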
We consider two types of survival regression models: accelerated failure time (AFT) and Cox proportional hazards (CPH) models. For both models, the predictors of
With this predictor setting, the following AFT model, shown in Section 5 of Datta
where a random error
For the CPH models, we first followed an example in Section 4.2 of Yoo and Lee (2011). Its hazard and baseline hazard rates are
In both the AFT and CPH models, right censoring was considered, so the observed survival time
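Right censoring can be simulated by drawing a survival time T and an independent censoring time C, then recording Y = min(T, C) and δ = I(T ≤ C). The sketch assumes a log-linear AFT model with standard normal errors and exponential censoring purely for illustration; the actual models and censoring distributions of the numerical study are truncated here, and `simulate_right_censored_aft` is a hypothetical name.

```python
import numpy as np

def simulate_right_censored_aft(n, beta, censor_scale, seed=0):
    """Illustrative AFT data with independent right censoring."""
    rng = np.random.default_rng(seed)
    beta = np.asarray(beta, dtype=float)
    X = rng.standard_normal((n, beta.size))
    # Assumed AFT form: log T = X beta + epsilon, with epsilon ~ N(0, 1).
    T = np.exp(X @ beta + rng.standard_normal(n))
    # Assumed censoring: independent exponential times with mean censor_scale.
    C = rng.exponential(censor_scale, size=n)
    Y = np.minimum(T, C)            # observed survival time
    delta = (T <= C).astype(int)    # 1 = event observed, 0 = censored
    return X, Y, delta, T, C
```

Decreasing `censor_scale` raises the censoring percentage, which is one way to study how slicing on (Y, δ) behaves under heavier censoring.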
To measure how well
To convert the correlation (larger is better) into a distance (smaller is better), we define the trace distance of
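Assuming the usual trace correlation r = tr(P_A P_B)/d between two p × d basis matrices A and B, where P_A is the orthogonal projection onto the column space of A, a distance can be obtained as 1 − r. The paper's exact definition is truncated in this text, so this sketch is an assumption rather than the authors' formula; the function names are hypothetical.

```python
import numpy as np

def projection(B):
    """Orthogonal projection onto the column space of a p x d matrix B."""
    B = np.asarray(B, dtype=float)
    return B @ np.linalg.pinv(B)

def trace_correlation(A, B):
    """Assumed trace correlation tr(P_A P_B) / d; A and B share the same d."""
    d = np.asarray(A).shape[1]
    return np.trace(projection(A) @ projection(B)) / d

def trace_distance(A, B):
    """Assumed distance 1 - r: 0 for identical subspaces, 1 for orthogonal ones."""
    return 1.0 - trace_correlation(A, B)
```

Because it depends only on projections, the distance ignores the sign and scale of the basis vectors, which is appropriate when comparing subspace estimates.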
According to Figures 1
For illustration, we use the primary biliary cirrhosis (PBC) data analyzed in Tibshirani (1997) and Yoo and Lee (2011). The data were collected at the Mayo Clinic between 1974 and 1986 and consist of the following 19 variables on 276 observations:
Tibshirani (1997) fitted a CPH model with the 17 predictors to the data, and Yoo and Lee (2011) applied SIR to the PBC data for model-free variable selection.
In the data, the censoring percentage is about 60%, so it is expected that SIR applications with 4, 6, 8, and 10 slices yield similar basis estimates. The same slicing schemes as in the numerical studies are used for FSIR and SIR. After applying FSIR and SIR under these slicing schemes, the trace distances between
Table 1 also indicates that FSIR is clearly more robust to slicing schemes than SIR for any choice of
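Table 1 appears to summarize, for each method, the average and the maximum of pairwise trace distances between basis estimates from different slicing schemes, with the maximum reported alongside the pair of schemes that attains it. A sketch of that summary, assuming the 1 − trace-correlation distance and a hypothetical dictionary of basis estimates keyed by scheme label:

```python
import numpy as np
from itertools import combinations

def _proj(B):
    """Orthogonal projection onto the column space of B."""
    B = np.asarray(B, dtype=float)
    return B @ np.linalg.pinv(B)

def summarize_pairwise_distances(estimates):
    """estimates: dict mapping a scheme label to a p x d basis estimate (same d)."""
    records = []
    for (a, Ba), (b, Bb) in combinations(estimates.items(), 2):
        d = np.asarray(Ba).shape[1]
        # Assumed trace distance 1 - tr(P_A P_B) / d between the two estimates.
        dist = 1.0 - np.trace(_proj(Ba) @ _proj(Bb)) / d
        records.append((dist, a, b))
    average = sum(r[0] for r in records) / len(records)
    worst = max(records)  # (largest distance, label 1, label 2)
    return average, worst
```

Running this once on the FSIR estimates and once on the SIR estimates gives one "Average" and one "Maximum" cell per method for a fixed structural dimension.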
The goal of this paper is to study FSIR (Cook and Zhang, 2014) in survival regression. Because survival regression treats the observed survival time and the censoring status as responses, the choice of bivariate slicing scheme can affect the estimation of the central subspace by SIR (Li, 1991). This issue can be alleviated by fusing SIR kernel matrices from various bivariate slicing schemes. To this end, FSIR was applied to survival regression under various slicing schemes and compared with SIR.
Numerical studies confirm that FSIR has potential advantages over SIR in basis estimation in survival regression. The real data application also shows that FSIR yields estimates that are more robust to slicing schemes than those of SIR.
FSIR cannot be applied directly to correlated or clustered survival data, since SIR assumes iid observations. A study in this direction is in progress.
The authors are grateful to two reviewers and the Associate Editor for insightful comments that improved the paper. The work of Jae Keun Yoo was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Korean Ministry of Education (NRF-2014R1A2A1A11049389 and 2009-0093827).
The average and maximum trace distances among the FSIR and SIR applications to the PBC data in Section 3.2
| Summary | Method | | | |
|---|---|---|---|---|
| Average | FSIR | 0.003 | 0.013 | 0.025 |
| | SIR | 0.018 | 0.080 | 0.093 |
| Maximum | FSIR | 0.029 (F1.10 & F2.6) | 0.079 (F1.6 & F1.10) | 0.101 (F1.10 & F2.6) |
| | SIR | 0.086 (S1.10 & S2.4) | 0.263 (S1.4 & S1.10) | 0.176 (S1.8 & S2.4) |
FSIR = fused sliced inverse regression; SIR = sliced inverse regression.