TEXT SIZE

CrossRef (0)
On robustness in dimension determination in fused sliced inverse regression

Jae Keun Yoo1,a, and Yoo Na Choa

aDepartment of Statistics, Ewha Womans University, Korea
Correspondence to: 1Department of Statistics, Ewha Womans University, 52, Ewhayeodae-gil, Seodaemun-gu, Seoul 03760, Korea. E-mail: peter.yoo@ewha.ac.kr
Received March 9, 2018; Revised May 11, 2018; Accepted May 13, 2018.
Abstract

The goal of sufficient dimension reduction (SDR) is to replace original p-dimensional predictors with a lower-dimensional linearly transformed predictor. The sliced inverse regression (SIR) (Li, Journal of the American Statistical Association, 86, 316–342, 1991) is one of the most popular SDR methods because of its applicability and simple implementation in practice. However, SIR may yield different dimension reduction results for different numbers of slices and despite its popularity, is a clear deficit for SIR. To overcome this, a fused sliced inverse regression was recently proposed. The study shows that the dimension-reduced predictors is robust to the numbers of the slices, but it does not investigate how robust its dimension determination is. This paper suggests a permutation dimension determination for the fused sliced inverse regression that is compared with SIR to investigate the robustness to the numbers of slices in the dimension determination. Numerical studies confirm this and a real data example is presented.

Keywords : central subspace, fused sliced inverse regression, permutation test, sufficient dimension reduction
1. Introduction

Consider a regression of Y ∈ ℝ1|X ∈ ℝp = (X1, …, Xp)T. Then, sufficient dimension reduction (SDR) in regression replaces the original p-dimensional predictors X with a lower-dimensional linearly transformed predictor MTX without loss of information about Y|X such that

$Y⫫X∣MTX,$

where M ∈ ℝp×q, ⫫ stands for independence, and qp.

Statement (1.1) implies that the two regressions of Y|X and Y|MTX are equivalent; therefore, X can be replaced by MTX without loss of information about Y|X. A column subspace of M satisfying (1.1) is defined as a dimension reduction subspace. If the intersection of all possible dimension reduction subspaces is a dimension reduction subspace, it is minimal and unique among all possible dimension reduction subspaces. The intersection is defined as the central subspace . Therefore, the estimation of is the main goal of SDR. The rest of the paper denotes the true dimension and orthonormal basis matrix of as d and η ∈ ℝp×d, respectively. The lower-dimensional linearly transformed predictor ηTX is called sufficient predictors.

The sliced inverse regression (SIR) (Li, 1991) is one of the most popular SDR methods due to large applicability and simple implementation in practice. The key step of the SIR application is a categorization of a response variable Y. This categorization is called slicing. The performance of SIR does not theoretically depend on the numbers of slices; however, its sample behaviors to estimate are sensitive to the numbers of slices. Even critically, there is no optimal or recommended one, which can be adopted as a golden standard. To relieve this issue, Cook and Zhang (2014) recently suggest an approach to fuse sample kernel matrices of SIR constructed from various choices of slices. This fusing approach is known as fused sliced inverse regression (FSIR). According to Cook and Zhang (2014), the FSIR results in robust basis estimation of to the numbers of slices.

The inference of requires two steps. First, the structural dimension d of should be determined, and the basis η is estimated given that d = . However, Cook and Zhang (2014) do not provide the clear dimension determination and its asymptotic behaviors.

This paper investigates robustness in the dimension estimation of in FSIR by employing a permutation approach. If FSIR is sensitive to determine the structural dimension, then this accordingly affects the robustness in the basis estimation. This study can completely prove the potential advantages of FSIR over SIR in robust and possibly better estimation of .

The organization of the paper is as follows. In Section 2, we briefly discuss a FSIR and suggest a permutation dimension determination. Section 3 is devoted to numerical studies and the presentation of a real data application. Our work is summarized in Section 4. We will define the following notations, which will be used frequently throughout the rest of the paper. A subspace stands for a subspace spanned by the columns of B ∈ ℝp×q; in addition, we define that = cov(X).

2. Permutation dimension determination in fused sliced inverse regression

### 2.1. Sliced inverse regression

The method of SIR (Li, 1991) will be explained through the normalized predictor Z = −1/2(XE(X)) of X. For notational convenience, define and ηz as the central subspace for a regression of Y|Z and its p × d orthonormal basis matrix, respectively. Proposition 6.3 in Cook (1998) guarantees that the structural dimensions of and are equal and that . Assume that the following linearity condition holds: (C1) $E(Z∣ηzTZ)$ is linear in $ηzTZ$. Li (1991) shows the following relation under the linearity condition:

$S(E(Z∣Y))⊆Sy∣z⇔S(Σ-1E(X∣Y))⊆SY∣X.$

To guarantee the exhaustive estimation of , it is usually assumed that . An estimation of through E(X|Y) is called SIR.

Since any specific models of Y|Z are not assumed in SIR, E(Z|Y) should be restored nonparametically. If Y is categorical with h levels, the construction of E(Z|Y) is straightforward, which is a set of the means of Z within each level of Y. If Y is continuous or many-valued, the computation of E(Z|Y) is unclear. For its simple but effective estimation in the case, Y is categorized with h levels. The categorized response will be denoted as , and the computation of E(Z| = s) is straightforward. Assuming that , E(Z|Y) is easily and nonparametrically recovered, assuming that . This categorization of Y is often called slicing, which is the essential part in the SIR application in practice. Next, SIR constructs the kernel matrix of either MSIR = cov(E(Z|Y)) or MSIR = cov(E(Z|)), depending on the type of Y. Then the columns of MSIR spans .

The sample SIR algorithm is as:

• Construct by partitioning the range of Y into h non-overlapping intervals to have equal numbers of the observations as much as possible, if Y is continuous or many-valued. If Y is categorical, the original Y becomes . Denote ns as the number of observations for = s for s = 1, …, h.

• Construct i = ∑̂−1(Xi), and then compute Ê(Z| = s) = (1/ns)=si for s = 1, …, h.

• Compute the sample version SIR of MSIR as:

$M^SIR=cov^(E(Z∣Y˜))=∑s=1hnsnE^(Z∣Y˜=s)E^(Z∣Y˜=s)T.$

• Spectral-decompose SIR such that $M^SIR=∑i=1pα^jφ^jφ^jT$, where α̂1 ≥ ··· ≥ α̂p ≥ 0 are the ordered-eigenvalues of SIR and φ̂i, i = 1, …, p, is the corresponding eigenvector to α̂i.

• The structural dimension d is determined through a sequential test, and set d = .

• The eigenvectors φ̂i corresponding to the first largest α̂is become the estimate of ηz:

$η^z=(φ^1,…,φ^d^)$

to have the sample basis estimate for η so that η̂ = ∑̂−1/2η̂z.

In the sample implementation of SIR, the construction of SIR clearly depends on the numbers of h. In practice, different numbers of h yield different asymptotic results in basis and dimension estimation of , and hence this should be worrisome in the data application. Moreover, there is no golden standard on how many slices should be used.

### 2.2. Fused sliced inverse regression

To overcome the deficit of the sensitiveness of SIR to the numbers of slices, a fused approach is developed by Cook and Zhang (2014) to combine the MSIR constructed from various numbers of slices. Now define

$EF(h)=(E(Z∣Y˜(h1)),E(Z∣Y˜(h2)),…,E(Z∣Y˜(hK))) and MFSIR(h)=EF(h)EF(h)T,$

where Y(hK) indicates the categorized response into hK slices.

Since $MFSIR(h)$ is a non-decreasing sequence over h, the following relation holds that

$S(MSIR(k))⊆S(MFSIR(h)), k=2,3,…,h.$

Furthermore, the assumed equivalence of $S(MSIR(k))=SY∣Z$ guarantees that

$S(MFSIR(h))=SY∣Z and S(Σ-12MFSIR(h))=SY∣X.$

This directly implies that the columns of $MFSIR(h)$ spans . The inference on through $MFSIR(h)$ is called FSIR.

The sample version $M^FSIR(h)$ is computed by constructing $M^SIR(k)$ via usual SIR application:

$M^FSIR(h)=(M^SIR(2),M^SIR(3),…,M^SIR(h)).$

The inference procedure of via $M^FSIR(h)$ is the same as that through SIR. Let $M^FSIR(h)=∑i=1pλ^iγ^iγ^iT$, be the spectral decomposition of $M^FSIR(h)$, where λ̂1 ≥ ··· ≥ λ̂p ≥ 0 are the ordered-eigenvalues of $M^FSIR(h)$ and γ̂i is its corresponding eigenvector for i = 1, …, p.

Cook and Zhang (2014) provide that FSIR is robust in the basis estimation of to the number of slices compared with SIR, and they estimate ηz with the true structural dimension given. However, its result clearly affects the basis estimation since the dimension should be determined before the basis estimation. That is, if the dimension determination is not robust to the numbers of slices, then it affects the selection of the eigenvectors λ̂i. Accordingly, this falsely impacts the robustness in the estimation of .

For this, in the next section, a permutation approach is suggested to estimate the structural dimension d of through FSIR.

### 2.3. Permutation dimension determination

It is necessary to first define related hypothesis before estimating the true structural dimension. For this, the following sequence of hypothesis (Rao, 1965) is tested. Starting with m = 0, test H0: d = m versus H1: d = m+1. If H0: d = m is rejected, increment m by 1 and repeat the test, stopping the first time H0 is not rejected and setting = m. To conduct this sequential test, a test statistic is necessary. To construct the test statistics, it should be observed that the dimension of is equal to the true rank of $MFSIR(h)$, because $S(MFSIR(h))=SY∣Z$. Then the inference on d is equal to the rank estimation of $MFSIR(h)$. This is equivalent to test how many non-zero eigenvalues exist for its spectral decomposition. Based on this discussion, the related test statistics for the dimension determination is suggested as

$Λ^m=n∑i=m+1pλ^i,$

where n is a sample size.

To avoid the difficulty in deriving the asymptotics of Λ̂m under H0, a permutation dimension determination approach is adopted here. The permutation approach has had a long history in SDR literature, but it is still popularly used (Cook and Weisberg, 1991; Cook and Yin, 2002; Yin and Cook, 2002; Lee et al., 2013; Dong et al., 2016). One possible drawback is that it needs an additional condition of

$(Y,Γ1TZ)⫫Γ2TZ,$

where Γ1 = (γ1, …, γm) is a set of eigenvectors corresponding to the m largest eigenvalues of $MFSIR(h)$ under H0: d = m, and Γ2 ∈ ℝp×(pm) is the orthogonal complement of Γ1. If Z is normally distributed, the condition is satisfied under H0. Since the predictors are often transformed to normality to satisfy linearity, the additional condition would not be strong in practice.

The permutation dimension determination algorithm is:

• Construct $M^FSIR(h)$. Under H0: d = m, obtain Λ̂m and divide eigenvector matrices into the following two groups:

$Γ^1=(γ^1,…,γ^m) and Γ^2=(γ^m+1,…,γ^p).$

• Compute two sets predictors of $V^i∈ℝm×1=Γ^1TZ^i$ and $U^i∈ℝ(p-m)×1=Γ^2TZ^i$, for i = 1, …, n.

• Randomly permute the index i of the Ûi alone and construct the permuted set $U^i*$.

• Compute the test statistic $Λ^m*$ via the FSIR application on a regression of $Yi∣(V^i,U^i*)$.

• Repeat steps (3)–(4) N times, where N is the total number of permutations. The p-value of the hypothesis testing is the fraction of $Λ^m*$ that exceed Λ̂m.

In this permutation determination procedure, it had better be noted that the p-values may not have an increasing tendency as m increases. The number of permutation N = 1000 will be used in numerical studies and real data analysis.

3. Numerical studies and real data application

### 3.1. Numerical studies

We considered the following two regression models:

• Model 1: Y|X = (X1, …, X10)T = X1 + ɛ.

• Model 2: Y|X = (X1, …, X10)T = X1 + exp(X2)ɛ.

For both models, the predictors of X = (X1, …, X10)T and the random error ɛ were independently sampled either from N(0, 1) or from t10.

For Model 1, the structural dimension of is equal to one, and is spanned by η = (1, 0, …, 0)T. The structure of the regression is quite simple. All information on the regression for predictors is completely given in the first conditional moment only through X1. However, in Model 2, the two columns of η = ((1, 0, …, 0), (0, 1, 0, …, 0))T spans , and its true dimension is equal to two. The regression depends on its first two conditional moments. One sufficient predictor X1 explains the conditional mean, while the other sufficient predictor X2 contributes to the conditional variance. Therefore, Model 2 has a more complex regression than Model 1.

Tho sample sizes of n = 100 and 200 were considered, and each model was iterated 500 times with h = 4, 5, 7, 9. The 5% nominal level was used for all simulation. For SIR, if predictors are normally-distributed, the dimension was determined through a χ2 test; if not, a weighted sum of independent χ2s was used.

As a summary of the numerical studies, the percentages of the correct dimension determination for each model were initially computed. Furthermore, to investigate the robustness in the dimension estimation, for each iteration of the model, the concordant decisions with its true dimension were investigated with two or more slice choices. Let h be the estimate of the dimension by SIR or FSIR with h slices. Then, for each model, the following percentages were additionally computed.

 2 pairs d̂4 = d̂5 = d d̂4 = d̂7 = d d̂4 = d̂9 = d d̂5 = d̂7 = d d̂5 = d̂9 = d d̂7 = d̂9 = d 3 pairs d̂4 = d̂5 = d̂7 = d d̂4 = d̂5 = d̂9 = d d̂4 = d̂7 = d̂9 = d d̂5 = d̂7 = d̂9 = d All d̂4 = d̂5 = d̂7 = d̂9 = d

For Model 1 and 2, the dimension d is one and two, respectively. If SIR and FSIR are robust to the numbers of slices in the dimension estimation, the correct decision percentages for each number of slices and the 2–4 pairs of the equal decision percentages must be close to 95%. These are reported in Figure 1. For each figure, the horizontal axis represents various choices of slices. For example, “479” stands for the percentages of 4 = 7 = 9 = d and “all” does for those of 4 = 5 = 7 = 9 = d.

Figure 1 shows characteristic behaviors in the dimension estimation. In most numerical studies, FSIR shows better dimension determination results than SIR. With smaller sample size, FSIR is more robust to the numbers of slices than SIR. With complex regression models, FSIR is impacted by the numbers of slices, which is less affected than SIR. FSIR shows more consistent behaviors in the dimension estimation to the distribution of the variables than SIR. As discussed in Cook and Zhang (2014), FSIR yields more robust and better estimation result to the number of slices than SIR. This directly impacts the dimension estimation, so FSIR yields more robust results than SIR. This numerical studies confirm that FSIR should have a potential advantage over SIR in both dimension and basis estimation of .

### 3.2. Real data example: primary biliary cirrhosis data

For illustration purposes, primary biliary cirrhosis (PBC) data in Tibshirani (1997) and Yoo (2017) is analyzed. This data was collected at the Mayo Clinic between 1974 and 1986 and contains the following 19 variables with 276 observations:

• Y = the number of days between registration and the earlier of death or censoring

• δ = 1, if Y is time to death; 0 otherwise

• X1 Treatment code: 1 = D-penicillamine, 2 = placebo

• X2 Age in years

• X3 Gender: 0 = male, 1 = female

• X4 Presence of ascites: absent = 0 or present = 1

• X5 Presence of hepatomegaly: absent = 0 or present = 1

• X6 Presence of spiders: 0 = no or 1 = yes

• X7 Presence of edema: absent and no diuretic therapy = 0, present but no diuretic therapy or edema resolved by diuretics = 0.5, or present despite diuretic therapy = 1

• X8 Serum bilirubin, in mg/dl

• X9 Serum cholesterol, in mg/dl

• X10 Albumin, in g/dl

• X11 Urine copper, in μg/day

• X12 Alkaline phosphatase, in U/liter

• X13 SGOT, in U/ml

• X14 Triglycerides, in mg/dl

• X15 Platelet count; coded value is number of platelets per cubic ml of blood divided by 1,000

• X16 Prothrombin time, in seconds

• X17 Histologic state of disease, graded 1, 2, 3, or 4

We also considered PBC data because it often used in survival regression. According to Cook (2003), the SIR application in survival regression requires bivariate slicing of the observed survival time and the censoring status. In this analysis, the observed survival time was sliced first into 2, 3, 4, and 5 categories, and then the observations with each category were secondly partitioned by two groups depending on the censoring status. Therefore, the numbers of slices under consideration were 4, 6, 8, and 10 for SIR. The same slicing scheme was applied for FSIR; in addition, we considered three fusing cases of (4, 6), (4, 6, 8), and (4, 6, 8, 10). Table 1 reports the p-values for H0: d = m for m = 0, 1, 2, 3.

According to Table 1, with level 5%, the SIR application to PBC data determines that = 2 with 4 and 10 slices and = 3 with 6 and 8 slices. Therefore, with SIR, the dimension determination would be confusing. However, FSIR determines that = 3 for all three cases and supports the different estimation results in SIR that originated from the different numbers of slices.

According to Yoo (2017), for the same data, FSIR shows more robust basis estimation than SIR for each case of d = 1, d = 2, and d = 3. Therefore, the robustness in the dimension estimation coincides with the basis estimation.

4. Conclusion

This paper investigates robustness in the dimension determination by FSIR (Cook and Zhang, 2014) over SIR (Li, 1991). A permutation approach is employed to avoid difficulty in deriving related test statistics.

Numerical studies show that FSIR has more robust dimension estimation to the numbers of slices than SIR, and confirms that FSIR has a potential advantage in an inference on the central subspace over SIR.

Usage of the asymptotic distribution of the test statistics in FSIR enables avoiding the additional requirement of the permutation test. It also reduces the computing time for the related p-values and leaves a room for improving robustness in the dimension determination.

Acknowledgments

The authors are grateful to two reviewers and the Associate Editor for insightful comments to improve the paper. For Jae Keun Yoo, this work was supported by Basic Science Research Program of the National Research Foundation of Korea (NRF) funded by the Korean Ministry of Education (NRF- 2017R1A2B1004909 and 2009-0093827). For YuNa Cho, this work was supported by the BK21 Plus Project of National Research Foundation of Korea (NRF) funded by the Korean Ministry of Education (22A20130011003).

Figures
Fig. 1. Summary for numerical studies; Solid line, FSIR; Dashed line, SIR. SIR = sliced inverse regression; FSIR = fused SIR.
TABLES
 2 pairs d̂4 = d̂5 = d d̂4 = d̂7 = d d̂4 = d̂9 = d d̂5 = d̂7 = d d̂5 = d̂9 = d d̂7 = d̂9 = d 3 pairs d̂4 = d̂5 = d̂7 = d d̂4 = d̂5 = d̂9 = d d̂4 = d̂7 = d̂9 = d d̂5 = d̂7 = d̂9 = d All d̂4 = d̂5 = d̂7 = d̂9 = d

### Table 1

P-values for H0: d = m by SIR and FSIR; SIR#, SIR with # slices; FSIR#, FSIR upto # slices

SIR4SIR6SIR8SIR10FSIR6FSIR8FSIR10
H0: d = 00.0000.0000.0000.0000.0000.0000.000
H0: d = 10.0000.0000.0000.0000.0000.0000.000
H0: d = 20.4540.0200.0090.0620.0320.0080.010
H0: d =3N/A0.7250.2860.9550.6100.3310.631

SIR = sliced inverse regression; FSIR = fused SIR.

References
1. Cook, RD (1998). Regression Graphics: Ideas for Studying Regressions through Graphics. New York: Wiley
2. Cook, RD (2003). Dimension reduction and graphical exploration in regression including survival analysis. Statistics in Medicine. 22, 1399-1413.
3. Cook, RD, and Weisberg, S (1991). Discussion of sliced inverse regression for dimension reduction. Journal of the American Statistical Association. 86, 328-332.
4. Cook, RD, and Yin, X (2001). Dimension reduction and visualization in discriminant analysis. Australian and New Zealand Journal of Statistics. 43, 147-199.
5. Cook, RD, and Zhang, X (2014). Fused estimators of the central subspace in sufficient dimension reduction. Journal of the American Statistical Association. 109, 815-827.
6. Dong, Y, Yang, C, and Yu, Z (2016). On permutation tests for predictor contribution in sufficient dimension reduction. Journal of Multivariate Analysis. 149, 81-91.
7. Lee, K, Oh, S, and Yoo, JK (2013). Method-free permutation predictor hypothesis tests in sufficient dimension reduction. Communications for Statistical Applications and Methods. 20, 291-300.
8. Li, KC (1991). Sliced inverse regression for dimension reduction. Journal of the American Statistical Association. 86, 316-342.
9. Rao, CR (1965). Linear Statistical Inference and Its Application. New York: Wiley
10. Tibshirani, R (1997). The lasso method for variable selection in the Cox model. Statistics in Medicine. 16, 385-395.
11. Yin, X, and Cook, RD (2002). Dimension reduction for the conditional kth moment in regression. Journal of the Royal Statistical Society, Series B. 64, 159-175.
12. Yoo, JK (2017). Fused sliced inverse regression in survival analysis. Communications for Statistical Applications and Methods. 24, 533-541.