The least absolute shrinkage and selection operator (LASSO) has been a popular regression estimator with simultaneous variable selection. However, LASSO does not have the oracle property, and a robust version is needed in the case of heavy-tailed errors or serious outliers. We propose a robust penalized regression estimator that provides simultaneous variable selection and estimation. It is based on rank regression and a non-convex penalty function, the smoothly clipped absolute deviation (SCAD) penalty, which has the oracle property. The proposed method combines the robustness of rank regression with the oracle property of the SCAD penalty. We develop an efficient algorithm to compute the proposed estimator; it builds the SCAD estimate through the local linear approximation and selects the tuning parameter of the penalty function. Our estimate can be obtained by the least absolute deviation method. We select the optimal tuning parameter using the Bayesian information criterion and the cross-validation method. Numerical simulations show that the proposed estimator is robust and effective for analyzing contaminated data.
Classical variable selection methods such as stepwise selection require two sequential steps, estimation and variable selection. Tibshirani (1996) proposed the least absolute shrinkage and selection operator (LASSO), which estimates regression parameters with simultaneous variable selection. However, LASSO can be biased for coefficients with large absolute values. This problem was addressed by Fan and Li (2001), who proposed a non-convex penalty function, the smoothly clipped absolute deviation (SCAD) penalty, to make up for the deficiencies of LASSO. The SCAD penalty has an oracle property: the asymptotic bias, variance, and distribution of the resulting estimate are the same as if the correct subset of variables were known in advance. Variable selection methods for regression models are covered by Jung and Park (2015) and Lee (2015).
The least squares estimate (LSE) is sensitive to even a single outlier. One alternative is the least absolute deviation (LAD) estimate. Jung (2011, 2013) proposed robust estimators and outlier detection methods for regression models and support vector machines. There are several robust versions of LASSO. For example, Wang
In this paper we develop a robust rank regression method combining the pairwise differences of residuals with the SCAD penalty function (RANK-SCAD) to improve the variable selection performance of RANK-LASSO. The proposed method combines the robustness of rank regression with the oracle property of the SCAD penalty. Like RANK-LASSO, the proposed RANK-SCAD estimator has an outlier-resistant loss function, and we conjecture that RANK-SCAD enjoys the oracle property in the same manner as SCAD, making it a robust model selector with the oracle property.
This paper is organized as follows. In Section 2 we propose RANK-SCAD and describe its statistical properties; the sum of pairwise differences of residuals and the SCAD penalty function are briefly reviewed. Section 3 gives an efficient algorithm to implement RANK-SCAD. We use the local linear approximation (LLA) algorithm, which reduces the non-differentiable and non-convex objective function to a sequence of convex problems; Zou and Li (2008) proposed the LLA for nonconcave penalized likelihood models. Section 4 provides simulation results showing that RANK-SCAD is superior to LSE, LASSO, LAD-LASSO, and RANK-LASSO in terms of selecting the correct model and prediction error, and that RANK-SCAD is more robust than LASSO and RANK-LASSO. Section 5 provides some discussion and concluding remarks.
We consider the classic linear regression model
$$y_i = \mathbf{x}_i^{\top}\boldsymbol{\beta} + \epsilon_i, \qquad i = 1, \ldots, n,$$
where $y_i$ is the response, $\mathbf{x}_i$ is the $p$-dimensional covariate vector, $\boldsymbol{\beta}$ is the vector of regression coefficients, and $\epsilon_i$ is a random error.
The underdetermined problem can be treated by penalized regression models. Hoerl and Kennard (1970) proposed ridge regression, which minimizes the sum of squared errors plus an $L_2$ penalty,
$$\sum_{i=1}^{n}\left(y_i - \mathbf{x}_i^{\top}\boldsymbol{\beta}\right)^2 + \lambda \sum_{j=1}^{p} \beta_j^2,$$
where $\lambda > 0$ is a tuning parameter controlling the amount of shrinkage.
Fan and Li (2001) described the conditions of a good penalty function: “(1) Unbiasedness: The resulting estimator is nearly unbiased when the true unknown parameter is large. (2) Sparsity: The resulting estimator is a thresholding rule, which automatically sets small estimated coefficients to zero to reduce model complexity. (3) Continuity: The resulting estimator is continuous in the data to avoid instability in model prediction”. The SCAD penalty is defined, for $\theta \ge 0$, as
$$p_{\lambda}(\theta) = \begin{cases} \lambda\theta, & \theta \le \lambda,\\[4pt] \dfrac{2a\lambda\theta - \theta^2 - \lambda^2}{2(a-1)}, & \lambda < \theta \le a\lambda,\\[4pt] \dfrac{(a+1)\lambda^2}{2}, & \theta > a\lambda, \end{cases}$$
and so its derivative becomes
$$p'_{\lambda}(\theta) = \lambda\left\{ I(\theta \le \lambda) + \frac{(a\lambda - \theta)_{+}}{(a-1)\lambda}\, I(\theta > \lambda) \right\},$$
where $a > 2$ is a shape constant (Fan and Li (2001) suggested $a = 3.7$) and $\lambda > 0$ is the tuning parameter.
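As a quick sketch, the SCAD penalty and its derivative can be written as follows; the implementation is illustrative rather than the paper's code, with the constant $a = 3.7$ following Fan and Li's suggestion:

```python
import numpy as np

def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty p_lambda evaluated elementwise at |theta|."""
    t = np.abs(np.asarray(theta, dtype=float))
    return np.where(
        t <= lam,
        lam * t,                                                 # L1-like near zero
        np.where(
            t <= a * lam,
            (2 * a * lam * t - t ** 2 - lam ** 2) / (2 * (a - 1)),  # quadratic blend
            (a + 1) * lam ** 2 / 2,                              # constant beyond a*lambda
        ),
    )

def scad_deriv(theta, lam, a=3.7):
    """Derivative p'_lambda; it vanishes beyond a*lambda, so large coefficients are not shrunk."""
    t = np.abs(np.asarray(theta, dtype=float))
    return lam * ((t <= lam) + np.maximum(a * lam - t, 0.0) / ((a - 1) * lam) * (t > lam))
```

The flat tail of the penalty (zero derivative for $|\theta| > a\lambda$) is what gives the near-unbiasedness for large coefficients, while the $L_1$-like behavior near zero gives sparsity.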
Nonetheless, the least squares criterion used in (
Jung (2012) also considered the problem with the LAD loss function with the SCAD penalty function by replacing the
There are several robust loss functions, such as the trimmed squares criterion and Huber’s loss function. However, these stem from parametric rather than nonparametric methods. Nonparametric methods are distribution-free with respect to the errors and hence robust regardless of the error distribution. Jaeckel (1972) proposed a rank regression that minimizes the dispersion of the residuals, which can be written as minimizing the objective function
where
which is called the RANK-LASSO estimator.
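As a sanity check on the pairwise-difference representation, the standard identity $\sum_{i<j} |e_i - e_j| = 2\sum_i \left(R(e_i) - \frac{n+1}{2}\right) e_i$, which links it to Jaeckel's Wilcoxon-score dispersion (score normalization constants omitted here), can be verified numerically:

```python
import numpy as np
from itertools import combinations
from scipy.stats import rankdata

rng = np.random.default_rng(42)
e = rng.normal(size=10)          # a vector of residuals
n = len(e)

# pairwise-difference form of the dispersion
pairwise = sum(abs(e[i] - e[j]) for i, j in combinations(range(n), 2))

# Jaeckel's rank-score form with (unnormalized) Wilcoxon scores
ranks = rankdata(e)
rank_form = 2.0 * np.sum((ranks - (n + 1) / 2) * e)

assert abs(pairwise - rank_form) < 1e-9
```

The pairwise form is convenient computationally because it turns the rank criterion into a sum of absolute values, which an LAD solver can handle directly.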
For the purpose of the robustness of a shrinkage estimator, in the objective function (
The pairwise differences of the residuals attain approximately 95% asymptotic efficiency relative to the sum of squared residuals under the normal distribution. So does the objective function (
The SCAD penalty in (
where
In this paper we use two methods for choosing the penalty parameter, the 5-fold cross-validation (CV) method and the Bayesian information criterion (BIC), as described in Section 4.
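A minimal sketch of the 5-fold CV selection follows; `fit` and `predict` are hypothetical placeholders for any penalized fitting routine (such as the RANK-SCAD solver), not the paper's implementation:

```python
import numpy as np

def five_fold_cv(X, y, lambdas, fit, predict, seed=0):
    """Pick the penalty parameter minimizing 5-fold CV mean absolute error.

    fit(X, y, lam) returns a fitted model; predict(model, X) returns predictions.
    """
    n = len(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), 5)
    scores = []
    for lam in lambdas:
        errs = []
        for k in range(5):
            test = folds[k]
            train = np.concatenate([folds[j] for j in range(5) if j != k])
            model = fit(X[train], y[train], lam)
            errs.append(np.mean(np.abs(y[test] - predict(model, X[test]))))
        scores.append(np.mean(errs))
    return lambdas[int(np.argmin(scores))]
```

The mean absolute error is used as the CV loss to stay consistent with the absolute-deviation flavor of the rank criterion.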
When the objective function is differentiable at every point, we can use a standard optimization package to find the minimizer of the objective function
Jung (2012) used the local quadratic approximation (LQA) of the SCAD function (Fan and Li, 2001), which has the drawback that a coefficient once estimated as zero remains zero, behaving like backward stepwise variable selection. Instead, we use the following local linear approximation of the SCAD function
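In symbols, with $\beta^{(0)}$ denoting the current iterate (notation assumed to follow Fan and Li, 2001), the local linear approximation majorizes the SCAD penalty by its tangent line in $|\beta_j|$:

```latex
p_{\lambda}\bigl(|\beta_j|\bigr) \approx p_{\lambda}\bigl(|\beta_j^{(0)}|\bigr)
  + p'_{\lambda}\bigl(|\beta_j^{(0)}|\bigr)\left(|\beta_j| - |\beta_j^{(0)}|\right),
  \qquad \beta_j \approx \beta_j^{(0)},
```

so for fixed $\beta^{(0)}$ the penalty contributes, up to an additive constant, the weighted $L_1$ term $\sum_j p'_{\lambda}(|\beta_j^{(0)}|)\,|\beta_j|$.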
This approximation of the SCAD function transforms the objective function into a linear programming problem, which allows a simple way to find the solution of the RANK-SCAD estimator. Putting the approximation (
Then, up to an additive constant, we obtain an objective function that is a sum of absolute value functions
The pairwise differences of the residuals (Kim
where
We then define the augmented matrices
where
where || · || is the
The proposed estimator (
Next, we find the estimator of the intercept
where
Finally, we focus on finding an optimal
The algorithm can be summarized as the following iterative steps:
Step 1. Obtain the unpenalized rank estimator satisfying (
Step 2. For given
Step 3. Compute the quantity (
Step 4. Find the optimal
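Steps 1 and 2 above can be sketched as follows, under some assumptions: the rank loss is taken as the sum of absolute pairwise differences, each LLA subproblem is solved as an LAD fit via linear programming (`scipy.optimize.linprog`), and the penalty enters as pseudo-observations with an illustrative scaling of $n$; tuning-parameter selection (Steps 3 and 4) is omitted:

```python
import numpy as np
from itertools import combinations
from scipy.optimize import linprog

def scad_deriv(theta, lam, a=3.7):
    # derivative of the SCAD penalty, evaluated at |theta|
    t = np.abs(theta)
    return lam * ((t <= lam) + np.maximum(a * lam - t, 0.0) / ((a - 1) * lam) * (t > lam))

def lad(X, y):
    # least absolute deviations as a linear program:
    # min 1'(u + v)  s.t.  X (b+ - b-) + u - v = y,  all variables >= 0
    m, p = X.shape
    c = np.concatenate([np.zeros(2 * p), np.ones(2 * m)])
    A_eq = np.hstack([X, -X, np.eye(m), -np.eye(m)])
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    return res.x[:p] - res.x[p:2 * p]

def rank_scad(X, y, lam, a=3.7, n_iter=3):
    n, p = X.shape
    pairs = np.array(list(combinations(range(n), 2)))
    Xd = X[pairs[:, 0]] - X[pairs[:, 1]]     # pairwise differences of covariates
    yd = y[pairs[:, 0]] - y[pairs[:, 1]]     # pairwise differences of responses
    beta = lad(Xd, yd)                       # Step 1: unpenalized rank estimate
    for _ in range(n_iter):                  # Step 2: LLA iterations at fixed lambda
        w = scad_deriv(beta, lam, a)
        Xa = np.vstack([Xd, n * np.diag(w)]) # penalty rows as pseudo-observations
        ya = np.concatenate([yd, np.zeros(p)])
        beta = lad(Xa, ya)
    return beta
```

Each LLA iteration is a single weighted LAD fit, so the whole loop is a short sequence of linear programs; the $n$ scaling of the penalty rows is an assumption made for illustration, not the paper's exact formulation.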
In this section we conduct simulations in various settings to show the robustness of the RANK-SCAD estimate to heavy-tailed errors and outliers. We compare the proposed RANK-SCAD estimate with LSE, LASSO from (
We simulated the data set consisting of observations from the linear regression model
where
We chose the sample sizes
The simulation data consist of training data and test data. After obtaining the regression coefficients from the training data, the performance of each estimate is evaluated on test data of sample size 1,000 generated from the normal distribution with the covariance matrix described above. We conducted 100 simulation iterations and evaluated the error of the estimates by the average of the mean absolute deviations (AMAD) on the test data. The sparseness of an estimate is measured by the average number of correctly estimated zero coefficients, shown in the column labeled “correct” in the tables. Analogously, the column labeled “incorrect” gives the average number of truly nonzero coefficients erroneously estimated as zero, a measure of how often important variables are discarded. The simulation results are summarized in Tables 1
Tables 1 and 2 show that the proposed RANK-SCAD method outperforms the other methods. Furthermore, its performance is quite comparable to that of the oracle, especially in terms of sparseness and model error. RANK-SCAD is better than the other estimates, such as LASSO and RANK-LASSO, in both variable selection and prediction, and its “incorrect” counts are negligible. Under all error distributions the model error of RANK-SCAD remains comparable to the oracle, reflecting the distribution-free property of nonparametric methods. All methods have smaller standard deviations as the sample size grows, which also suggests consistency.
Next, we conducted a simulation on data with outliers, constructed by adding 20% regression outliers to the data by changing
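A toy illustration of why pairwise-difference methods resist such contamination; the Theil-Sen median of pairwise slopes stands in here for the rank estimators (it uses the same pairwise idea but is not the proposed method):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
n = 50
x = rng.normal(size=n)
y = 2.0 * x + 0.1 * rng.normal(size=n)

# contaminate 20% of the responses: shift the points with the largest x upward
y_out = y.copy()
y_out[np.argsort(x)[-n // 5:]] += 20.0

# least squares slope (no intercept): dragged toward the outliers
ls_slope = (x @ y_out) / (x @ x)

# median of pairwise slopes: resistant to the contamination
slopes = [(y_out[i] - y_out[j]) / (x[i] - x[j])
          for i, j in combinations(range(n), 2) if x[i] != x[j]]
pw_slope = float(np.median(slopes))
```

With 20% gross outliers, the pairwise median stays near the true slope of 2 while the least squares slope is pulled far away, mirroring the behavior reported for RANK-SCAD versus LSE in the tables.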
In this section we apply the proposed method to the prostate cancer data, which has a larger number of explanatory variables and a larger sample size, and was analyzed in Tibshirani (1996). The data consist of 97 observations with the response variable
We used the RANK-SCAD method with tuning parameters chosen by CV (RS-CV) and by the BIC criterion (RS-BIC). We compared the RANK-SCAD estimator with LASSO, LAD-LASSO, RANK, and RANK-LASSO. We used 67 randomly chosen observations as training data and the remainder as test data, for 100 replications. The prediction accuracy of each method is measured by the mean absolute prediction error (MAPE) on the test data. The distribution of the 100 MAPE values is depicted graphically in Figure 1. Our proposed estimator with the CV method yields the smallest median MAPE, and the prediction accuracy of RANK-SCAD remains satisfactory. The standard errors of the MAPE estimates for LASSO, LAD-LASSO, RANK-LASSO, RANK-SCAD(CV), and RANK-SCAD(BIC) are 0.085, 0.071, 0.246, 0.073, and 0.113, respectively.
Similarly to Kim
We proposed a variable selection method that is robust regardless of the error distribution, using the rank regression estimator with the SCAD penalty function. The method preserves the advantages of both components: robustness from the rank regression and simultaneous variable selection from the SCAD penalty, which supports the conjecture that the proposed estimator has an oracle property. The proposed algorithm is fast and effective. Numerical simulations show that RANK-SCAD performs comparably to the oracle estimate for variable selection.
In future work we will study penalized regressions that reduce the influence of leverage points, based on rank regression, simply by adding weights to the loss function. We have studied only the linear regression model; however, the proposed method can be extended to other regression models such as generalized linear models.
Jung’s research was supported by Basic Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2015R1D1A1A01060528).
Simulation results for normal errors without outliers
|   | n  | Method | Correct | Incorrect | AMAD |
|---|----|--------|---------|-----------|------|
| 1 | 30 | LSE | 0.00 (0.000) | 0.00 (0.000) | 0.458 (0.136) |
|   |    | LASSO | 4.28 (0.188) | 0.00 (0.000) | 0.484 (0.188) |
|   |    | LAD-LASSO | 3.62 (1.117) | 0.00 (0.000) | 0.362 (0.168) |
|   |    | RANK-LASSO | 3.34 (1.125) | 0.00 (0.000) | 0.436 (0.164) |
|   |    | RANK-SCAD | 4.96 (0.197) | 0.02 (0.141) | 0.292 (0.173) |
|   |    | oracle | 5.00 (0.000) | 0.00 (0.000) | 0.254 (0.105) |
|   | 60 | LSE | 0.00 (0.000) | 0.00 (0.000) | 0.318 (0.081) |
|   |    | LASSO | 4.32 (0.863) | 0.00 (0.000) | 0.376 (0.118) |
|   |    | LAD-LASSO | 3.71 (1.038) | 0.00 (0.000) | 0.244 (0.103) |
|   |    | RANK-LASSO | 3.18 (1.395) | 0.00 (0.000) | 0.325 (0.118) |
|   |    | RANK-SCAD | 4.99 (0.100) | 0.00 (0.000) | 0.197 (0.086) |
|   |    | oracle | 5.00 (0.000) | 0.00 (0.000) | 0.185 (0.080) |
|   | 30 | LSE | 0.00 (0.000) | 0.00 (0.000) | 0.808 (0.235) |
|   |    | LASSO | 4.25 (0.947) | 0.01 (0.100) | 0.954 (0.338) |
|   |    | LAD-LASSO | 3.07 (1.139) | 0.03 (0.171) | 0.740 (0.301) |
|   |    | RANK-LASSO | 3.48 (1.259) | 0.01 (0.100) | 0.857 (0.333) |
|   |    | RANK-SCAD | 4.85 (0.458) | 0.29 (0.498) | 0.709 (0.393) |
|   |    | oracle | 5.00 (0.000) | 0.00 (0.000) | 0.448 (0.206) |
|   | 60 | LSE | 0.00 (0.000) | 0.00 (0.000) | 0.529 (0.139) |
|   |    | LASSO | 4.40 (0.725) | 0.00 (0.000) | 0.666 (0.192) |
|   |    | LAD-LASSO | 2.94 (1.238) | 0.00 (0.000) | 0.454 (0.217) |
|   |    | RANK-LASSO | 3.53 (1.159) | 0.00 (0.000) | 0.530 (0.189) |
|   |    | RANK-SCAD | 4.97 (0.223) | 0.06 (0.239) | 0.376 (0.230) |
|   |    | oracle | 5.00 (0.000) | 0.00 (0.000) | 0.312 (0.123) |
AMAD = average of the mean absolute deviations; LSE = least squares estimate; LASSO = least absolute shrinkage and selection operator; LAD = least absolute deviation.
Simulation results for
|   | n  | Method | Correct | Incorrect | AMAD |
|---|----|--------|---------|-----------|------|
| 1 | 30 | LSE | 0.00 (0.000) | 0.00 (0.000) | 0.818 (0.545) |
|   |    | LASSO | 4.18 (1.067) | 0.10 (0.362) | 0.935 (0.560) |
|   |    | LAD-LASSO | 3.38 (1.023) | 0.03 (0.171) | 0.485 (0.263) |
|   |    | RANK-LASSO | 3.23 (1.434) | 0.01 (0.100) | 0.626 (0.278) |
|   |    | RANK-SCAD | 4.93 (0.383) | 0.09 (0.321) | 0.429 (0.543) |
|   |    | oracle | 5.00 (0.000) | 0.00 (0.000) | 0.446 (0.543) |
|   | 60 | LSE | 0.00 (0.000) | 0.00 (0.000) | 0.472 (0.176) |
|   |    | LASSO | 4.60 (0.603) | 0.01 (0.100) | 0.777 (0.332) |
|   |    | LAD-LASSO | 3.77 (1.014) | 0.00 (0.000) | 0.283 (0.133) |
|   |    | RANK-LASSO | 3.62 (1.179) | 0.00 (0.000) | 0.404 (0.155) |
|   |    | RANK-SCAD | 4.98 (0.141) | 0.02 (0.141) | 0.234 (0.156) |
|   |    | oracle | 5.00 (0.000) | 0.00 (0.000) | 0.279 (0.153) |
|   | 30 | LSE | 0.00 (0.000) | 0.00 (0.000) | 1.330 (0.538) |
|   |    | LASSO | 4.19 (1.032) | 0.35 (0.642) | 1.536 (0.732) |
|   |    | LAD-LASSO | 2.68 (1.154) | 0.08 (0.273) | 0.947 (0.418) |
|   |    | RANK-LASSO | 3.25 (1.493) | 0.13 (0.418) | 1.103 (0.464) |
|   |    | RANK-SCAD | 4.83 (0.451) | 0.69 (0.677) | 1.173 (0.673) |
|   |    | oracle | 5.00 (0.000) | 0.00 (0.000) | 0.706 (0.312) |
|   | 60 | LSE | 0.00 (0.000) | 0.00 (0.000) | 0.905 (0.366) |
|   |    | LASSO | 4.46 (0.702) | 0.18 (0.626) | 1.367 (0.682) |
|   |    | LAD-LASSO | 2.88 (1.174) | 0.01 (0.100) | 0.525 (0.265) |
|   |    | RANK-LASSO | 3.66 (1.174) | 0.01 (0.100) | 0.705 (0.265) |
|   |    | RANK-SCAD | 4.97 (0.171) | 0.20 (0.402) | 0.498 (0.346) |
|   |    | oracle | 5.00 (0.000) | 0.00 (0.000) | 0.483 (0.317) |
AMAD = average of the mean absolute deviations; LSE = least squares estimate; LASSO = least absolute shrinkage and selection operator; LAD = least absolute deviation.
Simulation results for
|   | n  | Method | Correct | Incorrect | AMAD |
|---|----|--------|---------|-----------|------|
| 1 | 30 | LSE | 0.00 (0.000) | 0.00 (0.000) | 2.359 (0.705) |
|   |    | LASSO | 4.43 (0.902) | 1.07 (0.832) | 2.222 (0.700) |
|   |    | LAD-LASSO | 2.86 (1.271) | 0.13 (0.338) | 0.875 (0.567) |
|   |    | RANK-LASSO | 4.15 (0.999) | 0.47 (0.688) | 1.244 (0.698) |
|   |    | RANK-SCAD | 4.91 (0.288) | 1.24 (0.726) | 1.632 (0.796) |
|   |    | oracle | 5.00 (0.000) | 0.00 (0.000) | 1.294 (0.537) |
|   | 60 | LSE | 0.00 (0.000) | 0.00 (0.000) | 1.481 (0.377) |
|   |    | LASSO | 4.46 (0.797) | 0.49 (0.703) | 1.827 (0.629) |
|   |    | LAD-LASSO | 3.41 (1.065) | 0.00 (0.000) | 0.370 (0.172) |
|   |    | RANK-LASSO | 3.97 (0.979) | 0.03 (0.223) | 0.713 (0.313) |
|   |    | RANK-SCAD | 4.98 (0.141) | 0.41 (0.552) | 0.630 (0.492) |
|   |    | oracle | 5.00 (0.000) | 0.00 (0.000) | 0.758 (0.320) |
|   | 30 | LSE | 0.00 (0.000) | 0.00 (0.000) | 2.464 (0.745) |
|   |    | LASSO | 4.62 (0.663) | 1.34 (0.855) | 2.528 (0.758) |
|   |    | LAD-LASSO | 2.25 (1.266) | 0.26 (0.463) | 1.604 (0.791) |
|   |    | RANK-LASSO | 4.29 (1.066) | 1.03 (0.926) | 1.959 (0.833) |
|   |    | RANK-SCAD | 4.91 (0.321) | 1.71 (0.782) | 2.176 (0.861) |
|   |    | oracle | 5.00 (0.000) | 0.00 (0.000) | 1.351 (0.547) |
|   | 60 | LSE | 0.00 (0.000) | 0.00 (0.000) | 1.615 (0.444) |
|   |    | LASSO | 4.44 (0.935) | 0.77 (0.874) | 2.101 (0.748) |
|   |    | LAD-LASSO | 2.57 (1.208) | 0.03 (0.171) | 0.754 (0.341) |
|   |    | RANK-LASSO | 3.91 (1.045) | 0.17 (0.451) | 1.177 (0.486) |
|   |    | RANK-SCAD | 4.96 (0.197) | 1.01 (0.703) | 1.304 (0.598) |
|   |    | oracle | 5.00 (0.000) | 0.00 (0.000) | 0.900 (0.396) |
AMAD = average of the mean absolute deviations; LSE = least squares estimate; LASSO = least absolute shrinkage and selection operator; LAD = least absolute deviation.
Estimates for the prostate cancer data and the modified data
|   | LASSO | LAD-LASSO | RANK-LASSO | RS(CV) | RS(BIC) |
|---|-------|-----------|------------|--------|---------|
|   | 0.472 (0.000) | 0.524 (0.450) | 0.562 (0.244) | 0.552 (0.000) | 0.519 (0.419) |
|   | 0.187 (0.000) | 0.480 (0.516) | 0.350 (0.000) | 0.496 (0.000) | 0.483 (0.698) |
|   | 0.000 (0.000) | 0.000 (0.000) | −0.018 (0.000) | −0.018 (0.000) | −0.009 (−0.017) |
|   | 0.000 (0.000) | 0.000 (0.000) | 0.121 (0.000) | 0.102 (0.000) | 0.017 (0.000) |
|   | 0.368 (0.000) | 0.249 (0.319) | 0.529 (0.000) | 0.691 (0.000) | 0.567 (0.610) |
|   | 0.000 (0.000) | 0.000 (0.000) | −0.054 (0.000) | −0.074 (0.000) | 0.000 (0.000) |
|   | 0.000 (0.000) | 0.000 (0.000) | 0.000 (0.000) | 0.000 (0.000) | 0.000 (0.000) |
|   | 0.000 (0.000) | 0.000 (0.000) | 0.006 (0.010) | 0.005 (0.000) | 0.004 (0.004) |
The numbers in the parenthesis are the estimates on the modified data.
LASSO = least absolute shrinkage and selection operator; LAD = least absolute deviation; RS = rank smoothly clipped absolute deviation; CV = cross validation; BIC = Bayesian information criterion; lcavol = log cancer volume; lweight = log prostate weight; lbph = log benign prostatic hyperplasia amount; svi = seminal vesicle invasion; lcp = log capsular penetration; gleason = Gleason score; pgg45 = percentage Gleason scores 4 or 5.