Grouping structures in covariates are often ignored in regression models. Recent statistical developments that account for a grouping structure show clear advantages; however, reflecting the grouping structure in quantile regression models has been relatively rare in the literature. The grouping structure is usually handled by employing a group penalty. In this work, we apply group penalties to quantile regression models. The grouping structure is assumed to be known, which is commonly true in practice. For example, the group of dummy variables transformed from one categorical variable can be regarded as one group of covariates. We examine group quantile regression models via two real data analyses and simulation studies, which reveal the beneficial performance of group quantile regression models over their non-group counterparts when a grouping structure exists among the variables.
In some regression problems, we are interested in various quantiles of the distribution rather than the overall average of the data. For these cases, Koenker and Bassett (1978) proposed the quantile regression model, which estimates the conditional quantile function of the response variable and identifies the covariate effects at each quantile separately. With these advantages, the use of quantile regression is clearly increasing in various fields such as economics (Hendricks and Koenker, 1992; Koenker and Hallock, 2001), microarray studies (Wang and He, 2007), and demand analysis and empirical finance (Wei and He, 2006; Wei
We focus on the variable selection aspect of penalized quantile regression because variable selection is an important step in modeling, as many data sets include a large number of candidate predictors. Several methods have been introduced under the regularization framework. Tibshirani (1996) proposed the least absolute shrinkage and selection operator (LASSO) using the $\ell_1$ penalty on the coefficients.
For variables exhibiting a grouping structure, it seems more appropriate to reflect that structure for accurate estimation and prediction. The most common example is a categorical variable converted into several dummy variables: the dummy variables from one original categorical variable can be thought of as a single group. Another example of grouped variables is an additive model with polynomial terms: the terms forming a polynomial, such as $x$, $x^2$, and $x^3$, naturally form one group.
Group penalties can also be applied when the grouping structure is unknown and thus must be estimated. In biological studies, genetic data usually come with background scientific information; for example, genes in the same biological pathway are often located in a neighborhood, forming a group. When such information is unavailable, the grouping structure of the predictors can be estimated by clustering methods for variables, such as that of Chavent and Kuentz-Simonet (2012).
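As a crude illustration of estimating groups from data (this is NOT the ClustOfVar method of Chavent and Kuentz-Simonet (2012); the threshold rule and the helper `correlation_groups` are assumptions for the sketch), one can merge variables whose absolute pairwise correlation is large:

```python
import numpy as np

def correlation_groups(X, threshold=0.8):
    """Merge variables whose absolute pairwise correlation exceeds a threshold."""
    p = X.shape[1]
    corr = np.abs(np.corrcoef(X, rowvar=False))
    groups = list(range(p))             # start with every variable in its own group
    for i in range(p):
        for j in range(i + 1, p):
            if corr[i, j] > threshold:  # merge j's group into i's
                gi, gj = groups[i], groups[j]
                groups = [gi if g == gj else g for g in groups]
    return groups

x0 = np.arange(10.0)
X = np.column_stack([x0, 2.0 * x0, np.tile([1.0, -1.0], 5)])
print(correlation_groups(X))  # first two columns merged; third stays alone
```

A dedicated variable-clustering method would replace the threshold rule with a principled homogeneity criterion; this sketch only conveys the idea of turning correlations into group labels.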
Several penalties considering the grouping structure have been proposed, such as the group LASSO, which uses the $\ell_2$ norm of the coefficients within each group.
The remainder of this paper is organized as follows. Penalized quantile regression models with group penalties are introduced in Section 2. Real data sets are analyzed in Section 3. In Section 4, we compare the prediction performance of the introduced methods under various simulation settings. Conclusions and discussion are given in Section 5.
We consider the linear model with $p$ covariates,
$$ y_i = \mathbf{x}_i^{\top}\boldsymbol{\beta} + \epsilon_i, \qquad i = 1, \dots, n, $$
where $y_i$ is the response, $\mathbf{x}_i$ is the covariate vector, and $\boldsymbol{\beta}$ is the coefficient vector. The quantile regression estimator at a quantile level $\tau \in (0, 1)$ minimizes
$$ \sum_{i=1}^{n} \rho_\tau(y_i - \mathbf{x}_i^{\top}\boldsymbol{\beta}), $$
where $\rho_\tau(u) = u\{\tau - I(u < 0)\}$ is the check loss function. The penalized quantile regression estimator then minimizes
$$ \sum_{i=1}^{n} \rho_\tau(y_i - \mathbf{x}_i^{\top}\boldsymbol{\beta}) + \sum_{j=1}^{p} p_\lambda(|\beta_j|), $$
where $p_\lambda(\cdot)$ is a penalty function with a tuning parameter $\lambda \geq 0$.
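As a minimal sketch, the check loss underlying quantile regression can be computed as follows (the indicator is implemented via a boolean comparison):

```python
import numpy as np

def check_loss(u, tau):
    """Koenker-Bassett check loss: rho_tau(u) = u * (tau - I(u < 0))."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))

residuals = np.array([-2.0, -0.5, 0.5, 2.0])
# At tau = 0.5 the loss is symmetric (half the absolute loss);
# at tau = 0.9 positive residuals are penalized 9 times more than negative ones.
print(check_loss(residuals, 0.5))
print(check_loss(residuals, 0.9))
```

Minimizing the sum of these losses over $\boldsymbol{\beta}$ gives the $\tau$th conditional quantile fit; at $\tau = 0.5$ it reduces to median regression.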
Many penalties reflecting grouping information exist under mean regression. In this study, we propose group penalties under the quantile regression model: the group LASSO, group SCAD, and group MCP penalties under quantile regression are newly defined.
First, the group LASSO proposed by Yuan and Lin (2006) uses the penalty
$$ \lambda \sum_{g=1}^{G} \sqrt{d_g}\, \lVert \boldsymbol{\beta}_{(g)} \rVert_2, $$
where $G$ is the number of groups, $\boldsymbol{\beta}_{(g)}$ is the coefficient subvector of the $g$th group, and $d_g$ is the size of the $g$th group. The factor $\sqrt{d_g}$ adjusts for unequal group sizes.
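A minimal sketch of evaluating this group penalty (the coefficient values and group labels are illustrative):

```python
import numpy as np

def group_lasso_penalty(beta, groups, lam):
    """lam * sum over groups of sqrt(group size) * L2 norm of that group's coefficients."""
    beta = np.asarray(beta, dtype=float)
    groups = np.asarray(groups)
    total = 0.0
    for g in np.unique(groups):
        bg = beta[groups == g]
        total += np.sqrt(len(bg)) * np.linalg.norm(bg)
    return lam * total

# Three groups of sizes 2, 2, and 1; the middle group is entirely zero.
beta = [3.0, 4.0, 0.0, 0.0, 2.0]
groups = [0, 0, 1, 1, 2]
print(group_lasso_penalty(beta, groups, lam=1.0))  # sqrt(2)*5 + 0 + 2
```

Because the norm is taken within each group, the penalty drives whole groups of coefficients to zero together, which is exactly the selection behavior exploited later for dummy-variable groups.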
Besides the LASSO, other penalty functions can be applied. Applying the SCAD penalty and the MCP at the group level, the group SCAD and group MCP quantile regression objectives are defined as
$$ \sum_{i=1}^{n} \rho_\tau(y_i - \mathbf{x}_i^{\top}\boldsymbol{\beta}) + \sum_{g=1}^{G} p_\lambda(\lVert \boldsymbol{\beta}_{(g)} \rVert_2), $$
where $p_\lambda(\cdot)$ is the SCAD penalty
$$ p_\lambda^{\mathrm{SCAD}}(t) = \begin{cases} \lambda t, & t \le \lambda, \\ \dfrac{2a\lambda t - t^2 - \lambda^2}{2(a-1)}, & \lambda < t \le a\lambda, \\ \dfrac{(a+1)\lambda^2}{2}, & t > a\lambda, \end{cases} $$
with $a > 2$, or the MCP
$$ p_\lambda^{\mathrm{MCP}}(t) = \begin{cases} \lambda t - \dfrac{t^2}{2a}, & t \le a\lambda, \\ \dfrac{a\lambda^2}{2}, & t > a\lambda, \end{cases} $$
with $a > 1$. From these definitions, both penalties coincide with the LASSO near zero but level off for large arguments, so large group coefficients are left nearly unpenalized.
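As a sketch, the standard SCAD and MCP formulas can be coded directly (the default tuning parameters $a = 3.7$ and $a = 3$ are the conventional choices, assumed here since the paper does not state them):

```python
def scad(t, lam, a=3.7):
    """SCAD penalty: linear near zero, quadratic transition, constant beyond a*lam."""
    t = abs(t)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t ** 2 - lam ** 2) / (2 * (a - 1))
    return (a + 1) * lam ** 2 / 2

def mcp(t, lam, a=3.0):
    """Minimax concave penalty: tapers to the constant a*lam^2/2 at t = a*lam."""
    t = abs(t)
    if t <= a * lam:
        return lam * t - t ** 2 / (2 * a)
    return a * lam ** 2 / 2

# Both agree with the LASSO (lam * |t|) near zero but flatten out,
# so a large coefficient (or group norm) incurs only a bounded penalty.
print(scad(0.5, 1.0), scad(10.0, 1.0))
print(mcp(0.1, 1.0), mcp(10.0, 1.0))
```

Applied to $\lVert \boldsymbol{\beta}_{(g)} \rVert_2$ instead of a single coefficient, the same functions give the group SCAD and group MCP.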
The theoretical properties of the group SCAD and group MCP are rarely studied even under conditional mean regression, despite the well-known oracle properties of the SCAD and MCP under linear models. Additional complexity, such as the true number of groups and whether the number of groups is fixed or increasing, seems to hinder the theoretical development. We therefore do not provide theoretical properties of these penalties under quantile regression.
The birth weight data set from Hosmer and Lemeshow (1989) is used for the real data analysis. The data were collected at Baystate Medical Center in Springfield, Massachusetts. The data set contains the following three continuous variables:
age: age of mother (years)
weight: weight of mother at last menstrual period (pounds)
visit: number of physician visits during the first trimester (0, 1, 2, or 3 or more).
The five categorical variables are
race: race of mother (white, black, or the others)
smoking: smoking status during pregnancy (yes or no)
pre: history of premature labor (0, 1, 2, or 3)
hyper: history of hypertension (yes or no)
uterine: presence of uterine irritability (yes or no).
The dummy variables from each categorical variable are treated as one group in the group quantile regression model. race_{1} and race_{2} are created for white and the others, respectively, so that black is the baseline category for race. For pre, pre_{1} indicates one premature labor and pre_{2} indicates two or more. After a preliminary analysis, age^{2} is added to the model. Taking birth weight as the response variable, we fit penalized quantile regression models at the median.
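The dummy-coding step can be sketched as follows. The variable names follow the text, but the helper `dummy_groups` and the choice of dropping the last listed level as the baseline are illustrative assumptions (the paper takes black as the baseline for race):

```python
# Each categorical variable expands into (levels - 1) dummy columns
# that all share one group id for the group penalty.
categories = {"race": ["white", "black", "others"],
              "pre": ["0", "1", "2+"]}

def dummy_groups(categories):
    """Return dummy column names and a parallel list of group ids."""
    cols, groups = [], []
    for gid, (var, levels) in enumerate(categories.items()):
        for level in levels[:-1]:   # last listed level serves as the baseline here
            cols.append(f"{var}_{level}")
            groups.append(gid)
    return cols, groups

print(dummy_groups(categories))
```

The group id vector is what a group-penalized fit consumes: all dummies of one categorical variable enter or leave the model together.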
When a full model is fitted with the LASSO, SCAD, group LASSO, and group SCAD, we obtain the results in Table 1. The group penalties select age and age^{2} simultaneously, whereas the LASSO and SCAD choose only age^{2}.
Data-splitting experiments are conducted to see which method yields better predictions. First, we randomly divide the data into two parts: training data with 100 samples and test data with the remaining 88 samples. The penalty parameter is selected by cross-validation on the training data.
We repeat this procedure 500 times to reduce the variation from splitting the data. Table 2 then summarizes the mean of PMCE and its standard error.
From the results, the group versions of the LASSO and SCAD show better performance than the non-group versions of the penalties for median regression.
Note that the PMCE has a natural lower bound: in theory, it cannot be smaller than the mean check error computed using the whole data, because the PMCE is calculated from the test data only. For this reason, the mean check error using all samples is obtained and subtracted from the PMCE, yielding the pure PMCE. Subtracting the mean check error is similar to removing the variance of the error term.
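A sketch of the PMCE and pure PMCE under one reading of the text: taking the baseline "mean check error using all samples" about the sample $\tau$-quantile is an assumption, since the paper does not spell out the reference fit.

```python
import numpy as np

def mean_check_error(y, y_hat, tau):
    """Mean check error at quantile level tau."""
    u = np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)
    return float(np.mean(u * (tau - (u < 0))))

def pure_pmce(y_test, y_hat_test, y_all, tau):
    """PMCE on the test data minus an in-sample baseline check error."""
    baseline = mean_check_error(y_all, np.full(len(y_all), np.quantile(y_all, tau)), tau)
    return mean_check_error(y_test, y_hat_test, tau) - baseline

# Perfect predictions give zero check error at any tau.
print(mean_check_error([1.0, 2.0], [1.0, 2.0], 0.5))
```

Averaging `pure_pmce` over the 500 random splits then gives the values reported in Table 2.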
This data set contains measurements of voice recordings from equal numbers of healthy people and people with Parkinsonism, who suffer from speech impairments. The covariates are
We compare the LASSO and SCAD penalized quantile regression models with their counterparts using group penalties. For this purpose, the 1,040 samples are randomly split into two-thirds training data and one-third test data. Using the training data, we fit the considered models, with the penalty parameters selected by 10-fold cross-validation. After choosing an ‘optimal’ penalty parameter, we refit on the training data and then predict the values of the response variable in the test data. The prediction error is measured with the check error defined in Section 2.
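The split-and-tune procedure can be sketched as follows (sizes correspond to the 1,040-sample data; the plain random fold partition is an assumption, since the paper does not describe the fold construction):

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

def train_test_split_idx(n, train_frac=2 / 3):
    """Randomly split n sample indices into training and test index arrays."""
    idx = rng.permutation(n)
    cut = int(round(n * train_frac))
    return idx[:cut], idx[cut:]

def kfold_indices(n, k=10):
    """A plain random partition of n training indices into k CV folds."""
    return np.array_split(rng.permutation(n), k)

train, test = train_test_split_idx(1040)
folds = kfold_indices(len(train), k=10)
```

For each candidate penalty parameter, each fold is held out in turn, the model is fit on the rest, and the held-out check error is averaged; the minimizing parameter is then used to refit on all training data before predicting the test set.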
In this section, we compare the prediction performance of the LASSO, SCAD, and MCP and the grouped versions of the LASSO, SCAD, and MCP under four simulation scenarios. In the first scenario, we fit an additive model of continuous variables through a third-order polynomial. In the second, we fit an additive model of both continuous and categorical variables, also through a third-order polynomial. In the third, we consider an additive model of categorical variables only. Finally, we fit an ANOVA model with all two-way interactions. In all simulations, the errors
Model I: Random variables
Model II:
Model III: Random variables
Model IV: 4 categorical factors
We generate
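The selection accuracy measures reported in the tables can be computed as below, under the usual definitions (selection means a nonzero estimate); the paper's exact definitions are not shown, so this reading is an assumption:

```python
import numpy as np

def tp_tn_rates(beta_hat, beta_true):
    """TP rate: share of truly nonzero coefficients estimated as nonzero.
    TN rate: share of truly zero coefficients estimated as exactly zero."""
    sel = np.asarray(beta_hat) != 0
    truth = np.asarray(beta_true) != 0
    return float(sel[truth].mean()), float((~sel)[~truth].mean())

# Example: two signals, two noise variables; one of each handled correctly.
print(tp_tn_rates([0.5, 0.0, 0.0, 0.3], [1.0, 0.0, 2.0, 0.0]))  # (0.5, 0.5)
```

Averaging these rates (and the MSE of the coefficient estimates) over the 200 simulated data sets yields the table entries.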
For implementation, we use
From the results, we observe that the group versions of the LASSO, SCAD, and MCP perform better than the models that do not consider the grouping structure in most cases.
Finally, we investigate heavy-tailed error distributions under Model I only. The error distributions
The results in Table 8 show that the group versions of the penalized regression models yield lower MSE with higher true positive and true negative rates under heavy-tailed error distributions.
This work presents numerical studies of group penalized quantile regression with the group LASSO, group SCAD, and group MCP penalties for analyzing data with a grouping structure. The application of group penalties to quantile regression seems reasonable and improves the fit. We also applied the group penalties to high-dimensional data; however, these penalties did not work well in the R package
Yoonsuh Jung’s work was partially supported by National Research Foundation of Korea Grant NRF-2017R1C1B5017431.
Birth weight data: Coefficients of group LASSO, LASSO, group SCAD, and SCAD penalized median regression from one arbitrary fit
Variable | LASSO(G) | LASSO(NG) | SCAD(G) | SCAD(NG) |
---|---|---|---|---|
intercept | 3798.4927 | 2207.0331 | 2243.9668 | 2591.5665 |
age | −145.8383 | 0.0000 | −3.2061 | 0.0000 |
age^{2} | 3.0818 | 0.2200 | 0.2841 | 0.1924 |
weight | 6.0441 | 4.9070 | 4.9070 | 2.1216 |
smoking | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
hyper | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
uterine | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
visit | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
race_{1} | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
race_{2} | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
pre_{1} | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
pre_{2} | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
LASSO = least absolute shrinkage and selection operator; SCAD = smoothly clipped absolute deviation.
Birth weight data: Mean of PMCE and pure PMCE (standard errors in parentheses) for group LASSO, group SCAD, LASSO, and SCAD penalized median regression and mean regression from 500 different splits of the data
 | LASSO(G) | LASSO(NG) | SCAD(G) | SCAD(NG) |
---|---|---|---|---|
Median PMCE | 292.74 | 297.71 | 296.16 | 298.39 |
Median pure PMCE | 12.40 (0.86) | 17.37 (0.78) | 15.82 (0.81) | 18.05 (0.78) |
Mean PMCE | 277.07 (0.79) | 277.22 (0.79) | 280.60 (0.82) | 279.83 (0.82) |
PMCE = predicted mean check error; LASSO = least absolute shrinkage and selection operator; SCAD = smoothly clipped absolute deviation.
Parkinson speech data: Mean of PMCE (standard errors in parentheses) for LASSO, group LASSO, SCAD, and group SCAD penalized regression at various quantiles from 100 different splits of the data. The percentage of PMCE reduction achieved by incorporating the group penalties is shown as % reduction
Quantile | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 |
---|---|---|---|---|---|
SCAD (NG) | 6.00 (0.04) | 6.71 (0.04) | 6.45 (0.04) | 5.46 (0.02) | 3.36 (0.01) |
SCAD(G) | 5.85 (0.04) | 6.57 (0.04) | 5.79 (0.03) | 5.03 (0.03) | 3.39 (0.01) |
% reduction | 2.53 | 1.88 | 10.20 | 7.97 | −1.00 |
LASSO (NG) | 5.93 (0.07) | 6.63 (0.08) | 6.39 (0.08) | 5.36 (0.08) | 3.30 (0.05) |
LASSO(G) | 5.78 (0.06) | 6.50 (0.08) | 5.74 (0.07) | 5.00 (0.08) | 3.34 (0.05) |
% reduction | 2.42 | 1.95 | 10.25 | 7.56 | −1.01 |
PMCE = predicted mean check error; LASSO = least absolute shrinkage and selection operator; SCAD = smoothly clipped absolute deviation.
Simulation results for Model I: Means of MSE, TP rate, and TN rate for the LASSO, SCAD, and MCP from 200 simulated data sets with
LASSO(G) | LASSO(NG) | SCAD(G) | SCAD(NG) | MCP(G) | MCP(NG) | ||
---|---|---|---|---|---|---|---|
MSE | 873.18 (37.2) | 983.28 (30.2) | 1664.51 (62.2) | 1423.33 (39.1) | 1667.30 (59.3) | 1430.00 (37.9) | |
TP rate | 78.25 (1.0) | 68.58 (1.5) | 94.58 (1.2) | 60.08 (1.6) | 95.67 (1.0) | 60.67 (1.6) | |
TN rate | 42.00 (1.8) | 39.82 (1.5) | 64.71 (2.4) | 48.93 (1.6) | 64.39 (2.5) | 48.68 (1.5) | |
MSE | 403.28 (15.4) | 899.76 (26.1) | 549.90 (19.3) | 1170.84 (26.6) | 562.90 (20.2) | 1211.79 (28.3) | |
TP rate | 74.75 (1.1) | 62.92 (1.5) | 94.83 (1.0) | 45.08 (1.5) | 95.67 (0.9) | 45.67 (1.6) | |
TN rate | 54.43 (1.8) | 46.75 (1.6) | 72.96 (2.2) | 67.86 (1.6) | 71.68 (2.2) | 67.79 (1.7) | |
MSE | 299.36 (8.5) | 862.30 (24.1) | 305.80 (12.2) | 1087.98 (23.7) | 300.55 (11.7) | 1106.97 (25.9) | |
TP rate | 77.00 (1.1) | 57.25 (1.7) | 94.83 (0.9) | 39.25 (1.5) | 95.00 (1.1) | 38.33 (1.4) | |
TN rate | 55.64 (2.0) | 53.71 (1.7) | 76.32 (2.1) | 75.36 (1.6) | 76.79 (2.2) | 76.64 (1.5) | |
MSE | 254.80 (7.7) | 833.40 (25.0) | 217.07 (8.6) | 1092.55 (27.3) | 214.99 (8.4) | 1082.13 (25.4) | |
TP rate | 77.00 (1.4) | 55.25 (1.8) | 92.92 (1.2) | 35.58 (1.5) | 93.67 (1.1) | 36.00 (1.5) | |
TN rate | 58.86 (2.0) | 57.82 (1.9) | 81.00 (1.9) | 81.21 (1.5) | 80.96 (2.0) | 81.68 (1.5) | |
MSE | 243.28 (7.6) | 821.00 (26.7) | 209.41 (8.4) | 1086.88 (26.7) | 205.59 (8.6) | 1084.43 (26.1) | |
TP rate | 76.83 (1.5) | 52.58 (1.7) | 91.42 (1.2) | 33.92 (1.4) | 91.83 (1.2) | 33.50 (1.5) | |
TN rate | 60.29 (2.1) | 60.96 (1.9) | 77.11 (2.1) | 82.07 (1.3) | 77.79 (2.1) | 82.39 (1.4) |
Standard errors are in parentheses. All values are multiplied by 10^{2}.
LASSO = least absolute shrinkage and selection operator; SCAD = smoothly clipped absolute deviation; MCP = minimax concave penalty; MSE = mean squared error; TP = true positive; TN = true negative.
Simulation results for Model II: Means of MSE, TP rate, and TN rate for the LASSO, SCAD, and MCP from 200 simulated data sets with
LASSO(G) | LASSO(NG) | SCAD(G) | SCAD(NG) | MCP(G) | MCP(NG) | ||
---|---|---|---|---|---|---|---|
MSE | 892.65 (11.1) | 1361.60 (29.5) | 991.70 (19.9) | 1698.67 (38.8) | 982.53 (19.2) | 1706.56 (39.8) | |
TP rate | 59.06 (1.6) | 46.63 (1.7) | 62.19 (1.7) | 32.75 (1.5) | 61.94 (1.6) | 33.81 (1.6) | |
TN rate | 47.35 (1.7) | 48.06 (1.4) | 84.18 (1.3) | 65.12 (1.5) | 85.59 (1.2) | 64.53 (1.5) | |
MSE | 818.28 (11.6) | 1394.68 (30.0) | 844.13 (19.8) | 1645.89 (31.8) | 836.67 (14.6) | 1641.21 (30.8) | |
TP rate | 62.06 (1.4) | 53.44 (1.8) | 73.63 (1.2) | 35.38 (1.7) | 74.19 (1.2) | 33.94 (1.6) | |
TN rate | 52.21 (1.8) | 49.94 (1.7) | 65.32 (2.1) | 68.26 (1.7) | 64.12 (2.2) | 69.15 (1.7) | |
MSE | 757.97 (14.2) | 1364.15 (27.2) | 720.11 (14.6) | 1606.66 (28.3) | 714.26 (14.1) | 1614.34 (27.4) | |
TP rate | 67.63 (1.5) | 51.25 (1.6) | 76.94 (1.2) | 32.00 (1.5) | 77.56 (1.2) | 31.69 (1.5) | |
TN rate | 50.44 (1.8) | 51.71 (1.6) | 65.06 (2.1) | 70.68 (1.6) | 64.06 (2.1) | 71.18 (1.6) | |
MSE | 672.45 (15.8) | 1406.66 (27.9) | 671.74 (17.0) | 1627.18 (29.6) | 678.25 (16.5) | 1640.61 (29.8) | |
TP rate | 71.56 (1.4) | 52.31 (1.7) | 79.19 (1.2) | 32.81 (1.5) | 79.19 (1.1) | 31.56 (1.5) | |
TN rate | 49.06 (1.7) | 46.65 (1.6) | 60.94 (3.1) | 67.56 (1.6) | 61.26 (2.1) | 69.79 (1.5) | |
MSE | 621.12 (14.2) | 1414.99 (29.6) | 590.80 (17.1) | 1632.37 (28.9) | 587.34 (17.3) | 1633.92 (27.9) | |
TP rate | 72.50 (1.4) | 50.56 (1.8) | 81.50 (1.3) | 31.44 (1.6) | 82.00 (1.2) | 31.06 (1.6) | |
TN rate | 48.50 (1.6) | 47.24 (1.5) | 61.03 (2.0) | 67.68 (1.6) | 60.97 (2.0) | 67.65 (1.6) |
Standard errors are in parentheses. All values are multiplied by 10^{2}.
LASSO = least absolute shrinkage and selection operator; SCAD = smoothly clipped absolute deviation; MCP = minimax concave penalty; MSE = mean squared error; TP = true positive; TN = true negative.
Simulation results for Model III: Means of MSE, TP rate, and TN rate for the LASSO, SCAD, and MCP from 200 simulated data sets with
LASSO(G) | LASSO(NG) | SCAD(G) | SCAD(NG) | MCP(G) | MCP(NG) | ||
---|---|---|---|---|---|---|---|
MSE | 282.81 (8.9) | 818.73 (6.2) | 394.63 (11.5) | 825.53 (6.6) | 398.62 (11.6) | 825.67 (6.7) | |
TP rate | 73.58 (1.6) | 55.25 (1.8) | 46.17 (2.1) | 42.67 (1.6) | 46.58 (2.2) | 43.67 (1.6) | |
TN rate | 36.83 (1.5) | 52.42 (1.8) | 59.54 (2.4) | 65.71 (1.6) | 58.54 (2.4) | 64.42 (1.6) | |
MSE | 202.82 (6.8) | 772.30 (5.9) | 202.72 (8.2) | 768.74 (5.3) | 198.41 (7.9) | 770.85 (5.5) | |
TP rate | 86.17 (1.3) | 52.67 (1.9) | 81.58 (1.8) | 42.75 (1.7) | 81.25 (1.8) | 40.67 (1.6) | |
TN rate | 31.04 (1.7) | 57.75 (1.9) | 43.71 (2.2) | 69.13 (1.8) | 45.08 (2.2) | 70.42 (1.7) | |
MSE | 160.78 (7.0) | 747.47 (7.2) | 168.09 (9.4) | 740.63 (7.5) | 168.95 (9.6) | 739.28 (7.3) | |
TP rate | 91.67 (1.0) | 51.17 (1.8) | 85.58 (1.6) | 41.00 (1.7) | 85.25 (1.7) | 41.58 (1.7) | |
TN rate | 28.92 (1.6) | 57.17 (1.9) | 44.67 (2.1) | 68.25 (1.8) | 43.75 (2.0) | 68.42 (1.8) | |
MSE | 139.23 (5.9) | 726.39 (6.8) | 158.78 (9.7) | 714.23 (6.6) | 151.52 (9.1) | 711.59 (6.2) | |
TP rate | 93.67 (0.8) | 49.75 (1.7) | 87.75 (1.6) | 38.92 (1.5) | 88.75 (1.5) | 39.00 (1.5) | |
TN rate | 27.96 (1.6) | 54.88 (1.7) | 41.33 (2.2) | 68.17 (1.6) | 41.33 (2.1) | 69.00 (1.5) | |
MSE | 138.13 (6.3) | 719.82 (8.1) | 153.13 (9.6) | 715.73 (8.5) | 157.20 (10.0) | 714.26 (9.0) | |
TP rate | 94.17 (0.8) | 51.33 (1.7) | 89.75 (1.4) | 40.83 (1.8) | 89.33 (1.5) | 40.42 (1.8) | |
TN rate | 28.75 (1.5) | 52.71 (1.6) | 41.50 (2.1) | 65.38 (1.6) | 40.83 (2.2) | 64.71 (1.7) |
Standard errors are in parentheses. All values are multiplied by 10^{2}.
LASSO = least absolute shrinkage and selection operator; SCAD = smoothly clipped absolute deviation; MCP = minimax concave penalty; MSE = mean squared error; TP = true positive; TN = true negative.
Simulation results for Model IV: Means of MSE, TP rate, and TN rate for the LASSO, SCAD, and MCP from 200 simulated data sets with
LASSO(G) | LASSO(NG) | SCAD(G) | SCAD(NG) | MCP(G) | MCP(NG) | ||
---|---|---|---|---|---|---|---|
MSE | 974.87 (35.7) | 4382.14 (48.4) | 1089.52 (61.4) | 4378.63 (46.9) | 1087.99 (61.1) | 4348.43 (43.8) | |
TP rate | 90.44 (0.9) | 64.75 (2.2) | 97.81 (0.6) | 51.88 (2.4) | 97.81 (0.6) | 51.00 (2.4) | |
TN rate | 23.25 (1.0) | 35.25 (2.2) | 8.38 (1.1) | 48.13 (2.4) | 8.00 (1.1) | 49.00 (2.4) | |
MSE | 761.39 (33.7) | 4016.89 (42.1) | 977.69 (44.2) | 4143.90 (47.3) | 994.03 (43.3) | 4151.36 (46.2) | |
TP rate | 93.38 (0.8) | 71.75 (1.9) | 83.44 (1.7) | 54.38 (2.5) | 82.69 (1.7) | 55.13 (2.5) | |
TN rate | 25.63 (1.9) | 28.25 (1.9) | 29.00 (2.2) | 45.63 (2.5) | 28.63 (2.2) | 44.88 (2.5) | |
MSE | 642.29 (29.1) | 3853.86 (51.1) | 946.14 (42.8) | 3963.01 (54.7) | 929.65 (42.5) | 4026.50 (58.1) | |
TP rate | 93.94 (0.7) | 75.25 (1.7) | 80.75 (1.7) | 65.13 (2.0) | 81.75 (1.7) | 63.50 (2.0) | |
TN rate | 25.00 (1.8) | 24.75 (1.7) | 38.50 (2.6) | 34.88 (2.0) | 37.38 (2.6) | 36.50 (2.0) | |
MSE | 641.29 (30.8) | 3601.66 (51.0) | 941.30 (44.1) | 3751.45 (56.8) | 975.85 (44.2) | 3759.88 (57.1) | |
TP rate | 92.00 (0.9) | 74.25 (1.7) | 81.56 (1.7) | 62.88 (1.9) | 80.44 (1.7) | 62.13 (1.9) | |
TN rate | 29.00 (2.0) | 25.75 (1.7) | 41.25 (2.7) | 37.13 (1.9) | 40.75 (2.6) | 37.88 (1.9) | |
MSE | 702.93 (34.0) | 3445.67 (49.4) | 1043.48 (47.6) | 3571.69 (52.0) | 1056.95 (47.4) | 3599.84 (51.9) | |
TP rate | 90.06 (0.9) | 78.63 (1.5) | 78.31 (1.7) | 57.13 (2.0) | 78.19 (1.7) | 57.50 (2.0) | |
TN rate | 28.63 (1.9) | 21.38 (1.9) | 45.63 (2.9) | 42.88 (2.0) | 43.63 (2.8) | 42.50 (2.0) |
Standard errors are in parentheses. All values are multiplied by 10^{2}.
LASSO = least absolute shrinkage and selection operator; SCAD = smoothly clipped absolute deviation; MCP = minimax concave penalty; MSE = mean squared error; TP = true positive; TN = true negative.
Means of MSE, TP rate, and TN rate for LASSO and group LASSO under heavy-tailed error distributions (standard errors in parentheses).
MSE | TP rate | TN rate | ||
---|---|---|---|---|
LASSO(NG) | 30.78 (1.35) | 99.58 (0.18) | 55.36 (1.61) | |
LASSO(G) | 26.02 (1.04) | 100.00 (0.00) | 57.29 (1.42) | |
LASSO(NG) | 30.28 (1.21) | 99.50 (0.20) | 56.36 (1.53) | |
LASSO(G) | 26.39 (1.06) | 100.00 (0.00) | 57.82 (1.53) | |
LASSO(NG) | 30.26 (1.28) | 6.71 (0.20) | 55.71 (1.58) | |
LASSO(G) | 26.46 (1.04) | 100.00 (0.00) | 56.39 (1.50) | |
LASSO(NG) | 30.44 (1.27) | 6.71 (0.18) | 55.11 (1.56) | |
LASSO(G) | 26.27 (1.04) | 100.00 (0.00) | 58.21 (1.52) | |
LASSO(NG) | 30.52 (1.27) | 6.71 (0.18) | 56.46 (1.61) | |
LASSO(G) | 26.57 (1.04) | 100.00 (0.00) | 57.11 (1.50) |
Errors follow
LASSO = least absolute shrinkage and selection operator; MSE = mean squared error; TP = true positive; TN = true negative.