A numerical study on group quantile regression models

Doyoen Kim^a, Yoonsuh Jung^{1,a}

^a Department of Statistics, Korea University, Korea
Correspondence to: ^1 Department of Statistics, Korea University, 145 Anam-ro, Seongbuk-gu, Seoul 02841, South Korea. E-mail: yoons77@korea.ac.kr
Received January 2, 2019; Revised May 20, 2019; Accepted June 14, 2019.
Abstract

Grouping structures in covariates are often ignored in regression models. Recent statistical developments that consider grouping structure show clear advantages; however, reflecting the grouping structure in quantile regression models has been relatively rare in the literature. A grouping structure is usually handled by employing a group penalty. In this work, we apply the idea of a group penalty to quantile regression models. The grouping structure is assumed to be known, which is true in some common cases. For example, a group of dummy variables transformed from one categorical variable can be regarded as one group of covariates. We examine the group quantile regression models via two real data analyses and simulation studies, which reveal that group quantile regression models outperform their non-group counterparts when a grouping structure exists among the variables.

Keywords: group penalty, penalized quantile regression, variable selection
1. Introduction

In some regression problems we are interested in various percentage points of the distribution rather than the overall average of the data. For these cases, Koenker and Bassett (1978) proposed a quantile regression model that estimates the conditional quantile function of the response variable and identifies the effect for each quantile separately. With the advantages of quantile regression, its usage is clearly increasing in various fields such as economics (Hendricks and Koenker, 1992; Koenker and Hallock, 2001), microarray studies (Wang and He, 2007), demand analysis and empirical finance area (Wei and He, 2006; Wei et al., 2006).

We focus on the variable selection aspect of penalized quantile regression because variable selection is an important modeling step when datasets include a large number of candidate predictors. Several methods have been introduced under the regularization framework. Tibshirani (1996) proposed the least absolute shrinkage and selection operator (LASSO) using the $\ell_1$-norm, and Zou (2006) proposed the adaptive LASSO and showed its oracle properties. Fan (1997) then proposed a smoothly clipped absolute deviation (SCAD) penalty function, and Fan and Li (2001) demonstrated its oracle properties under a penalized likelihood. Later, Wu and Liu (2009) extended the oracle properties of the SCAD penalty to penalized quantile regression. After that, Zhang (2010) proposed the minimax concave penalty (MCP) and showed its oracle properties. In a different direction, Hoerl and Kennard (1970) proposed ridge regression, which employs an $\ell_2$-type penalty, and Zou and Hastie (2005) introduced the elastic net, which combines the $\ell_1$ penalty of LASSO and the $\ell_2$ penalty of ridge.

When variables show a grouping structure, it seems more appropriate to reflect that structure for accurate estimation and prediction. The most common example is a categorical variable converted to several dummy variables: the dummy variables from one original categorical variable can be regarded as one group. Another example of grouped variables is an additive model with polynomial terms. Terms forming a polynomial, such as $x_1, x_1^2, x_1^3$, can be regarded as composing one group. Grouping can also be applied to a model that imposes prior scientific information. For example, genes in the same biological pathway can be classified into the same group in genetic data analysis. When we analyze these types of data with a group structure among the variables, we can apply penalization methods with group penalties that reflect the grouping information.

A group penalty can also be applied when the grouping structure is unknown and thus must be estimated. In biological studies, genetic data usually come with background scientific information. For example, genes in the same biological pathway are often located in a neighborhood, forming a group. When the structure is unknown, we need to estimate the grouping structure of the predictors by a clustering method for variables, such as that of Chavent and Kuentz-Simonet (2012).

Several penalties that consider grouping structure have been proposed. Group LASSO, which uses the $\ell_2$-norm of the coefficients within a group, was proposed by Bakin (1999) and extended by Yuan and Lin (2006). Huang et al. (2012) then presented group SCAD and group MCP, which select important groups among covariates possessing a grouping structure. In the context of quantile regression models, Ciuperca (2019) proposed an adaptive group LASSO based on the adaptive LASSO penalty and established the sparsity and asymptotic normality of the method. Kato (2011) considered high dimensional sparse quantile regression models with the group LASSO penalty and obtained a non-asymptotic bound on the estimation error. In addition, the group LASSO penalty was investigated for the classification problem by Hashem et al. (2016).

The remainder of this paper is organized as follows. Penalized quantile regression models with a group penalty are introduced in Section 2. Real data sets are analyzed in Section 3. In Section 4, we compare the prediction performance of the introduced methods under various simulation settings. The conclusion and discussion are given in Section 5.

2. Group quantile regression models

We consider the linear model with p predictors:

$Y = X\beta_\tau + \epsilon,$

where $Y = (y_1, \ldots, y_n)^T$ is an $n \times 1$ vector of the response variable, $X = (x_{ij})$, $i = 1, \ldots, n$, $j = 1, \ldots, p$, is the design matrix of predictors, and $\epsilon$ is a vector of independent random errors with mean 0. Then the $\tau$th conditional quantile function can be estimated by solving

$\arg\min_{\beta_\tau \in \mathbb{R}^p} \sum_{i=1}^{n} \rho_\tau(y_i - x_i^T \beta_\tau), \qquad (2.1)$

where $x_i^T$ is the $i$th row of $X$, $\beta_\tau = (\beta_{\tau,1}, \ldots, \beta_{\tau,p})^T$, and $\rho_\tau(u) = \tau u - I(u < 0)u$ is the check loss function. To produce a sparse solution, Koenker et al. (1994) and Koenker (2004) suggested a penalized version of (2.1),

$\arg\min_{\beta_\tau \in \mathbb{R}^p} \sum_{i=1}^{n} \rho_\tau(y_i - x_i^T \beta_\tau) + \lambda J(\beta_\tau), \qquad (2.2)$

where λ ≥ 0 is the tuning parameter and J(βτ) denotes the penalty.
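As a small illustration of the penalized objective, the following Python sketch evaluates the check loss and the penalized criterion for a given coefficient vector. The function names (`check_loss`, `penalized_objective`, `l1_penalty`) are hypothetical and chosen only for this example; the sketch evaluates the objective and does not solve the minimization.

```python
def check_loss(u, tau):
    """Check loss rho_tau(u) = tau*u - I(u < 0)*u for a single residual u."""
    return tau * u - (u if u < 0 else 0.0)


def l1_penalty(beta):
    """An example penalty J(beta): the sum of absolute coefficients."""
    return sum(abs(b) for b in beta)


def penalized_objective(y, X, beta, tau, lam, penalty):
    """Sum of check losses of the residuals plus lam * penalty(beta)."""
    total = 0.0
    for yi, xi in zip(y, X):
        fitted = sum(xij * bj for xij, bj in zip(xi, beta))
        total += check_loss(yi - fitted, tau)
    return total + lam * penalty(beta)


# Toy data (illustrative only): residuals are 0, 0, and 1.
y = [1.0, 2.0, 4.0]
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
value = penalized_objective(y, X, [1.0, 1.0], tau=0.5, lam=0.1, penalty=l1_penalty)
```

With $\tau = 0.5$ the single positive residual contributes 0.5 to the loss, and the penalty term contributes $0.1 \times 2$.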

Many penalties under mean regression exist for reflecting grouping information. In this study, we propose group penalties under the quantile regression models. Group LASSO penalty, group SCAD penalty, and group MCP under the quantile regression models will be newly defined.

First, the group LASSO proposed by Yuan and Lin (2006) uses the $\ell_2$-norm of the coefficients within a group. It can therefore be regarded as a mixture of the $\ell_1$ and $\ell_2$ penalties, although it differs from the elastic net penalty of Zou and Hastie (2005). Figure 1 illustrates that a coefficient forming a group by itself is affected by the $\ell_1$ penalty. Suppose there are two groups with three variables: $\beta_1 = (\beta_{11}, \beta_{12})^T$ and a scalar $\beta_2$. Figure 1(a), (e), (i) show the contours of the penalties. Figure 1(a) refers to $|\beta_{11}| + |\beta_{12}| + |\beta_2| = 1$, and Figure 1(g) refers to $\beta_{11}^2 + \beta_{12}^2 = 1$. In Figure 1(g), group LASSO selects both $\beta_{11}$ and $\beta_{12}$ at the same time. Consequently, in Figure 1(e), group LASSO shows sparsity at the group level. Thus, the group LASSO selects only groups of variables and does not select variables within groups individually. Suppose that the predictor variables are divided into $G$ different groups; then the group LASSO estimator under quantile regression is defined as

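$\hat{\beta}^{(gLASSO)} = \arg\min_{\beta_\tau \in \mathbb{R}^p} \sum_{i=1}^{n} \rho_\tau(y_i - x_{i,g}^T \beta_{\tau,g}) + \lambda \sum_{g=1}^{G} K_g \|\beta_{\tau,g}\|_2, \qquad (2.3)$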

where $x_{i,g}$ contains the predictors in group $g$ for the $i$th sample, $\beta_{\tau,g}$ is the coefficient vector of group $g$, and $K_g$ is the number of covariates in group $g$. Here, $K_g$ is used to adjust for the size of the groups.
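To make the group LASSO penalty term concrete, the following Python sketch computes a weighted sum of within-group $\ell_2$-norms. The function name `group_lasso_penalty`, the `groups` label vector, and the `weight` argument are assumptions introduced for illustration; the default weight uses the group size $K_g$ as written in the text, and a different weighting can be passed in.

```python
import math


def group_lasso_penalty(beta, groups, weight=lambda k: k):
    """Return sum over groups of weight(K_g) * ||beta_g||_2.

    beta   : list of coefficients
    groups : list of group labels, one per coefficient
    weight : function of the group size K_g (hypothetical argument)
    """
    by_group = {}
    for b, g in zip(beta, groups):
        by_group.setdefault(g, []).append(b)
    total = 0.0
    for members in by_group.values():
        norm = math.sqrt(sum(b * b for b in members))
        total += weight(len(members)) * norm
    return total


# Three groups: (3, 4) has norm 5, (0, 0) has norm 0, (2,) has norm 2.
beta = [3.0, 4.0, 0.0, 0.0, 2.0]
groups = [1, 1, 2, 2, 3]
pen_size = group_lasso_penalty(beta, groups)             # 2*5 + 2*0 + 1*2
pen_sqrt = group_lasso_penalty(beta, groups, math.sqrt)  # sqrt(2)*5 + 0 + 2
```

A group whose coefficients are all zero contributes nothing to the penalty, which is what allows whole groups to be dropped.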

Besides LASSO, other penalty functions can be applied. Applying the SCAD penalty and MCP under quantile regression, the corresponding estimators are defined as

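$\hat{\beta}^{(gSCAD)} = \arg\min_{\beta_\tau \in \mathbb{R}^p} \sum_{i=1}^{n} \rho_\tau(y_i - x_{i,g}^T \beta_{\tau,g}) + \sum_{g=1}^{G} p^{1}_{\lambda,\gamma}(\|\beta_{\tau,g}\|_2), \qquad (2.4)$

$p^{1}_{\lambda,\gamma}(\beta) = \begin{cases} \lambda|\beta|, & |\beta| \le \lambda, \\ \dfrac{2\gamma\lambda|\beta| - \beta^2 - \lambda^2}{2(\gamma - 1)}, & \lambda < |\beta| \le \gamma\lambda, \\ \dfrac{\lambda^2(\gamma + 1)}{2}, & |\beta| > \gamma\lambda. \end{cases}$

$\hat{\beta}^{(gMCP)} = \arg\min_{\beta_\tau \in \mathbb{R}^p} \sum_{i=1}^{n} \rho_\tau(y_i - x_{i,g}^T \beta_{\tau,g}) + \sum_{g=1}^{G} p^{2}_{\lambda,\gamma}(\|\beta_{\tau,g}\|_2), \qquad (2.5)$

$p^{2}_{\lambda,\gamma}(\beta) = \begin{cases} \lambda|\beta| - \dfrac{\beta^2}{2\gamma}, & |\beta| \le \gamma\lambda, \\ \dfrac{1}{2}\gamma\lambda^2, & |\beta| > \gamma\lambda. \end{cases}$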

In (2.4) and (2.5), the penalty is applied to the $\ell_2$-norm of the coefficients in each group, so that variables are penalized by the group unit. The penalties are then summed over the $G$ groups, which determines which groups of variables are selected. The forms of the SCAD and MCP penalties are similar; however, group SCAD is reported to show less grouping than group MCP (Ogutu and Piepho, 2014). The solutions for the group LASSO, SCAD, and MCP are computed with the R package rqPen, which produces estimates using the QICD command.

The theoretical properties of the group SCAD and group MCP are rarely known under conditional mean regression, despite the well-known oracle properties of SCAD and MCP under linear models. Additional complexity, such as the true number of groups and whether the number of groups is fixed or increasing, seems to hinder the exploration of the theoretical grounding. Unfortunately, we do not provide any theoretical properties of quantile regression with these penalties.
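The piecewise SCAD and MCP penalty functions in (2.4) and (2.5) can be written directly as code. The Python sketch below is only an illustration of the formulas: the function names are hypothetical, and `group_penalty` applies a chosen penalty to each group's $\ell_2$-norm and sums the results. It is not the rqPen implementation.

```python
import math


def scad_penalty(b, lam, gamma):
    """SCAD penalty with three pieces, evaluated at a scalar b (gamma > 1)."""
    a = abs(b)
    if a <= lam:
        return lam * a
    if a <= gamma * lam:
        return (2 * gamma * lam * a - a * a - lam * lam) / (2 * (gamma - 1))
    return lam * lam * (gamma + 1) / 2


def mcp_penalty(b, lam, gamma):
    """MCP with two pieces, evaluated at a scalar b (gamma > 0)."""
    a = abs(b)
    if a <= gamma * lam:
        return lam * a - a * a / (2 * gamma)
    return 0.5 * gamma * lam * lam


def group_penalty(beta, groups, pen, lam, gamma):
    """Sum of pen(||beta_g||_2, lam, gamma) over the groups."""
    squares = {}
    for b, g in zip(beta, groups):
        squares[g] = squares.get(g, 0.0) + b * b
    return sum(pen(math.sqrt(s), lam, gamma) for s in squares.values())


# Both penalties are flat beyond gamma * lam, so large groups are not
# penalized more heavily as their norm grows.
flat_scad = scad_penalty(10.0, 1.0, 3.7)   # (3.7 + 1) / 2
flat_mcp = mcp_penalty(5.0, 1.0, 3.0)      # 0.5 * 3
g_mcp = group_penalty([3.0, 4.0, 0.0], [1, 1, 2], mcp_penalty, 1.0, 3.0)
```

Each penalty is continuous at its breakpoints, which can be checked by evaluating adjacent pieces at $|\beta| = \lambda$ and $|\beta| = \gamma\lambda$.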

3. Application to real data sets

### 3.1. Birth weight data

The birth weight data set from Hosmer and Lemeshow (1989) is used for the real data analysis. The data were collected at Baystate Medical Center in Springfield, Massachusetts. The data contain birthweight, the birth weights of n = 188 babies, and eight predictors about the mother; three of them are continuous variables and the rest are categorical variables. The three continuous variables are

• age: age of mother (years)

• weight: weight of mother at last menstrual period (pounds)

• visit: number of physician visits during the first trimester (0, 1, 2, or 3 or more).

The five categorical variables are

• race: race of mother (white, black, or the others)

• smoking: smoking status during pregnancy (yes or no)

• pre: history of premature labor (0, 1, 2, or 3)

• hyper: history of hypertension (yes or no)

• uterine: presence of uterine irritability (yes or no).

The dummy variables from each categorical variable are treated as one group for the group quantile regression model. Race1 and race2 are created for white and others, respectively, so that black is the baseline category for race. For pre, pre1 stands for one previous premature labor and pre2 stands for two or more. After a preliminary analysis, age$^2$ is added to the model. Taking birthweight as the response variable, we fit quantile regression models at $\tau = 0.5$ to find significant variables. Our model is

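$E(\text{birthweight}) = \beta_0 + \beta_1\,\text{age} + \beta_2\,\text{age}^2 + \beta_3\,\text{weight} + \beta_4\,\text{smoking} + \beta_5\,\text{hyper} + \beta_6\,\text{uterine} + \beta_7\,\text{visit} + \beta_8\,\text{race1} + \beta_9\,\text{race2} + \beta_{10}\,\text{pre1} + \beta_{11}\,\text{pre2}.$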
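Group penalties need the grouping encoded as one label per model column. The following Python sketch shows one way this could be done for the eleven predictors above. The column names and the `group_of` mapping are written out for illustration; placing age and its square in one group is an assumption made here, while the race and pre dummy variables are grouped as described in the text.

```python
# Model columns in the order of the coefficients beta_1, ..., beta_11.
columns = ["age", "age2", "weight", "smoking", "hyper", "uterine",
           "visit", "race1", "race2", "pre1", "pre2"]

# Hypothetical mapping from column to group name.
# Assumption: age and age2 form one polynomial group.
group_of = {
    "age": "age", "age2": "age",
    "weight": "weight", "smoking": "smoking", "hyper": "hyper",
    "uterine": "uterine", "visit": "visit",
    "race1": "race", "race2": "race",
    "pre1": "pre", "pre2": "pre",
}

# Assign integer labels 1, 2, ... in order of first appearance.
order = []
for col in columns:
    if group_of[col] not in order:
        order.append(group_of[col])
group_index = [order.index(group_of[col]) + 1 for col in columns]
```

Under these assumptions the eleven columns fall into eight groups, three of which contain two columns each.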

When the full model is fitted with LASSO, SCAD, group LASSO, and group SCAD, we obtain the results in Table 1. The group penalties select both age and age$^2$ simultaneously, whereas LASSO and SCAD select only age$^2$.

Simulation studies are conducted to see which methods yield a better estimate. First, we randomly divide the data into two parts: training data with 100 samples and test data with the other 88 samples. The penalty parameter $\lambda$ is chosen from the training data by 10-fold cross-validation. After selecting $\lambda$, we fit LASSO, SCAD, group LASSO, and group SCAD penalized quantile regression to the training data. We then obtain the estimates of the 12 coefficients and predict the 88 values of the response variable in the test data. Finally, the predicted mean check error (PMCE) is calculated from the 88 predicted values to gauge the performance; that is, PMCE is the mean check error computed only on the test data. With the observed $y_i$ and predicted $\hat{y}_i$ over the test data with $n_{test} = 88$, PMCE is defined as

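$PMCE = \frac{1}{n_{test}} \sum_{i=1}^{n_{test}} \left\{ \tau |y_i - \hat{y}_i| I(y_i > \hat{y}_i) + (1 - \tau) |y_i - \hat{y}_i| I(y_i \le \hat{y}_i) \right\}. \qquad (3.1)$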

We repeat this procedure 500 times to reduce the variation from splitting the data. Table 2 then summarizes the mean of PMCE and its standard error.
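The PMCE can be computed in a few lines. The Python sketch below is a direct transcription of its definition; the function name `pmce` and the toy numbers are illustrative only and are not taken from the data.

```python
def pmce(y, y_hat, tau):
    """Predicted mean check error over paired observed and predicted values."""
    total = 0.0
    for yi, fi in zip(y, y_hat):
        err = abs(yi - fi)
        # Under-prediction (yi > fi) is weighted by tau,
        # over-prediction (yi <= fi) by 1 - tau.
        total += tau * err if yi > fi else (1 - tau) * err
    return total / len(y)


# Toy test set: one under-prediction by 2, one over-prediction by 1, one exact.
y_test = [3.0, 1.0, 2.0]
y_pred = [1.0, 2.0, 2.0]
median_error = pmce(y_test, y_pred, tau=0.5)   # (1.0 + 0.5 + 0) / 3
upper_error = pmce(y_test, y_pred, tau=0.9)    # (1.8 + 0.1 + 0) / 3
```

At $\tau = 0.5$ the PMCE is half the mean absolute error, while at higher quantiles under-predictions are penalized more heavily than over-predictions.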

From the results, the group versions of LASSO and SCAD show better performance than the non-group versions for median regression.

Note that there is a lower bound for PMCE: in theory, PMCE cannot be smaller than the mean check error using the whole data, because PMCE is calculated from the test data only. For this reason, the mean check error using all samples is obtained and subtracted from PMCE. Subtracting the mean check error is similar to removing the error variance $\sigma^2$ in the model, as the mean check error is often used as $\hat{\sigma}^2$. It seems reasonable to use this pure PMCE, as we often remove $\sigma^2$ in simulation studies when calculating PMCE. The pure PMCE shows a 12.4% to 28.6% reduction by the group penalty methods. Therefore, we can clearly see some benefits of the proposed methods.
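The quoted reductions can be reproduced from the pure PMCE values in Table 2. The Python sketch below assumes that the four values are ordered group LASSO, LASSO, group SCAD, SCAD; this ordering is inferred here because it reproduces the percentages in the text, and the dictionary keys are labels added for illustration.

```python
# Pure PMCE values from Table 2 (column assignment is an assumption).
pure_pmce = {
    "group LASSO": 12.40,
    "LASSO": 17.37,
    "group SCAD": 15.82,
    "SCAD": 18.05,
}


def percent_reduction(non_group, group):
    """Relative reduction (in %) of the group version over the non-group one."""
    return 100.0 * (non_group - group) / non_group


lasso_reduction = percent_reduction(pure_pmce["LASSO"], pure_pmce["group LASSO"])
scad_reduction = percent_reduction(pure_pmce["SCAD"], pure_pmce["group SCAD"])

# The amount subtracted from each PMCE is the same for all four methods,
# e.g. 292.74 - 12.40 for the first column of Table 2.
baseline = 292.74 - 12.40
```

Rounded to one decimal place, the two reductions are 28.6% for LASSO and 12.4% for SCAD, matching the range stated above.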

### 3.2. Parkinson speech data

This data set contains measurements of voice recordings from an equal number of healthy people and people with Parkinsonism who suffer from speech impairments. The covariates are n = 1040 measurements from p = 26 types of sound recordings. The 26 time-frequency based covariates can be categorized into 6 known groups. See Sakar et al. (2013) for the details of the data set. The response variable is the unified Parkinson's disease rating scale (UPDRS), with values from 1 (normal) to 55 (most severe). As half of the measurements are from healthy people, about half of the response values are equal to 1. For this reason, we do not fit quantile regression at quantiles less than 0.5; instead, $\tau = 0.5, \ldots, 0.9$ are examined.

We compare the LASSO and SCAD penalized quantile regression models and their counterparts with group penalties. For this purpose, the 1,040 samples are randomly split into training data (2/3 of the samples) and test data (the remainder). Using the training data, we fit the considered models, where the penalty parameters are sought by 10-fold cross-validation. After choosing an 'optimal' penalty parameter, we fit the training data and then predict the values of the response variable in the test data. The prediction error is measured with the check error defined in (3.1). We repeat this procedure 100 times to reduce the variation that arises from the random split. Table 3 provides the mean of the 100 PMCE values at each value of $\tau$ and shows a smaller PMCE from the group penalties in most cases. The methods with group penalties show the largest reduction in PMCE at $\tau = 0.7$ and a very slight increase at $\tau = 0.9$. In general, we see an improvement from incorporating the group penalties.

4. Simulation studies

In this section, we compare the prediction performance of LASSO, SCAD, and MCP with their grouped versions under four simulation scenarios. In the first scenario, we fit an additive model of continuous variables through a third-order polynomial. In the second scenario, we fit an additive model of both continuous and categorical variables, also through a third-order polynomial. In the third scenario, we consider an additive model of only categorical variables. Finally, we fit an ANOVA model with all the two-way interactions. In all simulations, the errors $\epsilon$ are from the standard normal distribution. The penalty parameters are selected by 10-fold cross-validation, where the check loss is used for both model construction and validation.

• Model I: Random variables $Z_1, \ldots, Z_{16}$ and $W$ are independently generated from a standard normal distribution. The covariates are then defined as $X_i = (Z_i + W)/2$. The model is: $Y = \frac{1}{2}X_3^3 + \frac{1}{2}X_3^2 + \frac{1}{2}X_3 + \frac{1}{6}X_6^3 - \frac{1}{2}X_6^2 + \frac{1}{3}X_6 + \epsilon.$

• Model II: $X_1, \ldots, X_{20}$ are generated as in Model I. Then $X_{11}, \ldots, X_{20}$ are first generated according to a centered multivariate normal distribution with covariance between $X_i$ and $X_j$ being $0.5^{|i-j|}$. Then $X_i$ is trichotomized as 0, 1, or 2 if it is smaller than $\Phi^{-1}(1/3)$, larger than $\Phi^{-1}(2/3)$, or in between. $\epsilon \sim N(0, 2^2)$. The model is: $Y = \frac{1}{2}X_3^3 + \frac{1}{2}X_3^2 + \frac{1}{2}X_3 + \frac{1}{6}X_6^3 - \frac{1}{2}X_6^2 + \frac{1}{3}X_6 + I(X_{11} = 0) + \frac{1}{2}I(X_{11} = 1) + \epsilon.$

• Model III: Random variables $X_1, \ldots, X_{15}$ are first simulated according to a centered multivariate normal distribution with covariance between $X_i$ and $X_j$ being $0.5^{|i-j|}$. Then $X_i$ is trichotomized as 0, 1, or 2 if it is smaller than $\Phi^{-1}(1/3)$, larger than $\Phi^{-1}(2/3)$, or in between. The model is: $Y = 0.9I(X_1 = 1) - 0.6I(X_1 = 0) + 0.5I(X_3 = 1) + 0.25I(X_3 = 0) + 0.5I(X_5 = 1) + 0.5I(X_5 = 0) + \epsilon.$

• Model IV: Four categorical factors $X_1$, $X_2$, $X_3$, and $X_4$ are generated as in Model III. The model is: $Y = 1.5I(X_1 = 1) + I(X_1 = 0) + 1.5I(X_2 = 1) + I(X_2 = 0) + 0.5I(X_1 = 1, X_2 = 1) + 0.75I(X_1 = 1, X_2 = 0) + I(X_1 = 0, X_2 = 1) + 1.25I(X_1 = 0, X_2 = 0) + \epsilon.$
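The data-generating step for Model III can be sketched as follows in Python. This is only one possible way to do it: the function names are hypothetical, and the correlated normals are drawn with a stationary AR(1) recursion, an implementation choice made here because it gives unit variances and covariance $0.5^{|i-j|}$ without a matrix library.

```python
import math
import random
from statistics import NormalDist

LOW = NormalDist().inv_cdf(1 / 3)    # Phi^{-1}(1/3)
HIGH = NormalDist().inv_cdf(2 / 3)   # Phi^{-1}(2/3)


def ar1_normal_row(p, rho, rng):
    """One draw of p standard normals with Cov(X_i, X_j) = rho**|i-j|."""
    x = [rng.gauss(0.0, 1.0)]
    scale = math.sqrt(1.0 - rho * rho)
    for _ in range(p - 1):
        x.append(rho * x[-1] + scale * rng.gauss(0.0, 1.0))
    return x


def trichotomize(v):
    """0 if below Phi^{-1}(1/3), 1 if above Phi^{-1}(2/3), 2 in between."""
    if v < LOW:
        return 0
    if v > HIGH:
        return 1
    return 2


def model3_response(f, eps):
    """Model III response; f[0], f[2], f[4] hold the factors X1, X3, X5."""
    x1, x3, x5 = f[0], f[2], f[4]
    return (0.9 * (x1 == 1) - 0.6 * (x1 == 0)
            + 0.5 * (x3 == 1) + 0.25 * (x3 == 0)
            + 0.5 * (x5 == 1) + 0.5 * (x5 == 0) + eps)


def simulate_model3(n, seed=0):
    """Return n pairs (factors, y) with standard normal errors."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        f = [trichotomize(v) for v in ar1_normal_row(15, 0.5, rng)]
        data.append((f, model3_response(f, rng.gauss(0.0, 1.0))))
    return data


sample = simulate_model3(100)
```

Each factor would then be expanded into two dummy variables that form one group, so only three of the fifteen groups are active in Model III.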

We generate R = 200 datasets for each model, and the penalty parameter is chosen through 10-fold cross-validation over 100 penalty parameter values. To measure the performance of the methods, we use the mean squared error (MSE) for estimation accuracy, calculated as $MSE = (\hat{\beta} - \beta)^T E(X^T X)(\hat{\beta} - \beta)$, along with the true positive (TP) rate and the true negative (TN) rate for identification accuracy over the R simulated data sets. For the four models above, we compare group penalized quantile regression with LASSO, SCAD, and MCP (LASSO(G), SCAD(G), MCP(G)) and penalized quantile regression that does not consider the grouping structure (LASSO(NG), SCAD(NG), MCP(NG)). Sample sizes n = 100 and 200 are considered, but only the results from n = 100 are reported.

For implementation, we use rqPen for both the grouped and non-grouped versions of LASSO, SCAD, and MCP. The simulation results for Model I are summarized in Table 4, those for Model II in Table 5, those for Model III in Table 6, and those for Model IV in Table 7.

From the results, we can see that the group versions of LASSO, SCAD, and MCP perform better than the models that do not consider the grouping structure in most cases.

Finally, we investigate a heavy-tailed error distribution under Model I only. The errors $\epsilon$ are now generated from the t-distribution with 5 degrees of freedom. We generate 200 Monte Carlo data sets with n = 200 samples and summarize the results by the mean of the MSE, the mean true positive rate (TP rate), and the mean true negative rate (TN rate).

The results in Table 8 show that the group version of the regression model yields a lower MSE with higher true positive and true negative rates under a heavy-tailed error distribution.
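The three performance measures can be sketched in Python as below. The function names are hypothetical, the matrix `sigma` stands in for $E(X^T X)$ and must be supplied by the user, and the TP and TN rates are computed at the level of individual coefficients, which is an assumption because the text does not spell out their exact definition.

```python
def model_error(beta_hat, beta, sigma):
    """Quadratic form (beta_hat - beta)^T sigma (beta_hat - beta)."""
    d = [bh - b for bh, b in zip(beta_hat, beta)]
    p = len(d)
    return sum(d[i] * sigma[i][j] * d[j] for i in range(p) for j in range(p))


def tp_tn_rates(beta_hat, beta):
    """Fraction of truly nonzero coefficients estimated as nonzero (TP)
    and fraction of truly zero coefficients estimated as zero (TN)."""
    pos = [bh for bh, b in zip(beta_hat, beta) if b != 0]
    neg = [bh for bh, b in zip(beta_hat, beta) if b == 0]
    tp = sum(bh != 0 for bh in pos) / len(pos)
    tn = sum(bh == 0 for bh in neg) / len(neg)
    return tp, tn


# Toy example with an identity matrix in place of E(X^T X).
beta_true = [1.0, 0.0, 2.0, 0.0]
beta_est = [0.5, 0.0, 0.0, 0.1]
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
mse = model_error(beta_est, beta_true, identity)   # 0.25 + 0 + 4 + 0.01
tp, tn = tp_tn_rates(beta_est, beta_true)
```

In a simulation study these quantities would be averaged over the R replicated data sets.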

5. Concluding remarks

This work presents numerical studies of group penalized quantile regression with the group LASSO, group SCAD, and group MCP penalties for analyzing data with a grouping structure. Applying a group penalty to quantile regression seems reasonable and improves the fit. We also applied group penalties to high dimensional data; however, these penalties did not work well in the R package rqPen. The package failed to capture the grouping structure and provided the same results as the non-group version, despite being designed for high dimensional data. High dimensional data are common in genetics; therefore, future studies on the implementation for such data would be very interesting.

Acknowledgements

Yoonsuh Jung’s work was partially supported by National Research Foundation of Korea Grant NRF-2017R1C1B5017431.

Figures
Fig. 1. $\ell_1$ penalty, group LASSO penalty, $\ell_2$ penalty from .
TABLES

### Table 1

Birth weight data: Coefficients of group LASSO, LASSO, group SCAD, and SCAD penalized median regression from arbitrary one fit

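| | Group LASSO | LASSO | Group SCAD | SCAD |
|---|---|---|---|---|
| intercept | 3798.4927 | 2207.0331 | 2243.9668 | 2591.5665 |
| age | −145.8383 | 0.0000 | −3.2061 | 0.0000 |
| age$^2$ | 3.0818 | 0.2200 | 0.2841 | 0.1924 |
| weight | 6.0441 | 4.9070 | 4.9070 | 2.1216 |
| smoking | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| hyper | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| uterine | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| visit | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| race1 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| race2 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| pre1 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| pre2 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |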

LASSO = least absolute shrinkage and selection operator; SCAD = smoothly clipped absolute deviation.

### Table 2

Birthweight data: Mean of PMCE, pure PMCE (and its standard error in the parentheses) for group LASSO, group SCAD, LASSO, and SCAD penalized median regression and mean regression from 500 different splits of data

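| | Group LASSO | LASSO | Group SCAD | SCAD |
|---|---|---|---|---|
| Median regression | | | | |
| PMCE | 292.74 | 297.71 | 296.16 | 298.39 |
| Pure PMCE | 12.40 (0.86) | 17.37 (0.78) | 15.82 (0.81) | 18.05 (0.78) |
| Mean regression | | | | |
| PMCE | 277.07 (0.79) | 277.22 (0.79) | 280.60 (0.82) | 279.83 (0.82) |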

PMCE = predicted mean check error; LASSO = least absolute shrinkage and selection operator; SCAD = smoothly clipped absolute deviation.

### Table 3

Parkinson speech data: Mean of PMCE (and its standard error in the parentheses) for LASSO, group LASSO, SCAD, and group SCAD penalized regression at various quantiles from 100 different splits of data. Percentage of PMCE reduction by incorporating the group penalties are shown as % reduction

| τ | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 |
|---|---|---|---|---|---|
| SCAD (NG) | 6.00 (0.04) | 6.71 (0.04) | 6.45 (0.04) | 5.46 (0.02) | 3.36 (0.01) |
| SCAD (G) | 5.85 (0.04) | 6.57 (0.04) | 5.79 (0.03) | 5.03 (0.03) | 3.39 (0.01) |
| % reduction | 2.53 | 1.88 | 10.20 | 7.97 | −1.00 |
| LASSO (NG) | 5.93 (0.07) | 6.63 (0.08) | 6.39 (0.08) | 5.36 (0.08) | 3.30 (0.05) |
| LASSO (G) | 5.78 (0.06) | 6.50 (0.08) | 5.74 (0.07) | 5.00 (0.08) | 3.34 (0.05) |
| % reduction | 2.42 | 1.95 | 10.25 | 7.56 | −1.01 |

PMCE = predicted mean check error; LASSO = least absolute shrinkage and selection operator; SCAD = smoothly clipped absolute deviation.

### Table 4

Simulation results for Model I: Mean of MSE, TP rate, TN rate for LASSO, SCAD, MCP are from 200 simulated data sets with n = 100

| τ | Measure | LASSO(G) | LASSO(NG) | SCAD(G) | SCAD(NG) | MCP(G) | MCP(NG) |
|---|---|---|---|---|---|---|---|
| 0.1 | MSE | 873.18 (37.2) | 983.28 (30.2) | 1664.51 (62.2) | 1423.33 (39.1) | 1667.30 (59.3) | 1430.00 (37.9) |
| | TP rate | 78.25 (1.0) | 68.58 (1.5) | 94.58 (1.2) | 60.08 (1.6) | 95.67 (1.0) | 60.67 (1.6) |
| | TN rate | 42.00 (1.8) | 39.82 (1.5) | 64.71 (2.4) | 48.93 (1.6) | 64.39 (2.5) | 48.68 (1.5) |
| 0.2 | MSE | 403.28 (15.4) | 899.76 (26.1) | 549.90 (19.3) | 1170.84 (26.6) | 562.90 (20.2) | 1211.79 (28.3) |
| | TP rate | 74.75 (1.1) | 62.92 (1.5) | 94.83 (1.0) | 45.08 (1.5) | 95.67 (0.9) | 45.67 (1.6) |
| | TN rate | 54.43 (1.8) | 46.75 (1.6) | 72.96 (2.2) | 67.86 (1.6) | 71.68 (2.2) | 67.79 (1.7) |
| 0.3 | MSE | 299.36 (8.5) | 862.30 (24.1) | 305.80 (12.2) | 1087.98 (23.7) | 300.55 (11.7) | 1106.97 (25.9) |
| | TP rate | 77.00 (1.1) | 57.25 (1.7) | 94.83 (0.9) | 39.25 (1.5) | 95.00 (1.1) | 38.33 (1.4) |
| | TN rate | 55.64 (2.0) | 53.71 (1.7) | 76.32 (2.1) | 75.36 (1.6) | 76.79 (2.2) | 76.64 (1.5) |
| 0.4 | MSE | 254.80 (7.7) | 833.40 (25.0) | 217.07 (8.6) | 1092.55 (27.3) | 214.99 (8.4) | 1082.13 (25.4) |
| | TP rate | 77.00 (1.4) | 55.25 (1.8) | 92.92 (1.2) | 35.58 (1.5) | 93.67 (1.1) | 36.00 (1.5) |
| | TN rate | 58.86 (2.0) | 57.82 (1.9) | 81.00 (1.9) | 81.21 (1.5) | 80.96 (2.0) | 81.68 (1.5) |
| 0.5 | MSE | 243.28 (7.6) | 821.00 (26.7) | 209.41 (8.4) | 1086.88 (26.7) | 205.59 (8.6) | 1084.43 (26.1) |
| | TP rate | 76.83 (1.5) | 52.58 (1.7) | 91.42 (1.2) | 33.92 (1.4) | 91.83 (1.2) | 33.50 (1.5) |
| | TN rate | 60.29 (2.1) | 60.96 (1.9) | 77.11 (2.1) | 82.07 (1.3) | 77.79 (2.1) | 82.39 (1.4) |

Standard errors are in parentheses. All values are multiplied by $10^2$.

LASSO = least absolute shrinkage and selection operator; SCAD = smoothly clipped absolute deviation; MCP = minimax concave penalty; MSE = mean squared error; TP = true positive; TN = true negative.

### Table 5

Simulation results for Model II: Mean of MSE, TP rate, TN rate for LASSO, SCAD, MCP are from 200 simulated data sets with n = 100

| τ | Measure | LASSO(G) | LASSO(NG) | SCAD(G) | SCAD(NG) | MCP(G) | MCP(NG) |
|---|---|---|---|---|---|---|---|
| 0.1 | MSE | 892.65 (11.1) | 1361.60 (29.5) | 991.70 (19.9) | 1698.67 (38.8) | 982.53 (19.2) | 1706.56 (39.8) |
| | TP rate | 59.06 (1.6) | 46.63 (1.7) | 62.19 (1.7) | 32.75 (1.5) | 61.94 (1.6) | 33.81 (1.6) |
| | TN rate | 47.35 (1.7) | 48.06 (1.4) | 84.18 (1.3) | 65.12 (1.5) | 85.59 (1.2) | 64.53 (1.5) |
| 0.2 | MSE | 818.28 (11.6) | 1394.68 (30.0) | 844.13 (19.8) | 1645.89 (31.8) | 836.67 (14.6) | 1641.21 (30.8) |
| | TP rate | 62.06 (1.4) | 53.44 (1.8) | 73.63 (1.2) | 35.38 (1.7) | 74.19 (1.2) | 33.94 (1.6) |
| | TN rate | 52.21 (1.8) | 49.94 (1.7) | 65.32 (2.1) | 68.26 (1.7) | 64.12 (2.2) | 69.15 (1.7) |
| 0.3 | MSE | 757.97 (14.2) | 1364.15 (27.2) | 720.11 (14.6) | 1606.66 (28.3) | 714.26 (14.1) | 1614.34 (27.4) |
| | TP rate | 67.63 (1.5) | 51.25 (1.6) | 76.94 (1.2) | 32.00 (1.5) | 77.56 (1.2) | 31.69 (1.5) |
| | TN rate | 50.44 (1.8) | 51.71 (1.6) | 65.06 (2.1) | 70.68 (1.6) | 64.06 (2.1) | 71.18 (1.6) |
| 0.4 | MSE | 672.45 (15.8) | 1406.66 (27.9) | 671.74 (17.0) | 1627.18 (29.6) | 678.25 (16.5) | 1640.61 (29.8) |
| | TP rate | 71.56 (1.4) | 52.31 (1.7) | 79.19 (1.2) | 32.81 (1.5) | 79.19 (1.1) | 31.56 (1.5) |
| | TN rate | 49.06 (1.7) | 46.65 (1.6) | 60.94 (3.1) | 67.56 (1.6) | 61.26 (2.1) | 69.79 (1.5) |
| 0.5 | MSE | 621.12 (14.2) | 1414.99 (29.6) | 590.80 (17.1) | 1632.37 (28.9) | 587.34 (17.3) | 1633.92 (27.9) |
| | TP rate | 72.50 (1.4) | 50.56 (1.8) | 81.50 (1.3) | 31.44 (1.6) | 82.00 (1.2) | 31.06 (1.6) |
| | TN rate | 48.50 (1.6) | 47.24 (1.5) | 61.03 (2.0) | 67.68 (1.6) | 60.97 (2.0) | 67.65 (1.6) |

Standard errors are in parentheses. All values are multiplied by $10^2$.

LASSO = least absolute shrinkage and selection operator; SCAD = smoothly clipped absolute deviation; MCP = minimax concave penalty; MSE = mean squared error; TP = true positive; TN = true negative.

### Table 6

Simulation results for Model III. Mean of MSE, TP rate, TN rate for LASSO, SCAD, MCP are from 200 simulated data sets with n = 100

| τ | Measure | LASSO(G) | LASSO(NG) | SCAD(G) | SCAD(NG) | MCP(G) | MCP(NG) |
|---|---|---|---|---|---|---|---|
| 0.1 | MSE | 282.81 (8.9) | 818.73 (6.2) | 394.63 (11.5) | 825.53 (6.6) | 398.62 (11.6) | 825.67 (6.7) |
| | TP rate | 73.58 (1.6) | 55.25 (1.8) | 46.17 (2.1) | 42.67 (1.6) | 46.58 (2.2) | 43.67 (1.6) |
| | TN rate | 36.83 (1.5) | 52.42 (1.8) | 59.54 (2.4) | 65.71 (1.6) | 58.54 (2.4) | 64.42 (1.6) |
| 0.2 | MSE | 202.82 (6.8) | 772.30 (5.9) | 202.72 (8.2) | 768.74 (5.3) | 198.41 (7.9) | 770.85 (5.5) |
| | TP rate | 86.17 (1.3) | 52.67 (1.9) | 81.58 (1.8) | 42.75 (1.7) | 81.25 (1.8) | 40.67 (1.6) |
| | TN rate | 31.04 (1.7) | 57.75 (1.9) | 43.71 (2.2) | 69.13 (1.8) | 45.08 (2.2) | 70.42 (1.7) |
| 0.3 | MSE | 160.78 (7.0) | 747.47 (7.2) | 168.09 (9.4) | 740.63 (7.5) | 168.95 (9.6) | 739.28 (7.3) |
| | TP rate | 91.67 (1.0) | 51.17 (1.8) | 85.58 (1.6) | 41.00 (1.7) | 85.25 (1.7) | 41.58 (1.7) |
| | TN rate | 28.92 (1.6) | 57.17 (1.9) | 44.67 (2.1) | 68.25 (1.8) | 43.75 (2.0) | 68.42 (1.8) |
| 0.4 | MSE | 139.23 (5.9) | 726.39 (6.8) | 158.78 (9.7) | 714.23 (6.6) | 151.52 (9.1) | 711.59 (6.2) |
| | TP rate | 93.67 (0.8) | 49.75 (1.7) | 87.75 (1.6) | 38.92 (1.5) | 88.75 (1.5) | 39.00 (1.5) |
| | TN rate | 27.96 (1.6) | 54.88 (1.7) | 41.33 (2.2) | 68.17 (1.6) | 41.33 (2.1) | 69.00 (1.5) |
| 0.5 | MSE | 138.13 (6.3) | 719.82 (8.1) | 153.13 (9.6) | 715.73 (8.5) | 157.20 (10.0) | 714.26 (9.0) |
| | TP rate | 94.17 (0.8) | 51.33 (1.7) | 89.75 (1.4) | 40.83 (1.8) | 89.33 (1.5) | 40.42 (1.8) |
| | TN rate | 28.75 (1.5) | 52.71 (1.6) | 41.50 (2.1) | 65.38 (1.6) | 40.83 (2.2) | 64.71 (1.7) |

Standard errors are in parentheses. All values are multiplied by $10^2$.

LASSO = least absolute shrinkage and selection operator; SCAD = smoothly clipped absolute deviation; MCP = minimax concave penalty; MSE = mean squared error; TP = true positive; TN = true negative.

### Table 7

Simulation results for Model IV. Mean of MSE, TP rate, TN rate for LASSO, SCAD, MCP are from 200 simulated data sets with n = 100

| τ | Measure | LASSO(G) | LASSO(NG) | SCAD(G) | SCAD(NG) | MCP(G) | MCP(NG) |
|---|---|---|---|---|---|---|---|
| 0.1 | MSE | 974.87 (35.7) | 4382.14 (48.4) | 1089.52 (61.4) | 4378.63 (46.9) | 1087.99 (61.1) | 4348.43 (43.8) |
| | TP rate | 90.44 (0.9) | 64.75 (2.2) | 97.81 (0.6) | 51.88 (2.4) | 97.81 (0.6) | 51.00 (2.4) |
| | TN rate | 23.25 (1.0) | 35.25 (2.2) | 8.38 (1.1) | 48.13 (2.4) | 8.00 (1.1) | 49.00 (2.4) |
| 0.2 | MSE | 761.39 (33.7) | 4016.89 (42.1) | 977.69 (44.2) | 4143.90 (47.3) | 994.03 (43.3) | 4151.36 (46.2) |
| | TP rate | 93.38 (0.8) | 71.75 (1.9) | 83.44 (1.7) | 54.38 (2.5) | 82.69 (1.7) | 55.13 (2.5) |
| | TN rate | 25.63 (1.9) | 28.25 (1.9) | 29.00 (2.2) | 45.63 (2.5) | 28.63 (2.2) | 44.88 (2.5) |
| 0.3 | MSE | 642.29 (29.1) | 3853.86 (51.1) | 946.14 (42.8) | 3963.01 (54.7) | 929.65 (42.5) | 4026.50 (58.1) |
| | TP rate | 93.94 (0.7) | 75.25 (1.7) | 80.75 (1.7) | 65.13 (2.0) | 81.75 (1.7) | 63.50 (2.0) |
| | TN rate | 25.00 (1.8) | 24.75 (1.7) | 38.50 (2.6) | 34.88 (2.0) | 37.38 (2.6) | 36.50 (2.0) |
| 0.4 | MSE | 641.29 (30.8) | 3601.66 (51.0) | 941.30 (44.1) | 3751.45 (56.8) | 975.85 (44.2) | 3759.88 (57.1) |
| | TP rate | 92.00 (0.9) | 74.25 (1.7) | 81.56 (1.7) | 62.88 (1.9) | 80.44 (1.7) | 62.13 (1.9) |
| | TN rate | 29.00 (2.0) | 25.75 (1.7) | 41.25 (2.7) | 37.13 (1.9) | 40.75 (2.6) | 37.88 (1.9) |
| 0.5 | MSE | 702.93 (34.0) | 3445.67 (49.4) | 1043.48 (47.6) | 3571.69 (52.0) | 1056.95 (47.4) | 3599.84 (51.9) |
| | TP rate | 90.06 (0.9) | 78.63 (1.5) | 78.31 (1.7) | 57.13 (2.0) | 78.19 (1.7) | 57.50 (2.0) |
| | TN rate | 28.63 (1.9) | 21.38 (1.9) | 45.63 (2.9) | 42.88 (2.0) | 43.63 (2.8) | 42.50 (2.0) |

Standard errors are in parentheses. All values are multiplied by $10^2$.

LASSO = least absolute shrinkage and selection operator; SCAD = smoothly clipped absolute deviation; MCP = minimax concave penalty; MSE = mean squared error; TP = true positive; TN = true negative.

### Table 8

Mean of MSE, TP rate, and TN rate for LASSO and Group LASSO (and its standard error in the parentheses).

| τ | Method | MSE | TP rate | TN rate |
|---|---|---|---|---|
| 0.1 | LASSO(NG) | 30.78 (1.35) | 99.58 (0.18) | 55.36 (1.61) |
| | LASSO(G) | 26.02 (1.04) | 100.00 (0.00) | 57.29 (1.42) |
| 0.2 | LASSO(NG) | 30.28 (1.21) | 99.50 (0.20) | 56.36 (1.53) |
| | LASSO(G) | 26.39 (1.06) | 100.00 (0.00) | 57.82 (1.53) |
| 0.3 | LASSO(NG) | 30.26 (1.28) | 6.71 (0.20) | 55.71 (1.58) |
| | LASSO(G) | 26.46 (1.04) | 100.00 (0.00) | 56.39 (1.50) |
| 0.4 | LASSO(NG) | 30.44 (1.27) | 6.71 (0.18) | 55.11 (1.56) |
| | LASSO(G) | 26.27 (1.04) | 100.00 (0.00) | 58.21 (1.52) |
| 0.5 | LASSO(NG) | 30.52 (1.27) | 6.71 (0.18) | 56.46 (1.61) |
| | LASSO(G) | 26.57 (1.04) | 100.00 (0.00) | 57.11 (1.50) |

Errors follow the t-distribution with 5 degrees of freedom. All values are multiplied by $10^2$. (NG) and (G) stand for the non-group and group versions of LASSO, respectively.

LASSO = least absolute shrinkage and selection operator; MSE = mean squared error; TP = true positive; TN = true negative.

References
1. Bakin S (1999). Adaptive regression and model selection in data mining problems (PhD thesis), The Australian National University.
2. Chavent M and Kuentz-Simonet V (2012). ClustOfVar: an R package for the clustering of variables. Journal of Statistical Software, 50, 1-16.
3. Ciuperca G (2019). Adaptive group LASSO selection in quantile models. Statistical Papers, 60, 173-197.
4. Fan J (1997). Comments on wavelets in statistics: a review by A. Antoniadis. Journal of the Italian Statistical Society, 6, 131-138.
5. Fan J and Li R (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348-1360.
6. Hashem H, Vinciotti V, Alhamzawi R, and Yu K (2016). Quantile regression with group lasso for classification. Advances in Data Analysis and Classification, 10, 375-390.
7. Hendricks W and Koenker R (1992). Hierarchical spline models for conditional quantiles and the demand for electricity. Journal of the American Statistical Association, 87, 58-68.
8. Hoerl AE and Kennard RW (1970). Ridge regression: biased estimation for nonorthogonal problems. Technometrics, 12, 55-67.
9. Hosmer DW and Lemeshow S (1989). Applied logistic regression, Wiley.
10. Huang J, Breheny P, and Ma S (2012). A Selective Review of Group Selection in High-Dimensional Models. Statistical Science, 27, 481-499.
11. Kato K (2011). Group Lasso for high dimensional sparse quantile regression models. arXiv:1103.1458 v2 [stat.ME]
12. Koenker R (2004). Quantile regression for longitudinal data. Journal of Multivariate Analysis, 91, 74-89.
13. Koenker R and Bassett G (1978). Regression quantiles. Econometrica, 46, 33-50.
14. Koenker R and Hallock KF (2001). Quantile regression. Journal of Economic Perspectives, 15, 143-156.
15. Koenker R, Ng P, and Portnoy S (1994). Quantile smoothing splines. Biometrika, 81, 673-680.
16. Ogutu JO and Piepho HP (2014). Regularized group regression methods for genomic prediction: Bridge, MCP, SCAD, group bridge, group lasso, sparse group lasso, group MCP and group SCAD. BMC Proceedings, 8.
17. Sakar BE, Isenkul ME, Sakar CO, Sertbas A, Gurgen F, Delil S, Apaydin H, and Kursun O (2013). Collection and analysis of a Parkinson Speech Dataset with multiple types of sound recordings. IEEE Journal of Biomedical and Health Informatics, 17, 828-834.
18. Tibshirani R (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), 58, 267-288.
19. Wang H and He X (2007). Detecting differential expressions in GeneChip microarray studies: a quantile approach. Journal of the American Statistical Association, 102, 104-112.
20. Wei Y and He X (2006). Conditional growth charts. The Annals of Statistics, 34, 2069-2097.
21. Wei Y, Pere A, Koenker R, and He X (2006). Quantile regression methods for reference growth charts. Statistics in Medicine, 25, 1369-1382.
22. Wu Y and Liu Y (2009). Variable selection in quantile regression. Statistica Sinica, 19, 801-817.
23. Yuan M and Lin Y (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68, 49-67.
24. Zhang CH (2010). Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics, 38, 894-942.
25. Zou H (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101, 1418-1429.
26. Zou H and Hastie T (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67, 301-320.