
With the advent of the big data era, the amount and variety of data are increasing rapidly, making it difficult to estimate the true function. When trying to determine which variables influence a specific value of interest and to what extent, one of the representative statistical methods is the Lasso (least absolute shrinkage and selection operator) regression model (Tibshirani, 1996). This model adds a penalty term to the residual sum of squares in a traditional regression model, facilitating variable selection. In a truncated power spline regression model, if an
Much research has been conducted on spline regression models for knot selection (Osborne
Among the various methods to address Lasso regression problems, the alternating direction method of multipliers (ADMM) (Boyd
The ADMM algorithm and CDA are implemented directly in the statistical program R, while the iterative algorithm for QP is implemented with the solve.QP function from the quadprog package (Weingessel, 2011). Additionally, the study uses the glmnet function from the R package glmnet, which employs CDA, varying the simulation conditions under the same settings for the four functions. We also examine the fit of truncated power spline regression with a Lasso penalty on two real datasets.
This paper is organized as follows: In Section 2, we present the Lasso regression splines model. The algorithms for comparison are introduced in Section 3, including the glmnet package, which uses the CDA method. In Sections 4 and 5, we validate the performance of our proposed model using simulations and real data. Finally, we summarize the conclusions of this study in Section 6. All numerical analyses were conducted using the statistical software R.
For
where {
where
where
Meanwhile, when given paired data
where
To estimate
where
For
Note that (
Model (
The empirical risk of (
The objective function to be minimized is given by
for
and then our goal is to estimate lasso regression model
In
In this study, we apply alternating direction method of multipliers (ADMM), coordinate descent algorithm (CDA), and quadratic programming (QP) to solve the problem (
The ADMM, a more recent methodology than CDA and QP, blends the decomposability of dual ascent with the superior convergence properties of the method of multipliers. To apply ADMM, the optimization problem (
where ||·||2 is
and
where
Then, ADMM steps are as follows. For
where
Herein, we stop iteration when
where
For
Meanwhile, the convergence rate of the algorithm depends on the value of the penalty parameter
In our case, we set the following update steps.
where
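The ADMM update steps above can be illustrated with a minimal sketch. The paper's implementation is in R; the Python sketch below is ours, assuming the standard Lasso objective (1/2)||y − Bβ||² + λ||β||₁ with design matrix B, and the usual three-step iteration (a ridge-type β-update, a soft-thresholding z-update, and a dual u-update) with the primal/dual residual stopping rule.

```python
import numpy as np

def soft_threshold(x, kappa):
    """Elementwise soft-thresholding operator S_kappa(x) = sign(x) * max(|x| - kappa, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - kappa, 0.0)

def admm_lasso(B, y, lam, rho=1.0, max_iter=1000, tol=1e-8):
    """Minimize 0.5 * ||y - B @ beta||^2 + lam * ||beta||_1 by ADMM.

    beta-update: solve (B'B + rho I) beta = B'y + rho (z - u)
    z-update:    z = S_{lam/rho}(beta + u)
    u-update:    u = u + beta - z
    """
    n, p = B.shape
    z = np.zeros(p)
    u = np.zeros(p)
    # (B'B + rho I) is fixed across iterations, so factor it once.
    L = np.linalg.cholesky(B.T @ B + rho * np.eye(p))
    Bty = B.T @ y
    for _ in range(max_iter):
        rhs = Bty + rho * (z - u)
        beta = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
        z_old = z
        z = soft_threshold(beta + u, lam / rho)
        u = u + beta - z
        # Stop when both the primal residual (beta - z) and the
        # dual residual (rho * (z - z_old)) are small.
        if np.linalg.norm(beta - z) < tol and rho * np.linalg.norm(z - z_old) < tol:
            break
    return z
```

The penalty parameter rho controls the convergence rate, which is the property exploited later in the comparisons with CDA.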
The CDA optimizes the objective function with respect to a single coefficient at a time, iteratively cycling through all coefficients until convergence is reached.
For
For notation convenience, we define
Thus, the coordinate descent update is written in the following forms:
Finally,
It is noteworthy that
If the difference between the objective function values of the current and previous iterations is less than a pre-specified tolerance, the iteration stops.
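The coordinate-wise update and the objective-based stopping rule can be sketched as follows. This is a minimal Python illustration (the paper's implementation is in R; function names are ours), assuming the Lasso objective (1/2)||y − Bβ||² + λ||β||₁ and cyclic sweeps with residual updates.

```python
import numpy as np

def soft_threshold(x, kappa):
    """Elementwise soft-thresholding operator."""
    return np.sign(x) * np.maximum(np.abs(x) - kappa, 0.0)

def lasso_objective(B, y, beta, lam):
    r = y - B @ beta
    return 0.5 * r @ r + lam * np.abs(beta).sum()

def cda_lasso(B, y, lam, max_iter=1000, tol=1e-10):
    """Minimize 0.5 * ||y - B @ beta||^2 + lam * ||beta||_1 by cyclic
    coordinate descent. Assumes B has no all-zero columns."""
    n, p = B.shape
    beta = np.zeros(p)
    col_sq = (B ** 2).sum(axis=0)      # ||B_j||^2, constant across sweeps
    resid = y - B @ beta
    obj_old = lasso_objective(B, y, beta, lam)
    for _ in range(max_iter):
        for j in range(p):
            # Partial residual that adds back coordinate j's own contribution.
            rho_j = B[:, j] @ resid + col_sq[j] * beta[j]
            beta_j_new = soft_threshold(rho_j, lam) / col_sq[j]
            resid += B[:, j] * (beta[j] - beta_j_new)
            beta[j] = beta_j_new
        obj = lasso_objective(B, y, beta, lam)
        # Stop when the decrease of the objective falls below the tolerance.
        if obj_old - obj < tol:
            break
        obj_old = obj
    return beta
```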
QP is a method for solving mathematical optimization problems with quadratic objective functions. Unlike the ADMM algorithm and CDA, it obtains the solution in a single step without iterating.
For
Then, we can represent that
Therefore, the optimization problem is equivalent to minimizing
subject to
in addition to 2 ×
Then,
and
where
Therefore, the minimization problem is equivalent to
subject to
Note that in the QP method, unlike the ADMM algorithm and CDA, when estimating the parameters
The first constraint corresponds to the Lasso penalty, and the second and third constraints correspond to non-negativity.
To solve the optimization problem (
and
Finally, the solution of
and the estimator is expressed as
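The split of the coefficients into nonnegative parts that underlies the QP formulation can be sketched as follows. For illustration we use the penalty (λ) form of the Lasso rather than the constrained (Σ(β⁺ + β⁻) ≤ t) form described above, and a simple projected-gradient solver in Python instead of solve.QP from the R quadprog package; the function name is ours.

```python
import numpy as np

def lasso_qp(B, y, lam, max_iter=5000):
    """Lasso via the split beta = beta_plus - beta_minus with
    beta_plus >= 0 and beta_minus >= 0, so the L1 penalty becomes a
    linear term and the problem is a QP over the nonnegative orthant:

        min_theta 0.5 * theta' D theta - c' theta,   theta >= 0,

    where theta = (beta_plus, beta_minus),
    D = [[B'B, -B'B], [-B'B, B'B]] and c = (B'y, -B'y) - lam.
    Solved here by projected gradient descent for simplicity.
    """
    n, p = B.shape
    BtB = B.T @ B
    D = np.block([[BtB, -BtB], [-BtB, BtB]])
    c = np.concatenate([B.T @ y, -(B.T @ y)]) - lam
    theta = np.zeros(2 * p)
    step = 1.0 / (np.linalg.norm(D, 2) + 1e-12)  # step <= 1/L for convergence
    for _ in range(max_iter):
        grad = D @ theta - c
        theta = np.maximum(theta - step * grad, 0.0)  # project onto theta >= 0
    return theta[:p] - theta[p:]
```

Note that D is only positive semidefinite; solve.QP requires a positive definite matrix, which is one practical reason a small ridge term is sometimes added in the R implementation.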
We implement the algorithms using the R program. To efficiently select tuning parameters, find
where
Here, we specify the maximum number of iterations
where
Meanwhile, in QP,
Similarly, we determine the number of candidates
In this section, we analyze the performance using simulated examples based on model (
The stopping threshold value
We employ 100
To implement the four algorithms, the statistical program R is used. ADMM and CDA are coded directly, and for glmnet we use the glmnet function of the glmnet package. In R, glmnet is a package (Friedman
Four examples are given as follows:
Figure 1 shows the true functions of Examples 1–4. The top-left plot is for Example 1, which has a constant shape. The top-right plot is for Example 2, which has a continuous piecewise linear shape. The bottom-left plot is for Example 3, which has a simple cubic shape based on a trigonometric function. The bottom-right plot is for Example 4, which has a complex cubic shape. Examples 1 and 4 use functions from Donoho and Johnstone (1994).
To compare the performance of models, we consider the mean squared error (MSE), the mean absolute error (MAE), and the maximum deviation (MXDV) criterion as loss functions. These criteria measure the discrepancy between the true function
In addition, we use running time to compare the performance of the models. Comparing running times helps in selecting the most suitable algorithm under each set of conditions.
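The three accuracy criteria can be computed directly from the true function and its estimate evaluated on a common grid; a minimal Python sketch (function name ours):

```python
import numpy as np

def criteria(f_true, f_hat):
    """MSE, MAE, and maximum deviation (MXDV) between the true function
    and an estimate, both evaluated at the same grid points."""
    err = f_true - f_hat
    return {"MSE": np.mean(err ** 2),
            "MAE": np.mean(np.abs(err)),
            "MXDV": np.max(np.abs(err))}
```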
In Example 1,
In Example 2,
In Example 3,
In Example 4,
Meanwhile, in Example 4, ADMM appears to estimate the true function well, whereas the fits from CDA, glmnet, and QP are far from the true function. This is because, despite using the same basis, ADMM can control the convergence rate with its penalty parameter. In this regard, Figure 6 illustrates a natural characteristic of truncated power splines: it plots the bases created for degrees 0, 1, 2, and 3, respectively. As the degree increases, the overlap between the basis functions grows at large values of the support. For this reason, correlation arises between the basis functions during estimation, which causes convergence to take a long time or causes problems in estimating the solution.
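The overlap effect described above can be checked numerically. The Python sketch below (function names ours) builds a truncated power basis of a given degree and measures the average absolute pairwise correlation among the knot columns; in our runs this average increases with the degree, consistent with the slow convergence observed for the cubic examples.

```python
import numpy as np

def truncated_power_basis(x, knots, degree):
    """Design matrix: polynomial columns 1, x, ..., x^degree followed by
    truncated power columns (x - k)_+^degree for each knot k
    (indicator columns 1{x > k} when degree is 0)."""
    cols = [x ** d for d in range(degree + 1)]
    if degree == 0:
        cols += [(x > k).astype(float) for k in knots]
    else:
        cols += [np.maximum(x - k, 0.0) ** degree for k in knots]
    return np.column_stack(cols)

def mean_abs_knot_correlation(degree, n_knots=9, n=200):
    """Average absolute pairwise correlation among the knot columns of the
    truncated power basis on an equally spaced grid over [0, 1]."""
    x = np.linspace(0.0, 1.0, n)
    knots = np.linspace(0.1, 0.9, n_knots)
    Bmat = truncated_power_basis(x, knots, degree)
    C = np.corrcoef(Bmat[:, degree + 1:].T)
    off_diag = np.abs(C).sum() - n_knots
    return off_diag / (n_knots * (n_knots - 1))
```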
Note that Figure 7 shows the glmnet fitting results according to the two factors:
Conversely, when its penalty parameter is adjusted, ADMM is an attractive algorithm that converges faster than CDA. Additionally, for a more objective comparison between ADMM and CDA, we compare the MSE change per iteration over 100 repetitions. In this experiment, the same data as in Example 4 are used, and
In Figure 8, for each repeat
In this section, based on the simulation results, we analyze two real data sets. The first is penny, and the second is WWWusage. Both data sets are R built-in data sets.
We utilize the penny dataset included in the locfit package (Loader and Liaw, 2024). It contains 90 observations on two variables: year and thickness. The year variable is a discrete variable giving the year when the coin was minted, while the thickness variable is a continuous variable giving the thickness of the coin in millimeters. In Figure 9, the
The penny’s thickness is not significantly affected by time and shows little variation, so the degree of the function used for estimation is zero. As shown in Figure 9, the top-left plot shows a general trend of increasing thickness from the mid-1940s to the early 1970s; after that, the thickness decreased to around 53 mm and then gradually increased again. Using a degree of zero is motivated by the simplicity of the dataset: it is more efficient for this simple data structure, minimizing overfitting and enhancing model robustness.
The four plots below in Figure 9 show that, except for glmnet, the algorithms perform at a similarly moderate level. From the mid-1940s to the early 1970s, they fit the changes in the two large data clusters with a simple shape. From the mid-1970s to the 1980s, they do not capture the sharp decline in the data, instead fitting only the average decrease based on the changes in the data clusters. However, glmnet fits the general trend of increasing thickness from the mid-1940s to the early 1970s well; from the mid-1970s to the 1980s, it also fits the gradual increase following the sharp decline well.
We utilize the WWWusage dataset, which is included in the base R package. WWWusage is a time series dataset that tracks the number of users connected to the Internet over a specific period. The WWWusage dataset contains a single variable, usage, which represents the number of users connected to the Internet. The dataset consists of 100 observations, recorded at one-minute intervals. In Figure 10, the
As shown in Figure 10, the plot in the top-left has a gradual upward trend until the mid-30
We assumed a nonparametric regression equation to estimate the constant, linear, and cubic functions. To solve the problem (
For the simpler forms, the constant and linear functions, the fitting performance of the four algorithms was found to be similar. In terms of running time, glmnet performed the best. For the simple cubic function, differences in fitting performance were observed, with glmnet and QP not fitting as well as ADMM or CDA. However, in terms of running time, CDA took approximately 3,000 times longer than ADMM. For the complex cubic function, as the number of knots increased, ADMM performed relatively well in terms of both fitting and running time compared to the other algorithms.
Through the simulation, we found that the CDA algorithm has a slow convergence rate in the case of data with complex nonlinear shapes. It was determined that this was due to an issue of overlapping support that occurred when using a truncated power spline as the basis function. In conclusion, we suggest a direction for selecting appropriate algorithms under various conditions. We recommend using glmnet for simpler forms and ADMM for more complex forms.
The average of each criterion over 100 runs of ADMM, CDA, glmnet, and QP in Example 1, with standard errors in parentheses and a sample size
n | σ | Evaluation | ADMM | CDA | glmnet | QP
---|---|---|---|---|---|---
200 | 0.01 | MSE | 1.646 ×10−1 | 1.647×10−1 | 1.656×10−1 | 1.684×10−1 |
(5.713 ×10−5) | (1.685 ×10−4) | (1.464 ×10−3) | (9.674 ×10−5) | |||
MAE | 1.849 ×10−1 | 1.871×10−1 | 1.930×10−1 | 2.079×10−1 | ||
(1.311 ×10−3) | (2.171 ×10−3) | (7.610 ×10−3) | (6.363 ×10−4) | |||
MXDV | 1.731 ×10−1 | 1.730×10−1 | 1.730×10−1 | 1.730×10−1 | ||
(4.305 ×10−3) | (4.242 ×10−3) | (4.179 ×10−3) | (4.239 ×10−3) | |||
Running time(s) | 41.013 | 430.218 | 5.313 | 6.920 | ||
0.20 | MSE | 1.742 ×10−1 | 1.757×10−1 | 1.734×10−1 | 1.778×10−1 | |
(4.029 ×10−3) | (4.074 ×10−3) | (4.106 ×10−3) | (5.727 ×10−3) | |||
MAE | 2.272 ×10−1 | 2.296×10−1 | 2.265×10−1 | 2.327×10−1 | ||
(9.954 ×10−3) | (1.025 ×10−2) | (1.055 ×10−2) | (1.248 ×10−2) | |||
MXDV | 1.762 ×10−1 | 1.759×10−1 | 1.758×10−1 | 1.759×10−1 | ||
(7.193 ×10−2) | (7.201 ×10−2) | (7.231 ×10−2) | (7.195 ×10−2) | |||
Running time(s) | 44.286 | 453.275 | 5.507 | 6.890 | ||
400 | 0.01 | MSE | 1.859 ×10−1 | 1.862×10−1 | 1.865×10−1 | 1.885×10−1 |
(2.590 ×10−6) | (4.502 ×10−4) | (7.865 ×10−4) | (6.532 ×10−5) | |||
MAE | 1.799 ×10−1 | 1.854×10−1 | 1.870×10−1 | 1.885×10−1 | ||
(3.255 ×10−4) | (4.151 ×10−3) | (6.257 ×10−3) | (4.331 ×10−4) | |||
MXDV | 1.983 | 1.985 | 1.986 | 1.984 | ||
(3.326 ×10−3) | (3.153 ×10−3) | (3.164 ×10−3) | (3.150 ×10−3) | |||
Running time(s) | 62.925 | 759.264 | 6.004 | 15.769 | ||
0.20 | MSE | 1.927 ×10−1 | 1.964×10−1 | 1.931×10−1 | 1.974×10−1 | |
(3.831 ×10−3) | (5.690×10−3) | (4.278×10−3) | (6.808×10−3) | |||
MAE | 2.167 ×10−1 | 2.240×10−1 | 2.183×10−1 | 2.259×10−1 | ||
(9.614×10−3) | (1.204×10−2) | (9.627×10−3) | (1.367×10−3) | |||
MXDV | 1.994 | 1.996 | 1.998 | 1.995 | ||
(6.398×10−2) | (6.306×10−2) | (6.287×10−2) | (6.308×10−2) | |||
Running time(s) | 67.794 | 1561.989 | 4.550 | 16.543 |
The unit of running time is seconds.
The average of each criterion over 100 runs of ADMM, CDA, glmnet, and QP in Example 2, with standard errors in parentheses and a sample size
n | σ | Evaluation | ADMM | CDA | glmnet | QP
---|---|---|---|---|---|---
200 | 0.05 | MSE | 3.095 ×10−4 | 2.752×10−4 | 3.228×10−4 | 4.074×10−4 |
(6.852 ×10−5) | (7.852 ×10−5) | (9.008 ×10−5) | (1.127 ×10−4) | |||
MAE | 1.301 ×10−2 | 1.145×10−2 | 1.290×10−1 | 1.323×10−2 | ||
(1.734 ×10−3) | (2.119 ×10−3) | (2.144 ×10−3) | (2.227 ×10−3) | |||
MXDV | 6.460 ×10−2 | 6.887×10−2 | 6.843×10−2 | 8.219×10−2 | ||
(9.687 ×10−3) | (9.897 ×10−3) | (1.109 ×10−2) | (1.038 ×10−2) | |||
Running time(s) | 18.122 | 6060.121 | 5.838 | 3.109 | ||
0.15 | MSE | 1.273 ×10−3 | 1.421×10−3 | 1.699×10−3 | 1.657×10−3 | |
(5.050 ×10−4) | (5.390 ×10−4) | (6.636 ×10−4) | (7.597 ×10−4) | |||
MAE | 2.752 ×10−2 | 2.878×10−2 | 3.147×10−2 | 3.041×10−2 | ||
(6.168 ×10−3) | (6.303 ×10−3) | (6.773 ×10−3) | (7.557 ×10−3) | |||
MXDV | 9.988 ×10−2 | 1.087×10−1 | 1.211×10−1 | 1.202×10−1 | ||
(2.181 ×10−2) | (2.391 ×10−2) | (3.105 ×10−2) | (2.699 ×10−2) | |||
Running time(s) | 18.704 | 10118.288 | 7.377 | 3.132 | ||
400 | 0.05 | MSE | 1.233 ×10−4 | 1.017×10−4 | 1.884×10−4 | 2.346×10−4 |
(4.070 ×10−5) | (4.643 ×10−5) | (6.421 ×10−5) | (9.956 ×10−5) | |||
MAE | 8.155 ×10−3 | 6.983×10−3 | 9.644×10−3 | 9.976×10−3 | ||
(1.582 ×10−3) | (1.774 ×10−3) | (1.540 ×10−3) | (2.532 ×10−3) | |||
MXDV | 4.456×10−2 | 4.544×10−2 | 5.304×10−2 | 6.531×10−2 | ||
(7.139 ×10−3) | (7.926 ×10−3) | (8.794 ×10−3) | (8.542 ×10−3) | |||
Running time(s) | 38.511 | 11994.427 | 6.743 | 16.322 | ||
0.15 | MSE | 4.548 ×10−4 | 7.079×10−4 | 5.426×10−4 | 7.847×10−4 | |
(2.593 ×10−4) | (3.453 ×10−4) | (4.121 ×10−4) | (5.972 ×10−4) | |||
MAE | 1.563 ×10−2 | 1.984×10−2 | 1.627×10−2 | 2.031×10−2 | ||
(3.966 ×10−3) | (4.903 ×10−3) | (4.887 ×10−3) | (7.240 ×10−3) | |||
MXDV | 7.232×10−2 | 7.989×10−2 | 1.663×10−1 | 7.988×10−2 | ||
(2.003 ×10−2) | (2.058 ×10−2) | (3.422 ×10−2) | (2.245 ×10−2) | |||
Running time(s) | 41.176 | 21368.337 | 9.491 | 17.210 |
The unit of running time is seconds.
The average of each criterion over 100 runs of ADMM, CDA, glmnet, and QP in Example 3, with standard errors in parentheses and a sample size
n | σ | Evaluation | ADMM | CDA | glmnet | QP
---|---|---|---|---|---|---
200 | 0.05 | MSE | 1.468 ×10−4 | 2.017 ×10−4 | 5.490 ×10−4 | 1.940 ×10−2 |
(8.651 ×10−5) | (9.837 ×10−5) | (1.415 ×10−4) | (1.799 ×10−4) | |||
MAE | 9.333 ×10−3 | 1.040 ×10−2 | 1.808 ×10−2 | 1.124 ×10−1 | ||
(2.686 ×10−3) | (2.536 ×10−3) | (2.457 ×10−3) | (1.426 ×10−3) | |||
MXDV | 3.470 ×10−2 | 5.451 ×10−2 | 8.501 ×10−2 | 5.505 ×10−1 | ||
(1.474 ×10−2) | (2.453 ×10−2) | (1.658 ×10−2) | (1.396 ×10−2) | |||
Running time(s) | 6.363 | 17692.403 | 3.575 | 1.904 | ||
0.20 | MSE | 2.954 ×10−3 | 2.042 ×10−3 | 2.943 ×10−3 | 2.302 ×10−2 | |
(5.149 ×10−3) | (1.178 ×10−3) | (1.274 ×10−3) | (2.676 ×10−3) | |||
MAE | 3.760 ×10−2 | 3.371 ×10−2 | 4.254 ×10−2 | 1.278 ×10−1 | ||
(2.362 ×10−2) | (9.784 ×10−3) | (9.692 ×10−3) | (1.110 ×10−2) | |||
MXDV | 1.478 ×10−1 | 1.481 ×10−1 | 1.711 ×10−1 | 4.770 ×10−1 | ||
(8.882 ×10−2) | (7.437 ×10−2) | (6.591 ×10−2) | (7.405 ×10−2) | |||
Running time(s) | 6.540 | 19046.376 | 3.600 | 1.797 | ||
400 | 0.05 | MSE | 6.297 ×10−5 | 1.053 ×10−4 | 5.704 ×10− | 1.157 ×10−2 |
(2.580 ×10−5) | (5.354 ×10−5) | (1.400 ×10−4) | (8.007 ×10−4) | |||
MAE | 6.125 ×10−3 | 7.558 ×10−3 | 1.801 ×10−2 | 8.339 ×10−2 | ||
(1.359 ×10−3) | (1.818 ×10−3) | (2.415 ×10−3) | (3.288 ×10−3) | |||
MXDV | 2.519 ×10−2 | 4.318 ×10−2 | 9.274 ×10−2 | 4.827 ×10−1 | ||
(9.013 ×10−3) | (1.837 ×10−2) | (1.468 ×10−2) | (1.733 ×10−2) | |||
Running time(s) | 14.471 | 45454.877 | 4.388 | 10.318 | ||
0.20 | MSE | 1.209 ×10−3 | 1.151 ×10−3 | 2.058 ×10−3 | 2.030 ×10−2 | |
(2.583 ×10−3) | (6.202 ×10−4) | (7.298 ×10−4) | (1.225 ×10−3) | |||
MAE | 2.480 ×10−2 | 2.539 ×10−2 | 3.579 ×10−2 | 1.181 ×10−2 | ||
(1.308 ×10−2) | (7.034 ×10−3) | (7.061 ×10−3) | (6.343 ×10−3) | |||
MXDV | 1.049 ×10−1 | 1.246 ×10−1 | 1.657 ×10−1 | 5.281 ×10−1 | ||
(6.113 ×10−2) | (5.668 ×10−2) | (6.375 ×10−2) | (5.102 ×10−2) | |||
Running time(s) | 14.287 | 47062.536 | 4.441 | 9.643 |
The unit of running time is seconds.
The average of each criterion over 100 runs of ADMM, CDA, glmnet, and QP in Example 4, with standard errors in parentheses and a sample size
n | σ | Evaluation | ADMM | CDA | glmnet | QP
---|---|---|---|---|---|---
200 | 0.05 | MSE | 1.408 ×10−1 | 4.572 ×10−1 | 4.694 ×10−1 | 7.916 ×10−1 |
(1.605 ×10−4) | (1.452 ×10−3) | (9.089 ×10−3) | (1.308 ×10−3) | |||
MAE | 1.863 ×10−1 | 4.431 ×10−1 | 4.606 ×10−1 | 7.200 ×10−1 | ||
(1.559 ×10−3) | (8.955 ×10−4) | (6.040 ×10−3) | (1.402 ×10−3) | |||
MXDV | 1.588 | 1.997 | 1.957 | 1.958 | ||
(1.481 ×10−2) | (9.172 ×10−3) | (2.232 ×10−2) | (5.708 ×10−3) | |||
Running time(s) | 17.109 | 17562.529 | 49.381 | 2.590 | ||
0.40 | MSE | 1.582 ×10−1 | 4.919 ×10−1 | 4.905 ×10−1 | 8.071 ×10−1 | |
(5.170 ×10−3) | (3.887 ×10−2) | (1.671 ×10−2) | (2.075 ×10−2) | |||
MAE | 2.480 ×10−1 | 4.900 ×10−1 | 4.9046 ×10−1 | 7.313 ×10−1 | ||
(1.200 ×10−2) | (4.009 ×10−2) | (1.632 ×10−2) | (1.658 ×10−2) | |||
MXDV | 1.594 | 1.979 | 1.960 | 1.941 | ||
(9.123 ×10−2) | (7.078 ×10−2) | (7.410 ×10−2) | (4.811 ×10−2) | |||
Running time(s) | 17.135 | 17706.054 | 40.187 | 2.564 | ||
400 | 0.05 | MSE | 8.010 ×10−2 | 4.584 ×10−1 | 4.795 ×10−1 | 7.870 ×10−1 |
(6.071 ×10−5) | (3.130 ×10−3) | (2.340 ×10−2) | (7.067 ×10−3) | |||
MAE | 1.184 ×10−1 | 4.435 ×10−1 | 4.765 ×10−1 | 7.209 ×10−1 | ||
(1.367 ×10−3) | (2.097 ×10−3) | (1.610 ×10−2) | (6.422 ×10−3) | |||
MXDV | 1.456 | 2.006 | 1.931 | 1.961 | ||
(1.650 ×10−2) | (1.194 ×10−2) | (4.342 ×10−2) | (1.052 ×10−2) | |||
Running time(s) | 34.909 | 42723.804 | 65.845 | 11.769 | ||
0.40 | MSE | 9.635 ×10−2 | 5.616 ×10−1 | 5.141 ×10−1 | 7.875 ×10−1 | |
(3.849 ×10−3) | (5.499 ×10−2) | (2.463 ×10−2) | (6.964 ×10−3) | |||
MAE | 1.879 ×10−1 | 5.531 ×10−1 | 5.127 ×10−1 | 7.211 ×10−1 | ||
(1.123 ×10−2) | (5.936 ×10−2) | (2.102 ×10−2) | (8.248 ×10−3) | |||
MXDV | 1.533 | 1.966 | 1.940 | 1.964 | ||
(9.103 ×10−2) | (9.155 ×10−2) | (6.290 ×10−2) | (3.708 ×10−2) | |||
Running time(s) | 34.829 | 42749.774 | 64.719 | 11.504 |
The unit of running time is seconds.