A common problem in nonparametric regression is to estimate an unknown regression function from a sample. Various methods have been suggested in this area, including smoothing splines, regression splines, local polynomial estimators, and projection estimators. One may refer to, for example,
Traditional non-adaptive estimators often suffer from a lack of flexibility when the underlying regression function exhibits a complicated nonlinear local trend. A variety of spatially adaptive estimators have been proposed to resolve this issue.
Over the last decade, artificial neural networks have flourished in machine learning. The flexibility provided by layered architectures has proven successful in estimating highly nonlinear structure from data; see
To obtain the adaptive property, the choice of activation function in the neural network structure may be a useful tool. Popular activation functions include the sigmoidal function and the rectified linear unit (
In this paper, we develop an adaptive regression estimator based on a single-layer neural network structure. We adopt a symmetric activation function as the unit of our model; it has a localizing property and a small support, which improves the estimation of its coefficients. An ℓ_{1}-penalty is imposed to induce sparsity in the nodes of the network, which results in spatial adaptation of the estimator. An initialization suited to this type of activation function is necessary to enhance the learning ability of the structure; refer to
The rest of this paper is organized as follows. In Section 2, we define the estimator using a symmetric activation function and ℓ_{1}-penalization and explore the characteristics of the activation function. The implementation of our estimator is described in Section 3, followed by a numerical study via simulations and a motorcycle data analysis in Section 4. The conclusion of the paper is summarized in Section 5.
Consider the given data
where
where
which is also mentioned by
We use the empirical risk in terms of
For our estimator to attain the adaptive property, we penalize the coefficients of the activation functions via the ℓ_{1}-norm. This regularizes the smoothness of the estimator by controlling the number of activation functions. The penalized objective function to be minimized is
where
Let
The activation-based penalized regression estimator (APR) is given as
The activation function used as components of our model is parametrized as
We also note that the activation function of the form
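Since the displayed parametrization is not reproduced above, the following sketch illustrates one plausible choice consistent with the description: a symmetric, compactly supported B-spline-type unit (a linear "hat" function is assumed here) together with the ℓ_{1}-penalized empirical risk. The function names and the hat-function form are illustrative assumptions, not the paper's exact formulas.

```python
import numpy as np

def hat_activation(x, center, scale):
    """Symmetric B-spline-type (triangular "hat") activation.

    Assumed form: peaks at `center`, supported on
    [center - scale, center + scale], zero outside.
    """
    u = (x - center) / scale
    return np.maximum(0.0, 1.0 - np.abs(u))

def penalized_risk(beta, centers, scales, x, y, lam):
    """Empirical squared-error risk plus an l1 penalty on the
    activation coefficients (illustrative objective)."""
    basis = np.stack([hat_activation(x, c, s)
                      for c, s in zip(centers, scales)], axis=1)
    fit = basis @ beta
    return np.mean((y - fit) ** 2) + lam * np.sum(np.abs(beta))
```

Because each unit vanishes outside a small interval, only the observations inside its support contribute to its coefficient, which is the localizing property described above.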
In implementation, we use a coordinate descent algorithm, which is comparatively easy to handle in the sense that each step reduces to a univariate problem. A first-order Taylor approximation is applied to convert the nonlinear problem into a linear one. In a neural-network-based structure, initial values have a critical effect on the stability and performance of the estimator. We suggest a rule for generating initial values suited to the B-spline type activation function. For model selection, a strategy for generating an increasing sequence of the complexity parameter
We apply a coordinate descent algorithm to obtain our estimator, which efficiently optimizes a real-valued continuous function one coordinate at a time. Define univariate functions
where
The minimization problem of (
Note that the B-spline type activation function has some non-differentiable points. For numerical stability, we take the derivative at these points to be 1 or −1. Introducing the concept of pseudoresponses considered by
where
Then, we observe that minimizing (
We note that the solution of minimizing (
The algorithm iterates in the order of
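As a concrete illustration of the univariate updates, the sketch below runs cyclic coordinate descent on the subproblem that is linear in the coefficients (centers and scales held fixed), where each coordinate update is a closed-form soft-thresholding step. This is an assumed simplification: the full algorithm above also linearizes the nonlinear parameters via the first-order Taylor approximation and pseudoresponses, which is not reproduced here.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator, the closed-form solution of the
    univariate l1-penalized least-squares problem."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def coordinate_descent(B, y, lam, n_iter=100):
    """Cyclic coordinate descent for
    (1/2n) * ||y - B @ beta||^2 + lam * ||beta||_1,
    with B the matrix of activation evaluations (sketch)."""
    n, p = B.shape
    beta = np.zeros(p)
    resid = y.astype(float).copy()
    col_ss = (B ** 2).sum(axis=0) / n  # per-column curvature
    for _ in range(n_iter):
        for j in range(p):
            # partial residual excluding coordinate j
            resid += B[:, j] * beta[j]
            rho = B[:, j] @ resid / n
            beta[j] = soft_threshold(rho, lam) / col_ss[j] if col_ss[j] > 0 else 0.0
            resid -= B[:, j] * beta[j]
    return beta
```

Because the activations have small supports, the columns of B are sparse, so each update touches only a few observations, which keeps the iterations cheap.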
The initialization procedure is a key ingredient for obtaining desirable results in a neural network structure.
Set
Calculate
Calculate
As in spline methodologies, where the interior knots are located on the interval of the design points, the centers of the activation functions are placed on that interval. As a result, no activation function is inactive. Similarly,
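A minimal sketch of an initialization in this spirit, assuming equally spaced centers over the range of the design points with scales matched to the spacing (the exact rule in the text may differ):

```python
import numpy as np

def initialize_centers_scales(x, n_units):
    """Place activation centers evenly over the range of the design
    points and set each scale to the spacing, so every unit's support
    overlaps the data (illustrative initialization rule)."""
    lo, hi = float(np.min(x)), float(np.max(x))
    centers = np.linspace(lo, hi, n_units)
    spacing = (hi - lo) / max(n_units - 1, 1)
    scales = np.full(n_units, spacing)
    return centers, scales
```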
We consider an increasing sequence
For selecting the optimal tuning parameter
where
In the algorithm, we execute a pruning step through which the
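For intuition, one common construction (used, for example, by lasso-type solvers) builds the grid up to the smallest value of the complexity parameter at which all coefficients vanish. The sketch below pairs such an illustrative grid with a pruning step that removes units whose coefficients are zero; both are assumptions about the details, not the exact procedure above.

```python
import numpy as np

def lambda_sequence(B, y, n_lambda=50, eps=1e-3):
    """Increasing grid of complexity parameters on a log scale.
    lam_max = max_j |B_j^T y| / n is the smallest value at which all
    coefficients are zero in the l1-penalized least-squares problem
    (illustrative construction)."""
    n = B.shape[0]
    lam_max = np.max(np.abs(B.T @ y)) / n
    return np.geomspace(eps * lam_max, lam_max, n_lambda)

def prune(beta, centers, scales, tol=0.0):
    """Pruning step: drop activation units whose coefficients are
    (numerically) zero after the l1 fit (sketch)."""
    keep = np.abs(beta) > tol
    return beta[keep], centers[keep], scales[keep]
```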
We consider three example functions to assess the performance of the proposed estimator via simulations with 100 repetitions. Response variables are generated through the model (
Example functions and dataset are displayed in
We compare the performance of the proposed estimator with those of penalized regression B-spline estimator (PBSE) by
To measure the discrepancy between a function
and
where the measures are calculated at 1,000 equidistant points
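The three discrepancy measures reported in the tables (MSE, MAE, and MXDV, read here as the maximum deviation) can be computed on an equidistant grid as follows; the interval endpoints are illustrative.

```python
import numpy as np

def discrepancy_measures(f_hat, f_true, a=0.0, b=1.0, n_grid=1000):
    """MSE, MAE, and maximum deviation (MXDV) between an estimate and
    the true function, evaluated at n_grid equidistant points."""
    t = np.linspace(a, b, n_grid)
    diff = f_hat(t) - f_true(t)
    mse = np.mean(diff ** 2)
    mae = np.mean(np.abs(diff))
    mxdv = np.max(np.abs(diff))
    return mse, mae, mxdv
```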
We apply the proposed method to motorcycle dataset (
We use the same set-up as in the simulations for all estimators. We just round off
In this paper, we have developed the penalized regression estimator based on the single-layer neural network structure combined with a symmetric activation function.
The proposed estimator attains the adaptive property through a symmetric activation function and ℓ_{1}-penalization, which leads to a data-driven selection of activation functions. The proposed initialization strategy results in stable estimation. An approximate upper bound for the complexity parameter is derived, at which the proposed estimator reduces to a linear function or the zero function. We have also devised an efficient coordinate descent algorithm for the proposed estimator. The results of simulations and a real data analysis demonstrate the satisfactory performance of the proposed estimator.
In future research, a variety of extensions of the model can be expected owing to the flexibility of the structure. We can consider introducing various activation functions into our framework, for example, one of higher degree. An extension to multivariate regression is also a promising possibility, especially with a tensor product structure. A model with a tensor product structure is currently under development, taking an additional penalization for variable selection into consideration.
The simulation results of the first example function
| Method | MSE | MAE | MXDV |
|---|---|---|---|
| APR | 0.0088(0.0041) | 0.0673(0.0175) | 0.2990(0.1196) |
| PBSE | 0.0113(0.0041) | 0.0784(0.0141) | 0.3143(0.0806) |
| freelsgen | 0.0106(0.0040) | 0.0746(0.0140) | 0.3677(0.1668) |
| freepsgen | 0.0102(0.0037) | 0.0734(0.0140) | 0.3526(0.1526) |
| APR | 0.0047(0.0023) | 0.0498(0.0123) | 0.2193(0.0825) |
| PBSE | 0.0068(0.0027) | 0.0605(0.0121) | 0.2471(0.0646) |
| freelsgen | 0.0063(0.0022) | 0.0568(0.0108) | 0.3123(0.1233) |
| freepsgen | 0.0060(0.0020) | 0.0560(0.0110) | 0.2824(0.1082) |
| APR | 0.0026(0.0009) | 0.0372(0.0068) | 0.1678(0.0654) |
| PBSE | 0.0040(0.0018) | 0.0456(0.0099) | 0.2021(0.0431) |
| freelsgen | 0.0038(0.0016) | 0.0431(0.0083) | 0.2780(0.1399) |
| freepsgen | 0.0033(0.0014) | 0.0416(0.0080) | 0.2374(0.1065) |
The standard error multiplied by 10 is reported in parentheses.
The simulation results of the second example function
| Method | MSE | MAE | MXDV |
|---|---|---|---|
| APR | 0.0058(0.0037) | 0.0505(0.0167) | 0.2704(0.1108) |
| PBSE | 0.0103(0.0045) | 0.0695(0.0133) | 0.3301(0.0926) |
| freelsgen | 0.0074(0.0040) | 0.0579(0.0154) | 0.3321(0.1274) |
| freepsgen | 0.0072(0.0039) | 0.0575(0.0151) | 0.3233(0.1215) |
| APR | 0.0033(0.0016) | 0.0395(0.0110) | 0.2067(0.0571) |
| PBSE | 0.0058(0.0030) | 0.0519(0.0119) | 0.2441(0.0759) |
| freelsgen | 0.0043(0.0020) | 0.0448(0.0110) | 0.2881(0.1714) |
| freepsgen | 0.0041(0.0018) | 0.0441(0.0106) | 0.2562(0.0782) |
| APR | 0.0018(0.0009) | 0.0298(0.0086) | 0.1591(0.0591) |
| PBSE | 0.0029(0.0013) | 0.0376(0.0080) | 0.1858(0.0486) |
| freelsgen | 0.0027(0.0011) | 0.0352(0.0077) | 0.2573(0.1605) |
| freepsgen | 0.0026(0.0010) | 0.0349(0.0075) | 0.2314(0.1101) |
The standard error multiplied by 10 is reported in parentheses.
The simulation results of the third example function
| Method | MSE | MAE | MXDV |
|---|---|---|---|
| APR | 0.0032(0.0011) | 0.0426(0.0073) | 0.1830(0.0502) |
| PBSE | 0.0034(0.0015) | 0.0431(0.0081) | 0.1860(0.0492) |
| freelsgen | 0.0039(0.0014) | 0.0475(0.0065) | 0.2286(0.0840) |
| freepsgen | 0.0037(0.0013) | 0.0468(0.0065) | 0.2113(0.0579) |
| APR | 0.0017(0.0006) | 0.0317(0.0050) | 0.1369(0.0385) |
| PBSE | 0.0018(0.0006) | 0.0321(0.0053) | 0.1456(0.0331) |
| freelsgen | 0.0024(0.0006) | 0.0378(0.0043) | 0.1862(0.0803) |
| freepsgen | 0.0022(0.0005) | 0.0371(0.0042) | 0.1750(0.0613) |
| APR | 0.0011(0.0003) | 0.0256(0.0037) | 0.1117(0.0259) |
| PBSE | 0.0012(0.0004) | 0.0257(0.0038) | 0.1354(0.0280) |
| freelsgen | 0.0015(0.0003) | 0.0294(0.0031) | 0.1592(0.0383) |
| freepsgen | 0.0014(0.0003) | 0.0289(0.0033) | 0.1491(0.0302) |
The standard error multiplied by 10 is reported in parentheses.