Nonparametric function estimation is a statistical method that estimates a function from observed data under the assumption that the function belongs to an infinite-dimensional parameter space. Function estimation methods suited to various types of data continue to be studied. Renowned methods include kernel density estimation, local polynomial regression, and splines; see Fan and Gijbels (1996), Efromovich (2008), Green and Silverman (1993), and Tsybakov (2008).
A representative method for function estimation is the basis function method. Since infinitely many parameters cannot be estimated from finite data, a function space and basis functions are introduced, and the target function is represented by a linear combination of the basis functions that span the appropriate function space. Once the function space and the basis functions are defined, the target function can be expressed in terms of its predictor variables. Thus, the basis function method allows us to treat the problem of estimating the target function as one of estimating regression coefficients. It therefore has the advantage that standard statistical methodologies can be applied, including least squares, least absolute deviations, and likelihood problems.
Among the many basis functions used for interpolating data or fitting smooth curves is the spline basis. A spline basis function is defined as a differentiable piecewise polynomial on each given knot interval. The main constructions are the B-splines and the truncated power basis splines (De Boor, 1978). The truncated power basis has the advantages of a simple construction and easily interpreted parameters in the model. However, its coefficients are correlated because the basis functions have many overlapping intervals, and when the predictor is large, the basis curves become almost vertical and parallel, so incorrect fitting occurs. In contrast, the B-spline basis is more complex than the truncated power basis, and it is used mainly for computational reasons: the B-spline has minimal (compact) support, which minimizes the amount of overlap between the spline bases and therefore enables stable calculations (Yee, 2015).
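To make the compact-support property concrete, the B-spline basis can be evaluated with the Cox-de Boor recursion. The following is a minimal pure-Python sketch (the function name and knot vector are ours, for illustration only): each basis function of order k is nonzero on only k adjacent knot intervals, and the basis functions sum to one on the interior of the knot range.

```python
def bspline_basis(j, k, t, x):
    """Value at x of the j-th B-spline basis function of order k
    (degree k - 1) on the knot vector t, via the Cox-de Boor recursion."""
    if k == 1:
        return 1.0 if t[j] <= x < t[j + 1] else 0.0
    value = 0.0
    if t[j + k - 1] > t[j]:  # skip zero-width intervals at repeated knots
        value += (x - t[j]) / (t[j + k - 1] - t[j]) * bspline_basis(j, k - 1, t, x)
    if t[j + k] > t[j + 1]:
        value += (t[j + k] - x) / (t[j + k] - t[j + 1]) * bspline_basis(j + 1, k - 1, t, x)
    return value

# Clamped quadratic (order 3) knot vector with one interior knot at 0.5;
# it carries len(t) - k = 4 basis functions on [0, 1].
t = [0.0, 0.0, 0.0, 0.5, 1.0, 1.0, 1.0]
vals = [bspline_basis(j, 3, t, 0.25) for j in range(4)]  # partition of unity: sums to 1
```

Compact support shows up directly: the first basis function is supported on [t[0], t[3]] = [0, 0.5], so it vanishes at x = 0.75, which is what keeps the design columns nearly orthogonal and the computations stable.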
In the basis function methodology, the objective function to be optimized is primarily a convex or concave function whose argument is the coefficient vector of the bases constituting the estimated function. The coordinate descent algorithm (Wright, 2015) is simple, efficient, and useful for optimizing these objective functions. The idea is to solve the convex (concave) minimization (maximization) problem for a multidimensional vector one coordinate at a time: each coefficient update holds the remaining coefficients constant and treats the objective as a one-dimensional function of the coefficient being updated.
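A minimal sketch of this idea on a convex quadratic of two variables (the function and its closed-form one-dimensional minimizers are our own illustrative choices, not the paper's objective):

```python
def coordinate_descent(n_iter=100):
    """Minimize f(a, b) = (a - 1)^2 + (b + 2)^2 + 0.5*a*b by exact
    one-dimensional updates: each step fixes one coordinate and sets the
    other to the minimizer of the resulting univariate quadratic."""
    a = b = 0.0
    for _ in range(n_iter):
        a = 1.0 - 0.25 * b    # argmin over a with b held fixed (df/da = 0)
        b = -2.0 - 0.25 * a   # argmin over b with a held fixed (df/db = 0)
    return a, b
```

Because the Hessian is positive definite, the alternating exact updates contract toward the unique minimizer (a, b) = (1.6, -2.4).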
Knot selection in the regression spline (RS) is a significant challenge, since the model's performance depends heavily on the location and the number of the knots. Knot selection is equivalent to variable selection in a multiple regression, and much research has been conducted on it. Osborn
Qualitative or categorical factors, such as gender and race, sometimes serve as predictors that are very useful in explaining a regression's response variables. These qualitative factors enter as predictors in the form of indicator or dummy variables. Dummy variables have a variety of uses and can be employed whenever qualitative factors affect a regression relationship (Chatterjee and Hadi, 2015). Tibshirani and Friedman (2020) proposed an extended lasso that adds modifying variables, such as gender, age, and time, to the response and predictor variables, allowing some or all of the coefficients to vary with each category.
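For concreteness, one common way to let coefficients vary by category is to augment the design with dummy columns and dummy-by-predictor interaction columns. The sketch below is a hypothetical illustration of that encoding (function name and arguments ours), not the extended-lasso construction itself:

```python
def dummy_design(x, g, levels):
    """Rows [1, x, d_l, d_l * x] for each non-baseline level l, where
    d_l = 1{g = l}.  The interaction columns let both the intercept and
    the slope shift with the category g, as in a varying-coefficient fit."""
    rows = []
    for xi, gi in zip(x, g):
        row = [1.0, xi]
        for level in levels[1:]:          # levels[0] acts as the baseline
            d = 1.0 if gi == level else 0.0
            row.extend([d, d * xi])
        rows.append(row)
    return rows
```

An observation in the baseline category gets zeros in the dummy and interaction columns, so its fit uses only the common intercept and slope; other categories add their own offsets.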
In this paper, we present a new statistical learning method for modeling and analyzing data that include auxiliary variables besides the response and predictor variables. The proposed model requires only one fit, regardless of the number of categories, while allowing the coefficients to take different values according to the categories of the auxiliary variables. We express the estimator as a linear combination of the B-spline terms for the main predictor and the auxiliary terms. The coefficients are estimated by applying a coordinate descent algorithm to minimize the residual sum of squares. For the knot selection, an improved stepwise selection is introduced to reduce the computational cost. We consider five types of simulated data to measure the performance of the proposed method and conduct a real-data analysis involving auxiliary variables.
This paper is organized as follows. In Section 2, we define the B-spline regression estimators including the auxiliary variables. In Section 3, we explain the coefficient updates based on the coordinate descent algorithm and the specific implementation of the knot selection. We validate the performance of the proposed model using simulations and real data in Section 4. Section 5 summarizes the paper's conclusions.
Consider a nonparametric regression model
where
In this model, we have the measurements of one or more auxiliary binary variables
Let
where
For
where
We consider the following residual sum of squares objective function
and define
Thereafter, the proposed estimator, which we label
Since the objective function (
where
be the vectors with their initial values substituted for the
Thus, the algorithm iteratively updates the coefficients coordinate-wise by the minimizer of a univariate objective function (
The solution to the optimization problem is to obtain a quadratic solution to the
where
is a partial residual. Thus, we update
Similarly,
where
Ignoring the terms that are independent of
In (
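The coordinate-wise least-squares update with partial residuals described in this section can be sketched for a generic design matrix as follows (the function name and the toy design are ours): the one-dimensional minimizer for coefficient j divides the inner product of column j with the partial residual by the column's squared norm.

```python
def coordinate_descent_ls(X, y, n_sweeps=200):
    """Coordinate descent for least squares.  For each coefficient j, form
    the partial residual (the response minus the fit of all other columns)
    and set beta_j to its exact one-dimensional least-squares solution."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            # partial residual excluding column j
            r = [y[i] - sum(X[i][l] * beta[l] for l in range(p) if l != j)
                 for i in range(n)]
            num = sum(X[i][j] * r[i] for i in range(n))
            den = sum(X[i][j] ** 2 for i in range(n))
            beta[j] = num / den
    return beta
```

Since the least-squares objective is convex and quadratic, these sweeps converge to the usual normal-equations solution.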
For the knot selection, we apply stepwise selection (Efroymson, 1960), a variable selection method used in multiple regression analysis. In multiple regression, when the number of variables is large, fitting the model can lead to overfitting, which inflates the variance of the coefficient estimates. To solve this problem, stepwise selection is applied to choose an optimal subset of the predictors. In an RS, knot selection poses the same challenge: if the number of knots is insufficient, the regression model underfits, and if there are too many knots, it overfits. Knot selection is therefore important, as the number of knots significantly affects the fitted model.
Stepwise selection begins with a model without knots, and then iteratively adds and deletes knots individually until the best model is found. The Bayesian information criterion (BIC) (Schwarz, 1978)
where
The stepwise selection procedure is provided in Algorithm 1. First, we begin with a null model
Algorithm 2 presents the implementation of the proposed method. Given the observed data and some input parameters, we first initialize the coefficients to zero. Thereafter, we compute the initial residual sum of squares, that is
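The add step of such a BIC-guided knot search can be sketched as follows, under two simplifying assumptions of ours: a degree-0 (piecewise-constant) spline, so each knot segment is fit by its mean, and the common BIC form n log(RSS/n) + df log n. The function names are hypothetical.

```python
import math

def fit_rss(x, y, knots):
    """RSS of a degree-0 spline fit: each knot segment is fit by its mean."""
    bounds = [min(x) - 1e-9] + sorted(knots) + [max(x) + 1e-9]
    rss = 0.0
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        seg = [yi for xi, yi in zip(x, y) if lo <= xi < hi]
        if seg:
            m = sum(seg) / len(seg)
            rss += sum((yi - m) ** 2 for yi in seg)
    return rss

def bic(x, y, knots):
    """n log(RSS/n) + df log n, with one mean parameter per segment.
    Assumes noisy data so that RSS > 0."""
    n, df = len(x), len(knots) + 1
    return n * math.log(fit_rss(x, y, knots) / n) + df * math.log(n)

def add_knots(x, y, candidates):
    """Forward (add) step of stepwise knot selection: repeatedly add the
    candidate knot that lowers BIC the most; stop when none improves it."""
    knots, best = [], bic(x, y, [])
    while True:
        trials = [(bic(x, y, sorted(knots + [c])), c)
                  for c in candidates if c not in knots]
        if not trials:
            break
        score, c = min(trials)
        if score >= best:
            break
        best, knots = score, sorted(knots + [c])
    return knots
```

On piecewise-constant data with jumps at 0.4 and 0.7, as in the setting of Example 1, this add step recovers exactly those knots from a grid of candidates, and the log n penalty stops it from adding spurious ones.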
In this section, we illustrate the performance of the proposed method on simulated examples. We generate predictors as sample size
The four examples are as follows. Examples 1–2 consider a piecewise constant function and a piecewise linear function, so that the exact locations of the true knots are known: the true knots are 0.4 and 0.7 in Example 1 and 0.2, 0.45, and 0.75 in Example 2. Example 3 is designed as a nonlinear function of the kind that may appear in real data. Example 4 evaluates the performance of the proposed model when the number of auxiliary-variable categories increases to four. Figure 1 displays the underlying functions of Examples 1–4.
We compare the proposed method with the regression spline (RS) of Marsh and Cormier (2001), the smoothing spline (SS) of Wahba (1990), the local regression (LR) of Loader (2006), the local polynomial (LP) of Fan and Gijbels (1996), and the categorical regression splines (CRS) of Nie and Racine (2012). The RS is obtained by linear fitting with the B-spline basis functions without an auxiliary variable. The SS applies smoothing and shrinkage techniques. The LR and LP are generalizations of the moving average and of polynomial regression, respectively. The CRS computes nonparametric regression splines in the presence of both continuous and categorical predictors. RS, SS and LR are provided in the
We consider the mean squared error (MSE) and mean absolute error (MAE) criteria as the discrepancy measures between the underlying function
The PSE was simulated by fixing the number of the initial knots at 50. In addition, the suitable order
Tables 1
As described in Tables 1–2, the PSE shows better performance than the other methods. Since the PSE can be fitted with degree 0 or 1, estimators similar to the underlying function are obtained. The RS, which uses the same basis functions as the PSE, can also be fitted with degrees 0 and 1; however, its performance is poor because the auxiliary variables are not included in the model. The SS, LR, LP, and CRS fit smooth curves and are therefore not good models for Examples 1–2. Since the PSE places the initial knots at the quantiles of the predictor variable, it chooses knots adjacent to the true knots. Accordingly, the PSE demonstrates good performance in Examples 1–2.
In Tables 3–4, the PSE does not lag behind the other nonlinear function estimation methods: it performs much better than the LR and LP and similarly to the SS and CRS. Unlike the SS, which places knots at all inputs and uses shrinkage, the PSE can estimate nonlinear functions with fewer knots through knot selection. Furthermore, the SS, LR, and LP require as many models as the number of levels of the auxiliary variables, whereas the PSE is efficient since it fits one model regardless of the number of categories.
Unlike the examples above, Example 5 has two auxiliary variables. These are set as variables that determine the shape and the amplitude of the underlying function. The shape variable has two levels, cosine and
Table 5 reports the experimental results of Example 5 under the same scenario as Table 4. When the sample size
The BMD data were obtained in Bachrach
In the proposed method, age is used as the predictor and rspnbmd as the response. Two variables are used as the auxiliary variables: in one case a sex variable with two categories (male and female), and in the other an ethnicity variable with four categories (Asian, Black, Hispanic, White). Finally, we consider a model that contains both variables, giving eight categories. We consider cubic splines in all cases with order
Figure 3 displays the scatterplots when the auxiliary variable is “sex” and “ethnic”, respectively. In the left panel, the data are divided into males and females; it indicates that females aged 10 to 15 years gain bone density relatively faster than males of the same age. The right panel is divided by race, and the groups are difficult to distinguish.
Figure 4 shows the shape of the estimator when the auxiliary variables are used. When the auxiliary variable is “sex” (left), in the adolescent phase (between the ages of 10 and 13 years) females tend to increase their BMD relatively earlier than their male counterparts. When the auxiliary variable is “race” (right), Asians have a relatively higher BMD in the adolescent phase than the other races, while White and Hispanic adolescents show similar trends. Figure 5 shows the estimator for the combined sex and ethnicity variable. The Male & Black curve is flat, with no steep sections compared with the other categories. Additionally, the Female & Asian curve shows that Asian women have a higher BMD at a young age than women of other races.
In this article, we have developed a nonparametric regression function estimation method using B-splines when there are auxiliary variables in addition to the predictor variable. We devised a coordinate-wise update scheme to efficiently optimize the objective function, and confirmed that the optimal knots of the spline function are adaptively selected by the proposed stepwise algorithm. The performance of the proposed estimator has been demonstrated with simulated and real data analyses.
The results of this paper are expected to provide a foundation for further studies, and they can be generalized and extended in several ways.
First, one may consider a nonparametric quantile regression function estimator. Switching from the residual-sum-of-squares objective function to the absolute deviation loss allows the coordinate-descent-based algorithm to obtain a specified
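Under the absolute deviation loss, the one-dimensional coordinate update has a known closed form: a weighted median. The sketch below illustrates that single-coordinate update for the least-absolute-deviations case (function name ours; the quantile version would use asymmetric weights):

```python
def lad_update(xj, r):
    """Exact minimizer over b of sum_i |r_i - xj_i * b|: the weighted
    median of the points r_i / xj_i with weights |xj_i|."""
    pts = sorted((ri / xi, abs(xi)) for xi, ri in zip(xj, r) if xi != 0.0)
    half = sum(w for _, w in pts) / 2.0
    acc = 0.0
    for v, w in pts:
        acc += w
        if acc >= half:   # first point where cumulative weight passes half
            return v
```

For example, with unit weights the update reduces to the ordinary median of the ratios, which is why the resulting estimator is robust to outlying responses.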
Second, the regularization method is expected to be applicable by adding an appropriate penalty term for the knot selection. A suitable penalty term is the total variation of the (
The average of each criterion (×100) over 100 runs of OUR, RS, SS, LR, LP and CRS for Example 1 with a sample size
n    SNR  Method  MSE           MAE            n    SNR  Method  MSE           MAE
200    5  PSE                                  500    5  PSE
          RS       9.14(6e-04)  25.48(6e-04)              RS       8.29(4e-04)  25.19(3e-04)
          SS       4.09(8e-04)  14.10(2e-03)              SS       2.44(3e-04)  10.60(9e-04)
          LR      14.16(1e-03)  26.32(2e-03)              LR      13.91(8e-04)  25.77(9e-04)
          LP      10.78(5e-03)  17.61(3e-03)              LP       5.99(3e-03)  12.17(2e-03)
          CRS      7.11(2e-03)  18.47(3e-03)              CRS      4.55(7e-04)  13.66(1e-03)
      15  PSE                                        15  PSE      1.73(5e-04)
          RS       8.46(5e-04)  25.18(6e-04)              RS       8.10(3e-04)  25.20(3e-04)
          SS       2.12(5e-04)   9.63(1e-03)              SS                     7.57(5e-04)
          LR      13.63(1e-03)  25.45(1e-03)              LR      13.84(8e-04)  25.59(7e-04)
          LP      11.13(6e-03)  14.77(3e-03)              LP       7.37(5e-03)  10.56(3e-03)
          CRS      5.24(2e-03)  14.59(3e-03)              CRS      4.10(5e-04)  12.00(9e-04)
The average of each criterion (×100) over 100 runs of OUR, RS, SS, LR, LP and CRS for Example 2 with a sample size
n    SNR  Method  MSE           MAE            n    SNR  Method  MSE           MAE
200    5  PSE                                  500    5  PSE
          RS       1.03(3e-05)   8.77(2e-04)              RS       1.03(1e-05)   8.87(8e-05)
          SS                     1.72(3e-04)              SS                     1.14(2e-04)
          LR       0.10(2e-05)   2.38(2e-04)              LR       0.09(9e-06)   2.22(1e-04)
          LP       0.13(8e-05)   2.58(6e-04)              LP       0.05(3e-05)   1.67(3e-04)
          CRS      0.06(2e-05)   1.90(3e-04)              CRS      0.03(7e-06)   1.28(2e-04)
      15  PSE                                        15  PSE
          RS       1.00(3e-05)   8.70(2e-04)              RS       1.02(1e-05)   8.86(7e-05)
          SS                     1.07(2e-04)              SS                     0.75(1e-04)
          LR       0.08(1e-05)   2.15(2e-04)              LR       0.08(6e-06)   2.13(1e-04)
          LP       0.09(6e-05)   2.09(6e-04)              LP       0.04(2e-05)   1.39(3e-04)
          CRS                    1.23(2e-04)              CRS                    0.87(1e-04)
The average of each criterion (×100) over 100 runs of OUR, RS, SS, LR, LP and CRS for Example 3 with a sample size
n    SNR  Method  MSE           MAE            n    SNR  Method  MSE           MAE
200    5  PSE                                  500    5  PSE      1.13(3e-04)   8.23(1e-03)
          RS      42.96(1e-03)  57.18(1e-03)              RS      43.38(7e-04)  58.38(7e-04)
          SS       2.38(8e-04)  12.08(2e-03)              SS       1.07(3e-04)   8.06(1e-03)
          LR      21.74(1e-03)  39.07(1e-03)              LR      21.70(8e-04)  39.36(8e-04)
          LP      12.26(9e-03)  25.41(8e-03)              LP       4.86(4e-03)  16.32(5e-03)
          CRS      2.42(8e-04)  12.00(2e-03)              CRS
      15  PSE                                        15  PSE      0.41(1e-04)   4.89(8e-04)
          RS      41.88(1e-03)  56.89(1e-03)              RS      42.73(5e-04)  58.12(6e-04)
          SS       0.89(3e-04)   7.30(1e-03)              SS       0.42(1e-04)   5.01(8e-04)
          LR      21.21(1e-03)  38.66(1e-03)              LR      21.41(7e-04)  39.19(7e-04)
          LP       9.66(7e-03)  21.49(7e-03)              LP       4.05(3e-03)  14.39(5e-03)
          CRS      0.87(3e-04)   7.20(1e-03)              CRS
The average of each criterion (×100) over 100 runs of OUR, RS, SS, LR, LP and CRS for Example 4 with a sample size
n    SNR  Method  MSE           MAE            n     SNR  Method  MSE           MAE
500    5  PSE                                  1000    5  PSE                    8.54(9e-04)
          RS      55.76(3e-03)  58.79(2e-03)               RS      56.32(2e-03)  59.17(1e-03)
          SS       2.42(5e-04)  12.06(1e-03)               SS       1.26(3e-04)   8.74(1e-03)
          LR      23.51(1e-03)  40.43(9e-04)               LR      23.71(8e-04)  40.90(7e-04)
          LP      14.54(7e-03)  27.81(6e-03)               LP       7.91(3e-03)  20.54(4e-03)
          CRS      2.45(6e-04)  11.92(1e-03)               CRS
      15  PSE      0.86(2e-04)   7.09(9e-04)          15   PSE
          RS      55.66(3e-03)  58.74(2e-03)               RS      55.67(2e-03)  58.80(1e-03)
          SS       0.93(2e-04)   7.49(9e-04)               SS       0.50(1e-04)   5.50(6e-04)
          LR      22.98(9e-04)  40.20(9e-04)               LR      23.35(7e-04)  40.68(7e-04)
          LP      13.80(7e-03)  25.92(6e-03)               LP       7.40(4e-03)  18.94(5e-03)
          CRS                                              CRS      0.44(1e-04)   5.13(7e-04)
The average of each criterion (×100) over 100 runs of OUR, RS, SS, LR, LP and CRS for Example 5 with a sample size
n    SNR  Method  MSE            MAE            n     SNR  Method  MSE            MAE
500    5  PSE       3.49(8e-04)  14.17(2e-03)   1000    5  PSE       1.77(4e-04)  10.14(1e-03)
          RS      144.56(8e-03)  91.51(3e-03)               RS      145.23(5e-03)  91.76(2e-03)
          SS                                                SS
          LR       12.57(1e-03)  26.85(1e-03)               LR       12.12(7e-04)  26.32(9e-04)
          LP       24.62(1e-02)  33.32(6e-03)               LP       12.37(5e-03)  23.91(4e-03)
          CRS       3.36(8e-04)  14.08(2e-03)               CRS       1.75(4e-04)  10.18(1e-03)
      15  PSE       1.19(3e-04)   8.33(1e-03)          15   PSE
          RS      142.86(8e-03)  90.88(2e-03)               RS      145.64(5e-03)  92.02(2e-03)
          SS        1.18(3e-04)   8.41(1e-03)               SS        0.66(1e-04)   6.33(6e-04)
          LR       11.60(1e-03)  25.39(1e-03)               LR       11.75(6e-04)  25.82(7e-04)
          LP       21.71(1e-02)  30.19(6e-03)               LP       11.80(5e-03)  22.16(4e-03)
          CRS                                               CRS                     6.16(6e-04)
Stepwise knot selection Algorithm
is a model with one knot added to the 

is a model with the oldest knot removed and the new knot added to .
1.  Let 
2.  Compute 
3.  If 
4.  For 
(a) Construct .  
(b) 

(c) 

5. 
Coordinate descent Algorithm (CDA)