TEXT SIZE

CrossRef (0)
Model selection algorithm in Gaussian process regression for computer experiments

Youngsaeng Leea, and Jeong-Soo Park1,a

aDepartment of Statistics, Chonnam National University, Korea
Correspondence to: Department of Statistics, Chonnam National University, 77 Yongbong-ro, Buk-gu, Gwangju 61186, Korea. E-mail: jspark@jnu.ac.kr
Received February 9, 2017; Revised June 29, 2017; Accepted June 30, 2017.
Abstract

The model in our approach assumes that computer responses are a realization of a Gaussian processes superimposed on a regression model called a Gaussian process regression model (GPRM). Selecting a subset of variables or building a good reduced model in classical regression is an important process to identify variables influential to responses and for further analysis such as prediction or classification. One reason to select some variables in the prediction aspect is to prevent the over-fitting or under-fitting to data. The same reasoning and approach can be applicable to GPRM. However, only a few works on the variable selection in GPRM were done. In this paper, we propose a new algorithm to build a good prediction model among some GPRMs. It is a post-work of the algorithm that includes the Welch method suggested by previous researchers. The proposed algorithms select some non-zero regression coefficients (β’s) using forward and backward methods along with the Lasso guided approach. During this process, the fixed were covariance parameters (θ’s) that were pre-selected by the Welch algorithm. We illustrated the superiority of our proposed models over the Welch method and non-selection models using four test functions and one real data example. Future extensions are also discussed.

Keywords : Bayesian information criterion, best linear unbiased prediction, covariance matrix, Kriging, maximum likelihood estimation, metamodel, numerical optimization
1. Introduction

The development of computer technology has enabled researchers to replace a physical experiment using complex computer simulation codes. In addition, computer codes often have high dimensional inputs. In these cases, computer simulation codes can be computationally expensive; therefore, it can be impossible to directly use a computer simulation code for the design and analysis of computer experiment (DACE), because it needs to run many computer simulation codes for the optimization of objective functions. However, one can use a statistical model as a metamodel to approximate a functional relationship between the input variables and response values of a computer simulation instead of the simulation code itself.

Typically, a computer code is deterministic or it has a small measurement error. For this reason, Sacks et al. (1989) suggested adopting a Gaussian process regression model (GPRM) as a metamodel for the computer simulation code. The GPRM has often been successfully used in the past for the modeling of computer simulation data. A few examples of recent works that used GPRM for the modeling of computer simulation data are as follows:

However, many of these researches have not applied a systematic model selection method, which has motivated the current work. In classical regression, when there are many independent variables, we select the subset of independent variables which gives good fit. The reason why is because over-fitting can lead to a significant variance of predicts and under-fitting can lead to a bias of predicts. Therefore, for the building of prediction model, it is important to find best subset of independent variables which gives good fit (Jung and Park, 2015; Lee, 2015). The same reasoning and approach can be applicable to GPRM. However, only a few works about the variable selection in GPRM have been done (Linkletter et al., 2006; Marrel et al., 2008; Welch et al., 1992), and almost all previous researches employed simple GPRM withou a variable and parameter selection. Linkletter et al. (2006) used a Bayesian approach while Marrel et al. (2008) used a corrected Akaike information criterion, which are both different from our approach.

The classical regression model has only regression coefficients β’s; however, the GRRM has β’s as well as coefficients θ’s in the correlation function that complicates the variable selection task. In this paper, we propose to select some θ’s, first by the Welch method and then, by under fixing the θ’s, select some β’s by forward selection and backward elimination. The proposed algorithms are validated and compared to the simple models and Welch method through four test functions as well as one real data example. From the test functions study, we found that the model obtained from the proposed methods provide better result than the other models in relation to the prediction error.

Following Sacks et al. (1989), we consider a Gaussian process model defined on an index set Rd for DACE:

$y(x_)=∑j=1pβjfj(x_)+Z(x_)$

where f ’s are a known function and β’s are unknown regression coefficient. Here the random process Z(·) is assumed to be a Gaussian process with mean zero and covariance matrix σ2V for σ2> 0, where σ2 is the process variance (a scale factor) and V is a correlation matrix. Among many possible covariance functions (see Santner et al., 2003), we consider the “power exponential family” which is given by

$cov(Z(x_i),Z(x_j))=σ2exp(-∑k=1dθk|xik-xjk|αk)+σe2δi,j,$

where θk ≥ 0, 0 < αk ≤ 2 for all k, $σe2$ is the variance due to nugget effect, and δi,j is the Kronecker delta. Then correlation function is

$vi,j=corr(Z(x_i),Z(x_j))=exp(-∑k=1dθk|xik-xjk|αk)+γδi,j,$

where $γ=σe2/σ2$ and γ ≥ 0. Throughout this study, we set α = 2 and γ = 0.

Once data have been collected at the observation sites {x1, …, xn}, we use maximum likelihood estimation (MLE) method to estimate parameters in the model (1.1) and (1.2). Since we assume y(x) is a Gaussian process with mean and covariance matrix σ2V, the likelihood function of y is

$L(y;θ,β,σ2,γ,α,x)=(2πσ2)-n2∣V∣exp(-(y-Fβ)tV-1(y-Fβ)2σ2),$

where F is a design matrix. A numerical optimization procedure is required because the likelihood equations do not lead to a closed form solution. We need an efficient MLE searching program to build a good prediction model. Its implementation details are presented in Park and Baek (2001).

The MLE’s of the parameters are then plugged into the spatial linear model to predict y(x) at which x is not an observation site (the prediction is called Kriging). The empirical best linear unbiased prediction with the MLE (θ̂,β̂) of parameters plugged into is given by

$Y^(x0)=f0tβ^+r0tV^-1(y_-Fβ^),$

where f0 is the known linear regression functions vector, r0 is the correlation vector between x0 and design S, and y is the vector of observation collected at the design sites.

2. Existing model selection algorithm

The GPRM from (1.1) has regression coefficients β’s as well as coefficients θ’s in the correlation function that complicates the variable selection. Among the many possible combinations of the β’s and θ’s, we consider the following four models as basic ones (Cox et al., 2001):

• Model 1: β0 + common θc

• Model 2: β0 + all θ’s

• Model 3: first order liner model + common θc

• Model 4: first order liner model + all θ’s.

Here the “common θ” means that d number of θ’s are forced to be a common θc such that θ1 = θ2 = · · · = θd := θc.

One purpose of computer experiments is to establish a cheap metamodel using the above Gaussian process model and Kriging prediction. For this purpose, a good prediction model should be built. Our experience leads to a model with a combination of some β’s and θ’s that gives a good prediction model. In this paper, we describe an algorithm for building a good prediction model.

Welch et al. (1992) described an algorithm to screen important input variables in computer experiments that used a Gaussian process model. They proposed using a dimensional reduction scheme to perform a series of presumably simpler optimization. The idea is to make tractable, the high-dimensional minimization by constraining the number of free θ’s; only “important” θ’s are allowed to posses their own values. Following Santner et al. (2003), we describe the algorithm below because our algorithm is a post-work of it. Hereafter, it is referred to as the Welch6 or W6 algorithm, because six people including the first author Welch wrote the paper (Welch et al., 1992). At each stage of the process, let denote the indices of the variables having a common θ for that step and let = j. Notice that [S-0] is an initialization step, while [S-1] and [S-2] are induction steps. Only β0 is estimated for the linear model term.

• Model 5: W6 algorithm

• S-0 Set = {1, 2, …, d}, i.e., θ1 = θ2 = · · · = θd := θc. Maximize (1.4) as a function of θc and the resulting log likelihood by l0.

• S-1 For each j, maximize (1.4) under the constraint that θ’s with in have a common value and θj varies freely. Denote the result by lj.

• S-2 Let jM denote the θ producing the largest increase in ljl0 for j.

• S-3 If ljMl0 represents a significant increase in the log likelihood as judged by a stopping criterion, then set = , l0 = ljM, and fix θjM at its value estimated in [S-1]. Continue the next iteration at [S-1]. Otherwise, stop the algorithm and take θ̂’s produced by the previous iteration.

For stopping criterion, they used the number 6 which is $χ.052(2)$ based on the 2 times log likelihood ratio.

3. Proposed algorithms

The algorithm of Welch et al. (1992) works for screening input variables. But for building prediction model, we attach more steps to select some of β terms in the model. Let us assume that k θ’s are selected in the above W6 algorithm; we next then fix it as true. Now we will consider five approaches: forward selection and Backward elimination of β’s based on likelihood ratio test (LRT), forward selection and Backward elimination of β’s based on Bayesian information criterion (BIC), and least absolute shrinkage and selection operator (Lasso) guided selection. Note that all of these are a kind of “W6 + β selection” algorithm.

• Model 6: forward β selection based on LRT

• S-4 Estimate such k θ’s under constant linear model (only β0) by k-dimensional optimization routine. Here the k θ’s are allowed to posses values freely. Then fix the MLE of k θ’s throughout the iterations.

• S-5 Select β’s producing the largest increase of the log likelihood as the same as in the forward selection algorithm in ordinary regression model building. The significance of the increase is judged by a stopping criterion.

• S-6 When the β selection is done, estimate the k θ’s and the selected β’s using the MLE program.

• Model 7: backward β elimination based on LRT

• S-4 Estimate such k θ’s under the first order linear model by k-dimensional optimization routine. Then fix the MLE of k θ’s throughout the iterations.

• S-5 Eliminate β’s producing the smallest decrease of log likelihood as the same as in the backward elimination algorithm in ordinary regression model building. The significance of the decrease is judged by a stopping criterion.

• S-6 When the β elimination is done, estimate the k θ’s and the remained β’s using the MLE program.

When the values of some θ’s are fixed, the computation involved in selection or in elimination of β’s are relatively fast, no optimization is required; β̂ = (FtV−1F)−1FtV−1y. Here β̂ is the generalized least squares estimator of β.

• Model 8: forward β selection based on BIC (James et al., 2013)

• S-4 Let M0 denote the null model, which contains no predictors.

• S-5 For k = 0, …, p − 1:

• Consider all pk models that augment the predictors in Mk with one additional predictor

• Choose the best among these pk models, and call it Mk+1. Here the best is defined as having smallest negative log likelihood

• S-6 Select a single best model from among M0, …, Mp using the smallest BIC, where

$BICk=-2 ln(L)+k·ln(n).$

• Model 9: backward β elimination based on BIC

• S-4 Let Mp denote the f ull model, which contains all p predictors.

• S-5 For k = p, p − 1, …, 1:

• Consider all k models that contain all but one of the predictors in Mk, for a total of k − 1 predictors

• Choose the best among these k models, and call it Mk−1. Here the best is defined as having smallest negative log likelihood

• S-6 Select a single best model from among M0, …, Mp using BIC.

• Model 10: Lasso guided β selection

The lasso coefficients, $β^λL$, minimize the quantity

$∑i=1n(yi-β0-∑j=1pβjxij)2+λ∑j=1p∣βj∣,$

with respect to β’s. The Lasso shrinks the coefficient estimates towards zero. The l1 penalty on β has the effect of forcing some of the coefficient estimates to be exactly zero when the tuning parameter λ is sufficiently large. Hence, the Lasso performs variable selection much like the best subset selection (James et al., 2013). Selecting a good value of λ is usually done by cross-validation.

We first apply the Lasso to our data, and find some non-zero $β^λL$’s. Then our selected model is the GPRM with only the non-zero $β^λL$’s and with the fixed covariance terms obtained by the Welch algorithm. Note that the Lasso here is applied without the covariance assumption (1.2) and is only used to select non-zero coefficients under the independent error model.

Now we have five models in which some β’s are selected by the proposed methods. Table 1 presents a description on the considered 10 GPR models (GPRMs) (M1–M10).

4. Test function study

We studied simple functions as test toy models in order to check the performance of the proposed algorithm compared to the basic models and Welch’s approach. After obtaining sample responses from selected design sites, we followed the above algorithms. The performance is evaluated by comparing the true function values and the predictions of the model and by using the BIC. We tried four test functions and one real data example in this study. For the evaluation of the prediction model, the following root mean squared prediction error (rmspe) is calculated for 500 or 1,000 random points on (−0.5, 0.5)d:

$rmspe=1n∑j=1N(yj-y^j)2.$

The relative improvement of the best model over M5 (Welch algorithm) is calculated by

$rmspe(M5)-rmspe(best)rmspe(M5)×100.$

### 4.1. Test function 1

$y=y1+y2+y1×y2,$

where

$y1=4x1+x2+x34+x416,$$y2=x32-x22, -0.5≤xi≤0.5.$

As a design site, 12 runs optimal Latin hypercube design is constructed by Park (1994)’s algorithm (Figure 1).

Table 2 shows the selected prediction models and performance. Based on rmspe, M5 does not work well compared to M1–M4. The models M6 and M8 are similar and work best. The improvement over M5 is 58.0%. Based on BIC, M7, and M9 work best. This test function is a linear model that may be adequately approximated by the model of several included β terms. Figure 2 shows the residual plots of the models, in which the plot of M6 looks better than others. The Lasso guided selection, M10 did not work well. This may be because M10 uses the Lasso without taking the covariance matrix into account, while our GPRM remains the covariance dependent.

### 4.2. Test function 2

$y=2πx3(x4-x6)ln(x2/x1)[1+U(x)],$

where

$U(x)=2x7x3ln(x2/x1)x12×9855+x3x5.$

This equation from Morris and Mitchell (1995) has a physical interpretation that y represent steady-state flow of water through a borehole between two aquifers. A 7-dimensional 100 runs Latin hyper cube design is constructed as a design site. One thousand random points on the domain were used to compute the rmspe.

Table 3 shows the selected prediction models and their performance. Based on rmspe, aM5 model is better than M1–M4, but worse than M6–M10. The best are M6, M8, and M10. The improvement over M5 is 33.9%. Based on BIC, M7, and M9 work best. Figure 3 shows the residual plots of the models in which the plot of M6 looks better than the others.

### 4.3. Test function 3

$y=y1+y2,$

where

$y1=5x121+x1+5(x4-x20)2+x5+40x193-5x19,$$y2=0.05x2+0.08x3-0.03x6+0.03x7-0.09x9-0.01x10-0.07x11+0.25x132-0.04x14+0.06x15-0.01x17-0.03x18$

for −0.5 ≤ xi ≤ 0.5. This test function is from Welch et al. (1992). As a design site, a 20-dimensional 50 runs Latin hypercube design is used. One thousand random points on the domain were used to compute the rmspe.

Table 4 shows the selected prediction models and their performance. Based on rmspe, M6 and M7 are the same and work best. The improvement over M5 is 58.9%. Based on BIC, M8 and M9 are the same and work best. Figure 4 shows the residual plots of the models in which the plot of M6 looks better than others.

### 4.4. Test function 4

$y=2πx1x4+x22x5x3x7x6V2,$

where

$V=x22x4(A2+4kx5x3x7x6-A),$$A=x5x2+19.62x1-x4x7x2,$

where the response y is the time it takes to complete one cycle (sec); x1 ∈ [30, 60] is the piston weight (kg); x2 ∈ [0.005, 0.02] is the piston surface area (m2); x3 ∈ [0.002, 0.01] is the initial gas volume (m3); x4 ∈ [1,000, 5,000] is the spring coefficient (N/m); x5 ∈ [90,000, 110,000] is the atmospheric pressure (N/m2); x6 ∈ [290, 296] is the ambient temperature (K); x7 ∈ [340, 360] is the filling gas temperature (K). This piston simulation function is from Moon (2010) and the Virtual Library of Simulation Experiment (Surjanovic and Bingham, 2015) that models the circular motion of a piston. As a design site, a 7-dimensional 200 runs Latin-hypercube design is used. One thousand random points on the domain computed the rmspe.

Table 5 shows the selected prediction models and performance. Based on rmspe and BIC, M6, M7, M8, and M9 are the same and work best. The improvement over M5 is 15.9%. Figure 5 shows the residual plots of the models, in which the plot of M6 looks best.

### 4.5. Real example: MARTHE dataset

The MARTHE dataset is realization of the MARTHE code, which is about numerical simulation of Strontium-90 transport in the upper aquifer of the “Kurchatov Institute” radiation waste disposal site. The computer code is not accessible; therefore, we obtained the dataset from the Virtual Library of Simulation Experiment (see Surjanovic and Bingham, 2015). The data consists of 300 observations with 20 input variables and 10 output dimensions. In this study, we only in 20 input variables and the first output variable ‘p102K’. Two hundred observations are used for training data to build models, and the other 100 observations are used for test data to compute rmspe.

Table 6 shows the selected prediction models and performance. M9 is best based on rmspe and BIC. The improvement over M5 is 1.4%.

5. Summary and discussion

This study proposes an algorithm to build a good prediction model. It is a post-work of the algorithm by Welch et al. (1992). It selects some β’s while the pre-selected θ’s are fixed. Using four test functions and one real data example, we illustrated the superiority of the proposed models over other models. Forward selection and backward elimination of β’s based on the likelihood ratio test or BIC work well; however, the models built by the algorithm by Welch et al. (1992) and by the Lasso guided selection did not work well.

We tried several alternative approaches during the progress of this study as follows. In addition, we should study all of them further despite the problems faced in our brief experience to reach a conclusion on which is the best.

• β first approach: Selecting β’s first (and then selecting θ’s) did not work well.

• θ backward elimination: The strategy that estimating all θ’s first, and eliminate some θ’s, and then select some β’s seems good. Based on our experience, its performance is fair compared to the proposed algorithm; however, it is computationally expensive compared to Welch algorithm and our proposed methods.

• θ Forward selection: Select θ one-by-one method like as the forward selection algorithm in the ordinary regression model building. This did not work well.

• Pingpong approach: Select one θ, and select one β, and select another one θ, and select another one β. It seems poor; however, still needs a future study.

• Backward elimination from the full model: After estimating all β’a and all θ’s from the Model 4, compute Wald’s t-statistics (T = θ̂/SE(θ̂)) to eliminate some β’s and θ’s, and continue this until no more deletion. The Fisher information matrix may be required to obtain SE(θ̂).

Sample size is an important setting in the test function study. To reliably evaluate a stable performance, we think a large sample is especially required when the dimension of input variables is high. When there are measurement errors in computer codes or in physical experiments, we set the parameter γ in (1.3) as positive. So our model selection methods in GPRM can also be applicable to physical or computer experiments with the measurement error. For a practical situation, we cannot compute the prediction error as we did in the test function study. Therefore, a leave-one-out or k-fold cross-validation version of prediction error is required. Finally, we may go further to select or eliminate more θ’s and β’s from the models (M6–M9) considered in this paper that may provide a better model in the sense of less prediction error. However, these remain topics for future study.

Acknowledgements

This study was financially supported by Chonnam National University, 2014.

Figures
Fig. 1. A 4-dimension 12 runs optimal Latin-hypercube design used for test function 1.
Fig. 2. Residual plots of the prediction models for the test function 1 with 500 random prediction points. Plots for M8 and M9 are omitted because they are the same for M6 and M7, respectively. See for a description on models (M1–M10).
Fig. 3. Residual plots of the prediction models for the test function 2 with 1,000 random prediction points. Plots for M8, M9, and M10 are omitted because they are the same for M6 and M7, respectively. See for a description on models (M1–M10).
Fig. 4. Residual plots of prediction models for test function 3 with 1,000 random prediction points. Plots for M7 and M9 are omitted because they are the same for M6 and M8, respectively. See for a description of the models (M1–M10).
Fig. 5. Residual plots of prediction models for test function 4 with 1,000 random prediction points. Plots for M7, M8, and M9 are omitted because these are the same for M6. See for a description of the models (M1–M10).
TABLES

### Table 1

Description on the considered Gaussian process regression models (M1–M10)

ModelβθDescription
M1β0θcModel 1 in Section 2
M2β0all θ’sModel 2 in Section 2
M3all β’sθcModel 3 in Section 2
M4all β’sall θ’sModel 4 in Section 2
M5β0θ selected by W6 algorithmW6 algorithm in Section 3
M6β selectedθ selected by W6 algorithmForward β selection based on LRT
M7β selectedθ selected by W6 algorithmBackward β elimination based on LRT
M8β selectedθ selected by W6 algorithmForward β selection based on BIC
M9β selectedθ selected by W6 algorithmBackward β elimination based on BIC
M10β selectedθ selected by W6 algorithmLasso guided β selection

LRT = likelihood ratio test; BIC = Bayesian information criterion.

### Table 2

Selected models and their rmspe and BIC value for test function 1

ModelβθrmspeBIC
M1β0θc0.178324.9748
M2β0all θ’s0.192119.6810
M3all β’sθc0.1777−4.9507
M4all β’sall θ’s0.1709−5.6084
M5β0θ1, θ20.366724.2830
M6β0, β1, β2, β3θ1, θ20.1540−7.2326
M7β0, β1, β3,β4θ1, θ20.2226−8.4816
M8β0, β1, β2,β3θ1, θ20.1540−7.2326
M9β0, β1,β3,β4θ1, θ20.2226−8.4816
M10β0, β1,β2,β3, β4θ1, θ20.1646−5.4570

A description on models (M1–M10) is provided in Table 1.

rmspe = root mean squared prediction error; BIC = Bayesian information criterion.

### Table 3

Result for test function 2

ModelβθrmspeBIC
M1β0θc1.6435562.8069
M2β0all θ’s1.0843502.4237
M3all β’sθc1.2010577.0617
M4all β’sall θ’s0.9083441.5618
M5β0θ1, θ4,θ6, θ70.5585340.9313
M6β0, β1, β2, β4, β5, β6, β7θ1, θ4, θ6, θ70.3689264.2891
M7β0, β1, β2, β4,β5, β7θ1, θ4, θ6, θ70.3731262.3349
M8β0, β1, β2, β4, β5, β6, β7θ1, θ4, θ6, θ70.3689264.2891
M9β0, β1, β2, β4,β5, β7θ1, θ4, θ6, θ70.3731262.3349
M10β0, β1, β2, β4,β5, β7θ1, θ4, θ6, θ70.3689264.2891

The others are the same as Table 2.

rmspe = root mean squared prediction error; BIC = Bayesian information criterion.

### Table 4

Result for test function 3

ModelβθrmspeBIC
M1β0θc3.4471290.3785
M2β0all θ’s1.2965215.6780
M3all β’sθc3.9277332.6453
M4all β’sall θ’s1.5580266.4429
M5β0θ1, θ4, θ5, θ12, θ19, θ202.4937254.8991
M6β0, β5, β9, β11, β12θ1, θ4, θ5, θ12, θ19, θ201.0237188.5468
M7β0, β5, β9, β11, β12θ1, θ4, θ5, θ12, θ19, θ201.0237188.5468
M8β0, β5, β9, β11θ1, θ4, θ5, θ12, θ19, θ201.0499187.6149
M9β0, β5, β9, β11θ1, θ4, θ5, θ12, θ19, θ201.0499187.6149
M10β0, β2, β7, β9, β12, β18, β19θ1, θ4, θ5, θ12, θ19, θ201.2551207.4258

The others are the same as Table 2.

rmspe = root mean squared prediction error; BIC = Bayesian information criterion.

### Table 5

Result for test function 4

ModelβθrmspeBIC
M1β0θc0.01705−779.3298
M2β0all θ’s0.00296−1354.8725
M3all β’sθc0.01999−823.0619
M4all β’sall θ’s0.00170−1581.1772
M5β0θ1, θ2, θ3, θ4, θ50.00082−1768.6100
M6β0, β1, β4, β6θ1, θ2, θ3, θ4, θ50.00069−1943.1533
M7β0, β1, β4, β6θ1, θ2, θ3, θ4, θ50.00069−1943.1533
M8β0, β1, β4, β6θ1, θ2, θ3, θ4, θ50.00069−1943.1533
M9β0, β1, β4, β6θ1, θ2, θ3, θ4, θ50.00069−1943.1533
M10all β’sθ1, θ2, θ3, θ4, θ50.00070−1923.4304

The others are the same as Table 2.

rmspe = root mean squared prediction error; BIC = Bayesian information criterion.

### Table 6

Result for MARTHE dataset

ModelβθrmspeBIC
M1β0θc0.355095.2770
M2β0all θ’s0.28382.8409
M3all β’sθc0.3339123.0978
M4all β’sall θ’s0.307252.0115
M5β0θ2, θ3, θ4, θ6, θ14,θ15, θ19, θ200.2803−6.3102
M6β0, β1, β2, β3,β6, β9, β15, β19, β20θ2, θ3, θ4, θ6, θ14,θ15, θ19, θ200.28175.7842
M7β0, β1, β6, β12, β15, β19,β20θ2, θ3, θ4, θ6, θ14,θ15, θ19, θ200.28221.0033
M8β0, β2, β12, β15,β20θ2, θ3, θ4, θ6, θ14,θ15, θ19, θ200.2779−7.7810
M9β0, β2, β15, β20θ2, θ3, θ4, θ6, θ14,θ15, θ19, θ200.2764−11.9247
M10β0, β2, β6, β13, β14, β15, β20θ2, θ3, θ4, θ6, θ14,θ15, θ19, θ200.2769−5.7742

The others are the same as Table 2.

rmspe = root mean squared prediction error; BIC = Bayesian information criterion.

References
1. Caballero, JA, and Grossmann, IE (2008). Rigorous flowsheet optimization using process simulators and surrogate models. Computer Aided Chemical Engineering. 25, 551-556.
2. Cox, DD, Park, JS, and Singer, CE (2001). A statistical method for tuning a computer code to a data base. Computational Statistics & Data Analysis. 37, 77-92.
3. Deng, H, Shao, W, Ma, Y, and Wei, Z (2012). Bayesian metamodeling for computer experiments using the Gaussian Kriging models. Quality and Reliability Engineering International. 28, 455-466.
4. Dubourg, V, Sudret, B, and Deheeger, F (2013). Metamodel-based importance sampling for structural reliability analysis. Probabilistic Engineering Mechanics. 33, 47-57.
5. Gomes, MVC, Bogle, IDL, Biscaia, EC, and Odloak, D (2008). Using Kriging models for real-time process optimisation. Computer Aided Chemical Engineering. 25, 361-366.
6. James, G, Witten, D, Hastie, T, and Tibshirani, R (2013). An Introduction to Statistical Learning: with Applications in R. New York: Springer
7. Johnson, JS, Gosling, JP, and Kennedy, MC (2011). Gaussian process emulation for second-order Monte Carlo simulations. Journal of Statistical Planning and Inference. 141, 1838-1848.
8. Jung, SY, and Park, C (2015). Variable selection with nonconcave penalty function on reduced-rank regression. Communications for Statistical Applications and Methods. 22, 41-54.
9. Kapoor, A, Grauman, K, Urtasun, R, and Darrell, T (2010). Gaussian processes for object categorization. International Journal of Computer Vision. 88, 169-188.
10. Kennedy, MC, Anderson, CW, Conti, S, and O’Hagan, A (2006). Case studies in Gaussian process modelling of computer codes. Reliability Engineering & System Safety. 91, 1301-1309.
11. Kumar, A (2015). Sequential tuning of complex computer models. Journal of Statistical Computation and Simulation. 85, 393-404.
12. Lee, JH, and Gard, K (2014). Vehicle-soil interaction: testing, modeling, calibration and validation. Journal of Terramechanics. 52, 9-21.
13. Lee, S (2015). An additive sparse penalty for variable selection in high-dimensional linear regression model. Communications for Statistical Applications and Methods. 22, 147-157.
14. Linkletter, C, Bingham, D, Hengartner, N, Higdon, D, and Ye, KQ (2006). Variable selection for Gaussian process models in computer experiments. Technometrics. 48, 478-490.
15. Liu, YJ, Chen, T, and Yao, Y (2013). Nonlinear process monitoring by integrating manifold learning with Gaussian process. Computer Aided Chemical Engineering. 32, 1009-1014.
16. Marrel, A, Iooss, B, van Dorpe, F, and Volkova, E (2008). An efficient methodology for modeling complex computer codes with Gaussian processes. Computational Statistics and Data Analysis. 52, 4731-4744.
17. Moon, H 2010. Design and analysis of computer experiments for screening input variables. Doctoral dissertation. Ohio State University. Columbus, OH.
18. Morris, MD, and Mitchell, TJ (1995). Exploratory designs for computational experiments. Journal of Statistical Planning and Inference. 43, 381-402.
19. Park, JS (1994). Optimal Latin-hypercube designs for computer experiments. Journal of Statistical Planning and Inference. 39, 95-111.
20. Park, JS, and Baek, J (2001). Efficient computation of maximum likelihood estimators in a spatial linear model with power exponential covariogram. Computers & Geosciences. 27, 1-7.
21. Rohmer, J, and Foerster, E (2011). Global sensitivity analysis of large-scale numerical landslide models based on Gaussian-Process meta-modeling. Computers & Geosciences. 37, 917-927.
22. Rojnik, K, and Naveršnik, K (2008). Gaussian process metamodeling in Bayesian value of information analysis: a case of the complex health economic model for breast cancer screening. Value in Health. 11, 240-250.
23. Sacks, J, Welch, WJ, Mitchell, TJ, and andWynn, HP (1989). Design and analysis of computer experiments. Statistical Science. 4, 409-423.
24. Santner, TJ, Williams, BJ, and Notz, WI (2003). The Design and Analysis of Computer Experiments. New York: Springer
25. Silvestrini, RT, Montgomery, DC, and Jones, B (2013). Comparing computer experiments for the Gaussian process model using integrated prediction variance. Quality Engineering. 25, 164-174.
26. Slonski, M (2011). Bayesian neural networks and Gaussian processes in identification of concrete properties. Computer Assisted Mechanics and Engineering Science. 18, 291-302.
27. Stevenson, MD, Oakley, J, and Chilcott, JB (2004). Gaussian process modeling in conjunction with individual patient simulation modeling: a case study describing the calculation of cost-effectiveness ratios for the treatment of established osteoporosis. Medical Decision Making. 24, 89-100.
28. Surjanovic, S, and Bingham, D (2015). Virtual Library of Simulation Experiments: test functions and datasets: emulation/prediction test problems.Retrieved January 20, 2017, from: https://www.sfu.ca/~ssurjano/emulat.html
29. Tagade, PM, Jeong, BM, and Choi, HL (2013). A Gaussian process emulator approach for rapid contaminant characterization with an integrated multizone-CFD model. Building and Environment. 70, 232-244.
30. Welch, WJ, Buck, RJ, Sacks, J, Wynn, HP, Mitchell, TJ, and Morris, MD (1992). Screening, predicting and computer experiments. Technometrics. 34, 15-25.
31. Zhang, J, Taflanidis, AA, and Medina, JC (2017). Sequential approximate optimization for design under uncertainty problems utilizing Kriging metamodeling in augmented input space. Computer Methods in Applied Mechanics and Engineering. 315, 369-395.