The model in our approach assumes that computer responses are a realization of a Gaussian process superimposed on a regression model, called a Gaussian process regression model (GPRM). In classical regression, selecting a subset of variables, or building a good reduced model, is important for identifying variables influential to the responses and for further analyses such as prediction or classification. From a prediction viewpoint, one reason to select variables is to prevent over-fitting or under-fitting of the data. The same reasoning and approach are applicable to the GPRM; however, only a few works on variable selection in GPRMs have been done. In this paper, we propose a new algorithm to build a good prediction model among candidate GPRMs. It is a post-processing step for the algorithm that includes the Welch method suggested by previous researchers. The proposed algorithms select some non-zero regression coefficients (
The development of computer technology has enabled researchers to replace physical experiments with complex computer simulation codes. In addition, computer codes often have high-dimensional inputs. In these cases, computer simulation codes can be computationally expensive; it can therefore be impossible to use a computer simulation code directly for the design and analysis of computer experiments (DACE), because optimizing the objective functions requires many runs of the simulation code. Instead, one can use a statistical model as a metamodel that approximates the functional relationship between the input variables and response values of the computer simulation.
Typically, a computer code is deterministic or has only a small measurement error. For this reason, Sacks
Mechanical engineering: Slonski (2011), Lee and Gard (2014), and Dubourg
Health economics: Rojnik and Naveršnik (2008), and Stevenson
Computer and system engineering: Kennedy
Chemical engineering: Gomes
Other fields: Kapoor
However, many of these studies have not applied a systematic model selection method, which motivated the current work. In classical regression, when there are many independent variables, we select the subset of independent variables that gives a good fit, because over-fitting can lead to a large variance of the predictions and under-fitting can lead to biased predictions. Therefore, to build a prediction model, it is important to find the best subset of independent variables (Jung and Park, 2015; Lee, 2015). The same reasoning and approach are applicable to the GPRM; however, only a few works on variable selection in GPRM have been done (Linkletter
The classical regression model has only regression coefficients
Following Sacks
where
where
where
Once data have been collected at the observation sites {
where
The MLE’s of the parameters are then plugged into the spatial linear model to predict
where
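As an illustration of the plug-in (kriging) predictor described above, here is a minimal numerical sketch. It assumes a constant-mean GP with the Gaussian correlation exp(−Σ_k θ_k (x_k − x′_k)²), profiles out the regression coefficient and process variance by generalized least squares for a fixed θ (rather than performing the full MLE), and adds a tiny nugget for numerical stability. All function and variable names are illustrative, not the authors' code.

```python
import numpy as np

def gauss_corr(X1, X2, theta):
    # R[i, j] = exp(-sum_k theta_k * (x1[i,k] - x2[j,k])^2)
    d2 = (X1[:, None, :] - X2[None, :, :]) ** 2
    return np.exp(-(d2 * theta).sum(axis=2))

def fit_gp(X, y, theta, nugget=1e-10):
    # Profile out beta and sigma^2 for fixed correlation parameters theta.
    n = len(y)
    F = np.ones((n, 1))                          # constant-mean design
    R = gauss_corr(X, X, theta) + nugget * np.eye(n)
    Ri = np.linalg.inv(R)
    beta = np.linalg.solve(F.T @ Ri @ F, F.T @ Ri @ y)
    resid = y - F @ beta
    sigma2 = resid @ Ri @ resid / n              # MLE of the process variance
    return {"X": X, "theta": theta, "beta": beta,
            "w": Ri @ resid, "sigma2": sigma2}

def predict_gp(model, Xnew):
    # Plug-in kriging predictor: mean part + correlation-weighted residuals.
    r = gauss_corr(Xnew, model["X"], model["theta"])
    return model["beta"][0] + r @ model["w"]
```

With a negligible nugget this predictor interpolates the training responses, which is the desired behavior for a deterministic computer code.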
The GPRM from (
Model 1:
Model 2:
Model 3: first-order linear model + common
Model 4: first-order linear model + all
Here the “common
One purpose of computer experiments is to establish a cheap metamodel using the above Gaussian process model and Kriging prediction. For this purpose, a good prediction model should be built. Our experience leads to a model with a combination of some
Welch
Model 5: W6 algorithm
S-0 Set = {1, 2, …,
S-1 For each
S-2 Let
S-3 If
For the stopping criterion, they used the number 6, which is
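The loop structure of a W6-type forward search can be sketched as follows. The maximized log-likelihood of steps S-1 and S-2 is abstracted into a generic `score` function (a stand-in, since the GP likelihood is not reproduced here), and the cap of six variables reflects the stopping number mentioned above; names are illustrative.

```python
# Skeleton of a W6-style forward search: at each stage, the variable
# whose inclusion most improves the fit criterion is added, stopping
# when 6 variables are selected or no addition improves the criterion.
def w6_forward(d, score, max_vars=6, tol=1e-6):
    selected = []
    current = score(selected)
    while len(selected) < max_vars:
        candidates = [j for j in range(d) if j not in selected]
        trials = {j: score(selected + [j]) for j in candidates}
        best_j = max(trials, key=trials.get)
        if trials[best_j] - current <= tol:   # no meaningful improvement
            break
        selected.append(best_j)
        current = trials[best_j]
    return selected
```

In the actual algorithm, `score` would refit the GP by maximum likelihood with the candidate variable's correlation parameter freed.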
The algorithm of Welch
Model 6: forward
S-4 Estimate such
S-5 Select
S-6 When the
Model 7: backward
S-4 Estimate such
S-5 Eliminate
S-6 When the
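As a sketch of the backward-elimination idea with BIC: the GP likelihood is replaced here by an ordinary Gaussian least-squares likelihood (a deliberate simplification), so that BIC = n log(RSS/n) + k log n with k counting the fitted coefficients plus the error variance. The stand-in criterion and all names are illustrative; forward selection is the analogous loop run in the opposite direction.

```python
import numpy as np

def bic_ols(X, y, cols):
    # BIC of a Gaussian linear model using only the given columns.
    n = len(y)
    F = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(F, y, rcond=None)
    rss = ((y - F @ beta) ** 2).sum()
    k = F.shape[1] + 1                      # coefficients + error variance
    return n * np.log(rss / n) + k * np.log(n)

def backward_bic(X, y):
    # Drop, one at a time, the variable whose removal lowers BIC most.
    cols = list(range(X.shape[1]))
    best = bic_ols(X, y, cols)
    while cols:
        trials = {j: bic_ols(X, y, [c for c in cols if c != j]) for j in cols}
        j = min(trials, key=trials.get)
        if trials[j] >= best:               # no deletion improves BIC
            break
        best = trials[j]
        cols = [c for c in cols if c != j]
    return cols
```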
When the values of some
Model 8: forward
S-4 Let
S-5 For
Consider all
Choose the best among these
S-6 Select a single best model from among
Model 9: backward
S-4 Let
S-5 For
Consider all
Choose the best among these
S-6 Select a single best model from among
Model 10: Lasso guided
The lasso coefficients minimize the penalized residual sum of squares

sum_{i=1}^{n} ( y_i − β_0 − sum_{j=1}^{d} β_j x_{ij} )^2 + λ sum_{j=1}^{d} |β_j|

with respect to the regression coefficients, where λ ≥ 0 is a tuning parameter.
We first apply the Lasso to our data, and find some non-zero
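The lasso fit that guides Model 10 can be sketched with a generic cyclic coordinate-descent solver (not the authors' code). It assumes standardized predictor columns and a centered response, so no intercept is needed; names are illustrative.

```python
import numpy as np

def soft_threshold(z, g):
    return np.sign(z) * np.maximum(np.abs(z) - g, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    # Cyclic coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1.
    # Assumes standardized columns of X and a centered response y.
    n, d = X.shape
    b = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y - X @ b
    for _ in range(n_iter):
        for j in range(d):
            r = r + X[:, j] * b[j]          # partial residual without j
            rho = X[:, j] @ r / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            r = r - X[:, j] * b[j]
    return b
```

The variables whose coefficients survive the soft-thresholding (i.e., remain non-zero) are then carried into the GPRM, as described above.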
Now we have five models in which some
We studied simple functions as toy test models in order to check the performance of the proposed algorithms against the basic models and Welch’s approach. After obtaining sample responses at selected design sites, we followed the algorithms above. Performance is evaluated by comparing the true function values with the model predictions and by the BIC. We tried four test functions and one real-data example in this study. To evaluate a prediction model, the following root mean squared prediction error (rmspe) is calculated over 500 or 1,000 random points on (−0.5, 0.5)
The relative improvement of the best model over M5 (Welch algorithm) is calculated by

improvement (%) = 100 × {rmspe(M5) − rmspe(best)} / rmspe(M5),

where rmspe(M5) and rmspe(best) denote the rmspe of M5 and of the best model, respectively.
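In code, the two evaluation quantities can be sketched as follows (assuming rmspe is the root mean squared difference between true values and predictions at the random test points):

```python
import numpy as np

def rmspe(y_true, y_pred):
    # Root mean squared prediction error over the test points.
    diff = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.sqrt(np.mean(diff ** 2)))

def improvement_over_m5(rmspe_m5, rmspe_best):
    # Relative improvement of the best model over M5, in percent.
    return 100.0 * (rmspe_m5 - rmspe_best) / rmspe_m5
```

For example, with the test function 1 values reported below, improvement_over_m5(0.3667, 0.1540) gives approximately 58.0, matching the reported figure.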
As the design sites, a 12-run optimal Latin hypercube design is constructed by Park (1994)’s algorithm (Figure 1).
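Park (1994)’s algorithm additionally optimizes the design (an exchange-type procedure), which is not reproduced here; the sketch below only generates a basic random Latin hypercube on (−0.5, 0.5)^d, so that each coordinate has exactly one point in each of the n equal strata.

```python
import numpy as np

def latin_hypercube(n, d, rng):
    # One point per row; each column is a random permutation of the n
    # strata of (-0.5, 0.5), jittered uniformly within its stratum.
    u = rng.random((n, d))
    strata = np.array([rng.permutation(n) for _ in range(d)]).T
    return (strata + u) / n - 0.5
```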
Table 2 shows the selected prediction models and their performance. Based on rmspe, M5 does not work well compared to M1–M4. The models M6 and M8 are similar and work best. The improvement over M5 is 58.0%. Based on BIC, M7 and M9 work best. This test function is a linear model that may be adequately approximated by a model with several included
where
This equation from Morris and Mitchell (1995) has a physical interpretation that
Table 3 shows the selected prediction models and their performance. Based on rmspe, the M5 model is better than M1–M4, but worse than M6–M10. The best are M6, M8, and M10. The improvement over M5 is 33.9%. Based on BIC, M7 and M9 work best. Figure 3 shows the residual plots of the models, in which the plot of M6 looks better than the others.
where
for −0.5 ≤
Table 4 shows the selected prediction models and their performance. Based on rmspe, M6 and M7 are the same and work best. The improvement over M5 is 58.9%. Based on BIC, M8 and M9 are the same and work best. Figure 4 shows the residual plots of the models, in which the plot of M6 looks better than the others.
where
where the response y is the time it takes to complete one cycle (sec);
Table 5 shows the selected prediction models and their performance. Based on rmspe and BIC, M6, M7, M8, and M9 are the same and work best. The improvement over M5 is 15.9%. Figure 5 shows the residual plots of the models, in which the plot of M6 looks best.
The MARTHE dataset is a realization of the MARTHE code, a numerical simulation of Strontium-90 transport in the upper aquifer of the “Kurchatov Institute” radioactive waste disposal site. The computer code is not accessible; therefore, we obtained the dataset from the Virtual Library of Simulation Experiments (see Surjanovic and Bingham, 2015). The data consist of 300 observations with 20 input variables and 10 output variables. In this study, we use only the 20 input variables and the first output variable, ‘p102K’. Two hundred observations are used as training data to build the models, and the other 100 observations are used as test data to compute the rmspe.
Table 6 shows the selected prediction models and their performance. Based on both rmspe and BIC, M9 works best. The improvement over M5 is 1.4%.
This study proposes an algorithm to build a good prediction model. It is a post-processing step for the algorithm by Welch
We tried several alternative approaches during the course of this study, as follows. All of them require further study, despite the problems faced in our brief experience, before we can conclude which is best.
Pingpong approach: Select one
Backward elimination from the full model: After estimating all
Sample size is an important setting in the test function study. To evaluate performance reliably, we think a large sample is especially required when the dimension of the input variables is high. When there are measurement errors in the computer codes or in physical experiments, we set the parameter
This study was financially supported by Chonnam National University, 2014.
Description on the considered Gaussian process regression models (M1–M10)
Model |  |  | Description
---|---|---|---|
M1 |  |  | Model 1 in Section 2
M2 | all |  | Model 2 in Section 2
M3 | all |  | Model 3 in Section 2
M4 | all | all | Model 4 in Section 2
M5 |  |  | W6 algorithm in Section 3
M6 |  |  | Forward
M7 |  |  | Backward
M8 |  |  | Forward
M9 |  |  | Backward
M10 |  |  | Lasso guided
LRT = likelihood ratio test; BIC = Bayesian information criterion.
Selected models and their rmspe and BIC value for test function 1
Model |  |  | rmspe | BIC
---|---|---|---|---|
M1 |  |  | 0.1783 | 24.9748
M2 | all |  | 0.1921 | 19.6810
M3 | all |  | 0.1777 | −4.9507
M4 | all | all | 0.1709 | −5.6084
M5 |  |  | 0.3667 | 24.2830
M6 |  |  | 0.1540 | −7.2326
M7 |  |  | 0.2226 | −8.4816
M8 |  |  | 0.1540 | −7.2326
M9 |  |  | 0.2226 | −8.4816
M10 |  |  | 0.1646 | −5.4570
A description on models (M1–M10) is provided in Table 1.
rmspe = root mean squared prediction error; BIC = Bayesian information criterion.
Result for test function 2
Model |  |  | rmspe | BIC
---|---|---|---|---|
M1 |  |  | 1.6435 | 562.8069
M2 | all |  | 1.0843 | 502.4237
M3 | all |  | 1.2010 | 577.0617
M4 | all | all | 0.9083 | 441.5618
M5 |  |  | 0.5585 | 340.9313
M6 |  |  | 0.3689 | 264.2891
M7 |  |  | 0.3731 | 262.3349
M8 |  |  | 0.3689 | 264.2891
M9 |  |  | 0.3731 | 262.3349
M10 |  |  | 0.3689 | 264.2891
The others are the same as Table 2.
rmspe = root mean squared prediction error; BIC = Bayesian information criterion.
Result for test function 3
Model |  |  | rmspe | BIC
---|---|---|---|---|
M1 |  |  | 3.4471 | 290.3785
M2 | all |  | 1.2965 | 215.6780
M3 | all |  | 3.9277 | 332.6453
M4 | all | all | 1.5580 | 266.4429
M5 |  |  | 2.4937 | 254.8991
M6 |  |  | 1.0237 | 188.5468
M7 |  |  | 1.0237 | 188.5468
M8 |  |  | 1.0499 | 187.6149
M9 |  |  | 1.0499 | 187.6149
M10 |  |  | 1.2551 | 207.4258
The others are the same as Table 2.
rmspe = root mean squared prediction error; BIC = Bayesian information criterion.
Result for test function 4
Model |  |  | rmspe | BIC
---|---|---|---|---|
M1 |  |  | 0.01705 | −779.3298
M2 | all |  | 0.00296 | −1354.8725
M3 | all |  | 0.01999 | −823.0619
M4 | all | all | 0.00170 | −1581.1772
M5 |  |  | 0.00082 | −1768.6100
M6 |  |  | 0.00069 | −1943.1533
M7 |  |  | 0.00069 | −1943.1533
M8 |  |  | 0.00069 | −1943.1533
M9 |  |  | 0.00069 | −1943.1533
M10 | all |  | 0.00070 | −1923.4304
The others are the same as Table 2.
rmspe = root mean squared prediction error; BIC = Bayesian information criterion.
Result for MARTHE dataset
Model |  |  | rmspe | BIC
---|---|---|---|---|
M1 |  |  | 0.3550 | 95.2770
M2 | all |  | 0.2838 | 2.8409
M3 | all |  | 0.3339 | 123.0978
M4 | all | all | 0.3072 | 52.0115
M5 |  |  | 0.2803 | −6.3102
M6 |  |  | 0.2817 | 5.7842
M7 |  |  | 0.2822 | 1.0033
M8 |  |  | 0.2779 | −7.7810
M9 |  |  | 0.2764 | −11.9247
M10 |  |  | 0.2769 | −5.7742
The others are the same as Table 2.
rmspe = root mean squared prediction error; BIC = Bayesian information criterion.