The employment level plays an important role in setting government labor policy, assessing overall economic conditions, and planning business investment. Forecasting the employment level is therefore crucial for government policy makers, investors, and many others, and many studies have addressed it; see Rapach and Strauss (2010, 2012), Siliverstovs (2013), and Lehmann and Weyh (2016), among others, for recent work. The forecasting methods of these authors are based on statistical or economic models such as the autoregressive integrated moving average (ARIMA) model, vector autoregression, and factor analysis. The recent success of machine learning methods in diverse classification and forecasting problems leads us to consider the artificial neural network (ANN), one such method, for employment level forecasting.
The ANN is an attractive forecasting method in that it can capture a nonlinear relationship between the employment level and the predictor variables without econometric intuition. We demonstrate a substantial efficiency gain for machine learning forecasts of the employment level using the ANN methods of the deep neural network (DNN), the long short-term memory (LSTM), and the gated recurrent unit (GRU) over the standard AR forecast, provided a big data approach is combined with careful statistical consideration of dimension reduction and time series differencing. In an ANN, two or more hidden layers are added to handle more complex nonlinear relationships; the resulting model is called a DNN. However, the DNN has a limitation in that it does not address the serial dependence present in most economic time series. The recurrent neural network (RNN) was proposed to address serial dependence, and improved modifications of the RNN have appeared: the LSTM of Hochreiter and Schmidhuber (1997) and the GRU of Cho et al. (2014).
For the ANN-based machine learning forecasts of the employment level, we consider the big data of the federal reserve economic data (FRED) as predictors. The FRED is a large database maintained by the Federal Reserve Bank of St. Louis and is composed of more than 500,000 economic time series related to banking, employment, population, and consumer price indexes.
The FRED is huge and contains unit root series. Therefore, we need to consider two statistical issues: dimension reduction and time series differencing. McCracken and Ng (2016) chose 105 important macroeconomic variables from the more than 500,000 variables in the FRED. They also provided background information on these variables and a transformation for each series, such as the degree of differencing and a log transformation. The recommendations of McCracken and Ng (2016) are applied to the machine learning forecasting of the employment level.
We identify that consideration of the two statistical issues improves the forecast performance of the ANN methods. An out-of-sample forecast comparison is made together with a model confidence set (MCS) analysis of Hansen et al. (2011).
The remainder of the paper is organized as follows. Section 2 describes the FRED. Section 3 explains the forecast methods. Section 4 makes an out-of-sample forecast comparison. Section 5 concludes.
We consider the FRED in forecasting the U.S. monthly civilian employment level for the period 01/01/1985–12/01/2018.
The FRED is a large database maintained by the Federal Reserve Bank of St. Louis. Its data are collected from global financial institutions and U.S. government agencies such as the U.S. Census Bureau and the Bureau of Labor Statistics. The database contains various categories of economic and financial data: banking, employment and population, gross domestic product, interest rates, and consumer price indexes. McCracken and Ng (2016) reduced the more than 500,000 FRED series to a set of 105 important macroeconomic variables.
The summary analysis by McCracken and Ng (2016) consists mainly of differencing and dimension reduction, and both issues need to be addressed when the reduced FRED big data are used as predictors.
We briefly discuss below the difference order considered for each element of the reduced data set.
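The series-by-series transformations recommended by McCracken and Ng (2016) follow the FRED-MD transformation codes. The sketch below (the function name `apply_tcode` is ours) shows how each code maps to a differencing or log operation, assuming the series is a plain numeric array:

```python
import numpy as np

def apply_tcode(x, tcode):
    """Apply a FRED-MD style transformation code to a 1-D series.

    Codes follow the McCracken-Ng FRED-MD convention:
      1: level            2: first difference      3: second difference
      4: log              5: first diff of log     6: second diff of log
      7: first difference of the percentage change
    Differencing shortens the series, so NaNs pad the front.
    """
    x = np.asarray(x, dtype=float)
    pad = lambda d, k: np.concatenate([np.full(k, np.nan), d])
    if tcode == 1:
        return x
    if tcode == 2:
        return pad(np.diff(x), 1)
    if tcode == 3:
        return pad(np.diff(x, n=2), 2)
    if tcode == 4:
        return np.log(x)
    if tcode == 5:
        return pad(np.diff(np.log(x)), 1)
    if tcode == 6:
        return pad(np.diff(np.log(x), n=2), 2)
    if tcode == 7:
        pct = x[1:] / x[:-1] - 1.0       # percentage change
        return pad(np.diff(pct), 2)
    raise ValueError("unknown tcode")
```

For a series growing at a constant rate, code 5 (first difference of logs) yields a constant growth rate, which is why it is the usual choice for trending level series such as employment.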
We forecast the log of the monthly civilian employment level.
Forecast methods based on the ANN have received significant attention. The ANN is a machine learning method inspired by biological neural networks. Keeping in mind the implementation of the ANN methods for forecasting in Section 4, we briefly review them here.
In an ANN, a forecast is produced in two steps: first, each hidden node receives a linear combination of the predictors from the input nodes and applies a nonlinear activation function to it; second, the output node combines the hidden node values into the forecast.
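The two-step computation can be sketched with a numpy-only forward pass; the network sizes and random weights below are illustrative, not the fitted model of Section 4:

```python
import numpy as np

rng = np.random.default_rng(0)

def dnn_forecast(x, weights):
    """One forward pass of a small DNN: each hidden layer applies a
    nonlinear function (tanh here) to a linear combination of its
    inputs; the output node is a plain linear combination."""
    h = x
    for W, b in weights[:-1]:          # hidden layers
        h = np.tanh(W @ h + b)
    W_out, b_out = weights[-1]         # linear output node
    return (W_out @ h + b_out).item()

# toy network: 5 predictors -> 8 hidden -> 4 hidden -> 1 output
sizes = [5, 8, 4, 1]
weights = [(rng.normal(size=(m, n)), rng.normal(size=m))
           for n, m in zip(sizes[:-1], sizes[1:])]
x = rng.normal(size=5)
y_hat = dnn_forecast(x, weights)
```

With two hidden layers, as here, the model is a DNN in the terminology above; with one hidden layer it reduces to the basic ANN.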
However, the DNN does not perform well for temporal data sets whose elements are serially correlated, as most economic time series are. The RNN was proposed for temporally structured data to address this serial dependence. In the RNN, the hidden node values follow a time-dependent nonlinear AR(1)-type structure, each hidden state being a nonlinear function of the current input and the previous hidden state.
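The GRU makes this recursion concrete: the hidden state is a gated convex combination of the previous state and a candidate state, following Cho et al. (2014). A minimal numpy sketch with illustrative random weights (parameter names are ours):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, p):
    """One GRU step: the hidden state follows a gated, nonlinear
    AR(1)-type recursion h_t = f(x_t, h_{t-1})."""
    z = sigmoid(p["Wz"] @ x_t + p["Uz"] @ h_prev + p["bz"])   # update gate
    r = sigmoid(p["Wr"] @ x_t + p["Ur"] @ h_prev + p["br"])   # reset gate
    h_tilde = np.tanh(p["Wh"] @ x_t + p["Uh"] @ (r * h_prev) + p["bh"])
    return (1.0 - z) * h_prev + z * h_tilde                   # new state

rng = np.random.default_rng(1)
d_in, d_h = 4, 3
p = {k: rng.normal(scale=0.5, size=(d_h, d_in)) for k in ("Wz", "Wr", "Wh")}
p.update({k: rng.normal(scale=0.5, size=(d_h, d_h)) for k in ("Uz", "Ur", "Uh")})
p.update({k: np.zeros(d_h) for k in ("bz", "br", "bh")})

h = np.zeros(d_h)
for x_t in rng.normal(size=(6, d_in)):   # run over a short sequence
    h = gru_step(x_t, h, p)
```

The LSTM works analogously but carries a separate cell state with input, forget, and output gates; the GRU merges these into the two gates above.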
As discussed in Section 2, the FRED data sets include unit root series, so differencing of the predictors needs to be considered.
We are interested in whether dimension reduction improves the machine learning forecast performance. Accordingly, we consider a type of linear regression, the least absolute shrinkage and selection operator (LASSO) regression, which is widely used as a dimension reduction method; see, for example, Tarassow (2019) and Uniejewski et al.
The LASSO adds an L1 penalty on the regression coefficients to the least squares objective, shrinking the coefficients of uninformative predictors to exactly zero.
Table 1 shows the predictors selected by the LASSO regression for each forecast horizon.
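The LASSO selection step can be sketched with a plain coordinate-descent solver; the data below are synthetic (only the first two of eight predictors matter), so this illustrates the mechanism rather than reproducing Table 1:

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """LASSO by cyclic coordinate descent with soft-thresholding:
    minimize (1/2n)||y - Xb||^2 + lam * ||b||_1.
    Predictors whose coefficients shrink to exactly zero are dropped."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ b + X[:, j] * b[j]        # residual excluding j
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return b

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 8))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.1 * rng.normal(size=200)
b = lasso_cd(X, y, lam=0.1)
selected = np.flatnonzero(b != 0.0)    # indices of retained predictors
```

Larger values of `lam` retain fewer predictors; in practice the penalty is tuned, for example, by cross-validation or an information criterion.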
Focusing on the roles of differencing and dimension reduction, we make an out-of-sample forecast comparison of the log employment level forecasts.
Forecast errors are computed over the out-of-sample evaluation period, from which the root mean square error (RMSE) and the mean absolute error (MAE) of each method are obtained.
For each forecast method, the efficiency gain relative to the benchmark AR forecast is computed. For example, the MAE efficiency gain of the DNN method relative to the AR method is 100 × {MAE(AR) − MAE(DNN)}/MAE(AR) (%).
A positive efficiency gain means that the forecast model performs better than the AR benchmark.
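The error measures and the efficiency gain can be computed directly from the two forecast-error series; a minimal sketch (function name is ours) assuming the gain is defined relative to the benchmark error as above:

```python
import numpy as np

def efficiency_gain(e_model, e_bench):
    """RMSE and MAE efficiency gain (%) of a forecast relative to a
    benchmark, from the two out-of-sample forecast-error series.
    Positive values mean the model beats the benchmark."""
    e_model, e_bench = np.asarray(e_model), np.asarray(e_bench)
    rmse = lambda e: np.sqrt(np.mean(e ** 2))
    mae = lambda e: np.mean(np.abs(e))
    gain_rmse = 100.0 * (rmse(e_bench) - rmse(e_model)) / rmse(e_bench)
    gain_mae = 100.0 * (mae(e_bench) - mae(e_model)) / mae(e_bench)
    return gain_rmse, gain_mae

# toy check: halving every error gives a 50% gain on both measures
e_ar = np.array([0.2, -0.1, 0.3, -0.4])
g_rmse, g_mae = efficiency_gain(e_ar / 2, e_ar)   # -> (50.0, 50.0)
```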
For the ANN forecasts described in Section 3.1, we need to specify the hyperparameters, namely the learning rate, the optimization method, the loss function, and the number of epochs, for each forecast horizon.
Table 2 shows the RMSE and MAE efficiency gains of the ANN forecasts relative to the AR forecast. The ANN forecasts based on differenced predictors gain efficiency over the AR forecast, while those based on non-differenced predictors lose substantially. We find that, for the 1, 3, and 6 step forecasts, the GRU method with all 105 differenced FRED predictors performs best.
The efficiency losses of the ANN forecasts based on the non-differenced data show that proper differencing is essential for good machine learning forecast performance.
In the table, we check the statistical significance of the efficiency gains by the test of Diebold and Mariano (1995). The DM test, for example for the RMSE efficiency gain, tests equal predictive accuracy based on the loss differential between the squared errors of the two forecasts.
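A standard implementation of the DM statistic is a t-type ratio of the mean loss differential to its HAC standard error, with h − 1 autocovariance lags for an h-step forecast. A sketch on synthetic error series (names are ours):

```python
import numpy as np

def dm_test(e1, e2, h=1, power=2):
    """Diebold-Mariano statistic for equal predictive accuracy.
    e1, e2: out-of-sample forecast errors of the two methods; the
    loss differential d_t = |e1_t|^power - |e2_t|^power is tested
    for zero mean with a HAC variance using h-1 autocovariance lags."""
    d = np.abs(np.asarray(e1)) ** power - np.abs(np.asarray(e2)) ** power
    n = d.size
    d_bar = d.mean()
    var = np.mean((d - d_bar) ** 2)                # gamma_0
    for k in range(1, h):                          # HAC correction
        gamma_k = np.mean((d[k:] - d_bar) * (d[:-k] - d_bar))
        var += 2.0 * gamma_k
    return d_bar / np.sqrt(var / n)

rng = np.random.default_rng(3)
e_ar = rng.normal(scale=1.0, size=300)    # benchmark errors
e_ann = rng.normal(scale=0.5, size=300)   # clearly smaller errors
stat = dm_test(e_ann, e_ar)               # large negative: e_ann better
```

Under the null of equal accuracy, the statistic is approximately standard normal, so values beyond ±1.64 are significant at the 10% level used in Table 2.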
For a more formal comparison, we make an MCS analysis of Hansen et al. (2011).
In forecasting the U.S. employment level, the machine learning methods of the DNN, the LSTM, and the GRU are considered. The predictors are the 105 important macroeconomic variables selected by McCracken and Ng (2016) from the big data of the FRED. We consider the two statistical issues of dimension reduction and time series differencing in the machine learning forecasts. An out-of-sample comparison shows substantial efficiency gains for the machine learning forecasts over the AR forecast if proper differencing is applied. The comparison reveals that, for the 1, 3, and 6 step forecasts, the GRU method with all 105 differenced FRED predictors performs best.
This study was supported by a grant from the National Research Foundation of Korea (2019R1A2C1004679), by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2019R1A6A1A11051177) and by the Ewha Womans University Scholarship of 2017.
Table 1. Predictors selected by LASSO

| h-step | Variables |
|---|---|
| 1 | PAYEMS, SRVPRD, CLF16OV, USWTRADE, DPCERA3M086SBEA |
| 3 | PAYEMS, MANEMP, USFIRE, PERMITNE, HOUST, HOUSTMW, NDMANEMP |
| 6 | PAYEMS, MANEMP, USFIRE, PERMITNE, USGOOD, HOUSTMW, HOUSTNE, TB3MS, T5YFFM, USWTRADE |
| 12 | PAYEMS, MANEMP, USFIRE, PERMITNE, HOUST, HOUSTMW, CES1021000001, EXSZUS, T10YFFM, USWTRADE, USTPU, USGOOD, UMCSENT |
Table 2. The RMSE and the MAE efficiency gains (%) of the ANN forecasts relative to the AR forecast and the Diebold-Mariano test results

| h-step | Metric | AR | DNN (105, non-diff.) | LSTM (105, non-diff.) | GRU (105, non-diff.) | DNN (105, diff.) | LSTM (105, diff.) | GRU (105, diff.) | DNN (LASSO, diff.) | LSTM (LASSO, diff.) | GRU (LASSO, diff.) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | RMSE | 0.0023 | | | | 1.9 | −2.4 | 4.0 | 5.3 | 0.0 | 2.1 |
| 1 | MAE | 0.0018 | | | | 3.7 | 2.6 | 7.6 | 2.5 | 3.4 | |
| 3 | RMSE | 0.0046 | | | | −4.6 | 20.1 | 21.7 | 17.1 | 10.3 | |
| 3 | MAE | 0.0031 | | | | −4.8 | 14.8 | 3.3 | 3.0 | | |
| 6 | RMSE | 0.0071 | | | | 25.5 | 25.9 | 58.8 | −2.6 | 50.7 | 35.0 |
| 6 | MAE | 0.0047 | | | | 16.3 | −8.0 | 18.4 | | | |
| 12 | RMSE | 0.0110 | | | | 36.7 | 47.2 | 38.7 | 40.3 | 60.0 | 50.5 |
| 12 | MAE | 0.0071 | | | | 16.4 | 29.2 | 21.3 | 31.7 | | |
Bold type is significant at the 10% level by the Diebold-Mariano test. RMSE = root mean square error; MAE = mean absolute error; ANN = artificial neural network; AR = autoregressive; FRED = federal reserve economic data; LASSO = least absolute shrinkage and selection operator; DNN = deep neural network; LSTM = long short-term memory; GRU = gated recurrent unit.
Table 3. The forecasting MCS performance: MCS p-values with forecast ranks in parentheses

| h-step | Metric | AR | DNN (105, non-diff.) | LSTM (105, non-diff.) | GRU (105, non-diff.) | DNN (105, diff.) | LSTM (105, diff.) | GRU (105, diff.) | DNN (LASSO, diff.) | LSTM (LASSO, diff.) | GRU (LASSO, diff.) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | RMSE | 0.98(5) | 0.00 | 0.00 | 0.00 | 1.00(4) | 0.94(7) | 1.00(2) | 1.00(1) | 0.96(6) | 1.00(3) |
| 1 | MAE | 0.28(7) | 0.00 | 0.00 | 0.00 | 0.99(3) | 0.91(5) | 1.00(1) | 0.96(4) | 0.90(6) | 1.00(2) |
| 3 | RMSE | 0.34(6) | 0.00 | 0.00 | 0.00 | 0.20(7) | 1.00(2) | 1.00(1) | 1.00(3) | 0.99(4) | 0.49(5) |
| 3 | MAE | 0.12(5) | 0.00 | 0.00 | 0.00 | 0.03(7) | 1.00(2) | 1.00(1) | 0.05(6) | 1.00(3) | 0.18(4) |
| 6 | RMSE | 0.40(4) | 0.00 | 0.00 | 0.00 | 0.29(5) | 0.84(3) | 1.00(1) | 0.00 | 0.97(2) | 0.06(6) |
| 6 | MAE | 0.07(5) | 0.00 | 0.00 | 0.00 | 0.14(4) | 0.70(3) | 1.00(1) | 0.00 | 0.98(2) | 0.00 |
| 12 | RMSE | 0.70(5) | 0.00 | 0.00 | 0.00 | 0.12(7) | 0.98(2) | 0.71(4) | 0.12(6) | 1.00(1) | 0.71(3) |
| 12 | MAE | 0.13(5) | 0.00 | 0.00 | 0.00 | 0.11(6) | 0.99(2) | 0.70(3) | 0.10(7) | 1.00(1) | 0.57(4) |
MCS = model confidence set; ANN = artificial neural network; FRED = federal reserve economic data; LASSO = least absolute shrinkage and selection operator; AR = autoregressive; DNN = deep neural network; LSTM = long short term memory; GRU = gated recurrent unit; RMSE = root mean square error; MAE = mean absolute error.