Population aging is a global trend. According to the UN (2019), life expectancy at birth in the United States is 78.8 years in 2020 and is expected to reach 79.8 years in 2030 and 84.4 years in 2060. For Korea, life expectancy at birth is 82.8 years in 2020 and is expected to reach 84.2 years in 2030 and 87.9 years in 2060.
The prospect of an increase in life expectancy is a desirable phenomenon from a personal perspective, but it also becomes a concern when considering social costs such as pension funds and health insurance that need to be paid to the elderly. In addition, the stability of the social security system may be undermined as economic growth slows due to a shortage of labor and declines in savings, consumption, and investment.
Forecasting mortality is essential for understanding future levels of aging and for maintaining the financial stability of social insurance systems such as the national pension and health insurance systems. Government policies related to these systems must rely on forecasts to reduce and effectively manage the costs incurred by the rapidly growing elderly population.
The most commonly used model for forecasting mortality is the Lee-Carter model (Lee and Carter, 1992), owing to its ease of application. Various studies have since been conducted to improve it; some of these are Li and Lee (2005), Renshaw and Haberman (2006), Booth and Tickle (2008), Booth
Haldrup and Rosenskjold (2019) designed 4-PFM by applying the dynamic Nelson-Siegel model (Diebold and Li, 2006), which is widely used for forecasting interest rates in financial markets, to central death rates. 4-PFM can work reliably even when the mortality structure changes. Choi (2021) showed that 4-PFM was more reliable than the LC model when the mortality structure changed due to COVID-19. However, its forecast accuracy was not as good as that of the LC model under normal circumstances. Thus, 5-PFM and 6-PFM were developed to improve accuracy: 5-PFM adds one factor and 6-PFM adds two factors to 4-PFM. Mathematical expressions of 4-PFM, 5-PFM and 6-PFM are in
where
For 5-PFM,
In this paper, 6-PFM is selected as a new representative model because 6-PFM performs better than 5-PFM as seen in the accuracy-test results in Section 4.3. Residual analysis also shows that 6-PFM is appropriate. The residual analysis on the error term in
6-PFM consists of 6 factors and corresponding 6 factor loadings as in
The two graphs in Figure 2 are the shapes of the loading functions of males and females of the United States fitted by the U.S. life tables from 1933 to 2018. The parameters
Next, consider the factors. Their trends are important because forecasts are based on them: while the loadings do not change over time, the factors do, so the central death rates can be forecast by analyzing the factor trends. The past trends of the 6 factors for the United States and Korea are shown in Figures 4 and 5, respectively. For the U.S., the common factors (black) show decreasing trends for both males (left) and females (right); since they have negative signs, however, their effects on the central death rates increase. The infant factors (red) show no conspicuous trends apart from steady decreases for both genders. Accident factor 1 (green) and accident factor 2 (blue) move in opposite directions for both genders. For males, accident factor 1 increases and accident factor 2 decreases, which means that accidents among 15-year-old males increase and those among 20-year-old males decrease over time. For females, accident factor 1 decreases and accident factor 2 increases, which means that accidents among 12-year-old females decrease and those among 17-year-old females increase. Aging factor 1 (sky-blue) has the strongest effect on the central death rates among all the factors for both genders because it is at the highest level. In contrast, aging factor 2 (purple) rarely affects the central death rates for either gender since it stays around 0; its role is therefore to fine-tune the fitting accuracy of the model.
For Korea, the common factors (black) decrease for both males and females, as in the U.S. The infant factors rarely change over time, so their effects on the central death rates remain unchanged for both genders. The accident factors show no conspicuous trends, although accident factor 2 for females shows a steady increase. Moreover, the accident factors stay around 0, so their effects on the central death rates are very small. For males, aging factor 1 (sky-blue) increases until around 2000 and fluctuates thereafter, while aging factor 2 (purple) moves horizontally until around 2000 and has shown an increasing trend with fluctuations since then. Thus, the combined effect of these two factors on the central death rates increases. For females, aging factor 1 (sky-blue) has a strong effect and increases until around 2005, after which it declines somewhat. Aging factor 2 (purple) moves horizontally and then moves upward from around 2005. The aging factors of females are located quite differently from those of males, and the reason is not easy to determine. One possibility is that accidents occur more frequently at younger ages (captured by aging factor 1) in Korea.
When we compare the strength of the effects on the central death rates among factors, the infant factor and the aging factors have strong positive effects and the common factors have negative effects for both males and females.
Comparing the two countries, the common factors show negative effects whose strengths increase in both countries. The infant factors also have similar effects in both countries, although their trends decrease slightly for the U.S. The shapes of accident factors 1 and 2 differ between the two countries, but their combined effects are small in both countries since the sums of the two factors are near zero. For the aging factors, the linear effect is stronger than the convex effect for both genders in the U.S., since aging factor 1 is at a higher level for both males and females. In Korea, however, the convex effect is stronger than the linear effect for males, whereas for females the two aging effects are similar in both countries.
Long short-term memory (LSTM) is a deep learning method well suited to time-series forecasting. By learning sequences of observations, LSTM can capture long-term dependencies and thus forecast future outputs from past features of the data. Figure 6 shows the structure of LSTM as presented by Raschka and Mirjalili (2019).
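To make the gating mechanism concrete, the following is a minimal NumPy sketch of one forward step of a standard LSTM cell (input, forget, and output gates plus a candidate cell state). It assumes the common textbook formulation with sigmoid gates and tanh activations; the weight layout and the names `lstm_step`, `W`, `U`, and `b` are illustrative choices and may differ from the notation of Figure 6. In Keras, the `activation` argument (set to `'relu'` in the appendix code) replaces the tanh used here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One forward step of a standard LSTM cell.

    W: (4n, d) input weights, U: (4n, n) recurrent weights, b: (4n,) biases,
    stacked in the order [input gate, forget gate, candidate, output gate].
    """
    n = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:n])            # input gate: how much new information to write
    f = sigmoid(z[n:2 * n])        # forget gate: how much old cell state to keep
    g = np.tanh(z[2 * n:3 * n])    # candidate cell values
    o = sigmoid(z[3 * n:4 * n])    # output gate: how much cell state to expose
    c_t = f * c_prev + i * g       # updated cell state (long-term memory)
    h_t = o * np.tanh(c_t)         # new hidden state (short-term output)
    return h_t, c_t

# Run the cell over a short sequence of 6-dimensional inputs (e.g. 6 factors, 7 steps)
rng = np.random.default_rng(0)
d, n, T = 6, 4, 7
W = rng.normal(scale=0.1, size=(4 * n, d))
U = rng.normal(scale=0.1, size=(4 * n, n))
b = np.zeros(4 * n)
h, c = np.zeros(n), np.zeros(n)
for x_t in rng.normal(size=(T, d)):
    h, c = lstm_step(x_t, h, c, W, U, b)
```

The cell state `c` carries information across many steps through the additive update, which is what lets LSTM learn long-term dependencies.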
where
A common problem in fitting models is overfitting. An overfitted model captures too many details and too much noise in the data, which reduces its forecasting performance. To improve the model’s performance, regularization is applied to the model (Merity
There are several regularization methods, such as L1 and L2 regularization and dropout. L1 and L2 regularization shrink or eliminate weights, whereas dropout randomly drops units from the neural network (Srivastava
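The penalties named above can be written down directly. The NumPy sketch below shows an L1 activity penalty, an L2 weight penalty, and (inverted) dropout; all function names are hypothetical. `regularized_mse` mirrors the idea behind the appendix code's `activity_regularizer=tf.keras.regularizers.L1(l1=lam)`, which adds λ times the summed absolute activations to the MSE loss, although Keras's exact internal scaling (for example, per batch) may differ.

```python
import numpy as np

def l1_activity_penalty(activations, lam):
    """L1 penalty on layer outputs: lam * sum(|a|); pushes many activations to zero."""
    return lam * np.sum(np.abs(activations))

def l2_weight_penalty(weights, lam):
    """L2 penalty on weights: lam * sum(w**2); shrinks weights toward zero."""
    return lam * np.sum(np.square(weights))

def inverted_dropout(activations, rate, rng):
    """Zero each unit with probability `rate` during training and rescale the
    survivors by 1 / (1 - rate) so the expected activation is unchanged."""
    keep = rng.random(activations.shape) >= rate
    return activations * keep / (1.0 - rate)

def regularized_mse(y_true, y_pred, activations, lam):
    """Mean squared error plus an L1 activity penalty (the idea behind Keras's
    activity_regularizer=L1(l1=lam); Keras's internal scaling may differ)."""
    mse = np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)
    return mse + l1_activity_penalty(activations, lam)

# Example with the paper's regularizing parameter of 0.05
h = np.array([0.2, -0.4, 0.0, 1.0])  # hypothetical hidden-layer outputs
loss = regularized_mse([1.0, 2.0], [1.1, 1.9], h, lam=0.05)
```

A larger λ trades a worse in-sample fit for smaller activations, which is the overfitting control the text describes.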
where
The U.S. life tables (Human Mortality Database) from 1933 to 2018 and the Korean life tables (Statistics Korea) from 1983 to 2019 are used for the accuracy tests. Korean life tables before 1983 were not used because those data are considered relatively unreliable.
The accuracy tests are performed for ages 0 to 109 for the U.S. and 0 to 99 for Korea. The mortality rates at the maximum ages, 110 (U.S.) and 100 (Korea), are fixed at 1, so these ages are removed from the life tables. The tests are carried out for 5 forecasting periods, from 1 year to 5 years, as in Table 1. Several forecasting periods are used to reduce the influence of chance on the test results (i.e., luckily small or unluckily large errors). For example, to test 5-year forecasts of log central death rates for the U.S., central death rates from 1933 to 2013 are used to fit the model, and forecasts are produced for 2014~2018.
After the forecasted log central death rates are produced, the tests are performed by measuring the root mean square error (RMSE) as in
where
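The RMSE equation did not survive extraction. A plausible form, assuming every age-year cell of the forecast horizon is weighted equally, is the square root of the mean squared difference between forecast and actual log central death rates; the sketch below implements that assumption. The "Average" rows of Table 4 appear to be the mean of the five per-period RMSEs, which `average_rmse` also assumes. Both function names are hypothetical.

```python
import numpy as np

def rmse_log_mx(log_m_actual, log_m_forecast):
    """RMSE over all age-year cells of the forecast horizon (equal weights assumed).

    Inputs have shape (n_ages, n_forecast_years), e.g. (110, 5) for the U.S. 5-year test.
    """
    err = np.asarray(log_m_forecast, dtype=float) - np.asarray(log_m_actual, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))

def average_rmse(rmse_by_period):
    """Assumed 'Average' row of Table 4: mean of the RMSEs of the 1- to 5-year tests."""
    return float(np.mean(rmse_by_period))

# U.S. males, 6-PFM_reg-LSTM, per-period RMSEs from Table 4
print(round(average_rmse([0.0652, 0.0707, 0.0929, 0.1070, 0.1122]), 4))  # 0.0896
```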
The process of forecasting 6-PFM is as follows:
Step 1: fit the model to the latest-year mortality data from the life-tables to determine the parameters as in
where ln(
Step 2: estimate 6 factors year by year using the fixed parameters (
where
Step 3: forecast the 6 factors using one of three methods: the vector autoregressive (VAR) model, which captures the relationships among multiple quantities as they change over time; LSTM; or regularized LSTM.
Step 4: compute the log central death rates by entering the 6 factors forecasted in Step 3 and the parameters estimated in year
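Steps 2~4 can be sketched under the assumption, suggested by Steps 1 and 2, that once the loading parameters are fixed the log central death rates are linear in the factors, so each year's factors can be estimated by least squares. The loadings below are hypothetical placeholders (three simple columns rather than the six 6-PFM loading functions), Step 1 (fitting the nonlinear parameters) is omitted, and Step 3 uses a random walk with drift purely as a stand-in for the paper's VAR, LSTM, or regularized LSTM forecasts.

```python
import numpy as np

def estimate_factors(log_m, loadings):
    """Step 2: with loadings fixed, estimate every year's factors by least squares.

    log_m: (n_ages, n_years); loadings: (n_ages, n_factors). Returns (n_factors, n_years).
    """
    factors, *_ = np.linalg.lstsq(loadings, log_m, rcond=None)
    return factors

def forecast_factors_rw_drift(factors, horizon):
    """Step 3 stand-in: random walk with drift for each factor (NOT the paper's VAR/LSTM)."""
    drift = (factors[:, -1] - factors[:, 0]) / (factors.shape[1] - 1)
    steps = np.arange(1, horizon + 1)
    return factors[:, [-1]] + np.outer(drift, steps)

def reconstruct_log_m(loadings, factors):
    """Step 4: log central death rates implied by the fixed loadings and (forecast) factors."""
    return loadings @ factors

# Hypothetical loadings and factor paths, used only to exercise the pipeline
ages = np.arange(0, 100)
L = np.column_stack([np.ones_like(ages, float), ages / 100.0, (ages / 100.0) ** 2])
true_f = np.vstack([np.linspace(-4, -5, 30), np.linspace(5, 4.5, 30), np.zeros(30)])
log_m = L @ true_f

F = estimate_factors(log_m, L)               # recovers true_f (no noise here)
F_fc = forecast_factors_rw_drift(F, horizon=5)
log_m_fc = reconstruct_log_m(L, F_fc)        # (100 ages, 5 forecast years)
```

The key design point mirrored here is that only the factors evolve over time; the loadings estimated in the latest year are reused for every forecast year.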
Accuracies are compared among 4 models: the LC model, 4-PFM, 5-PFM, and 6-PFM. Since 3 forecasting methods are applied to 6-PFM, a total of 6 models are compared: the LC model, 4-PFM with VAR, 5-PFM with VAR, 6-PFM with VAR, 6-PFM with LSTM, and 6-PFM with regularized LSTM. For the LC model, 4-PFM, and 5-PFM, the forecasting procedure is omitted and only the results are shown. For details on the LC model and 4-PFM, refer to Lee and Carter (1992) and Haldrup and Rosenskjold (2019).
Forecasting requires time-series analysis. For 6-PFM, three analyses are carried out: VAR, LSTM, and regularized LSTM. To forecast with VAR, co-integration analysis is first performed with the Johansen test, using the function ca.jo() in the R package urca. As an example, the co-integration results for test 1 are given in Table 2. For U.S. males, there are 3 co-integrating relations: the test statistic of 45.31 for r ≤ 2 (r: the number of co-integrating relations) exceeds its critical value, so that null hypothesis is rejected, whereas the statistic of 23.44 for r ≤ 3 does not, so that null is not rejected at the 10% significance level. For U.S. females, there are 2: the statistic of 68.05 for r ≤ 1 is rejected, but that of 35.72 for r ≤ 2 is not, at the same significance level. Similarly, there are 3 co-integrating relations for both males and females in Korea. Refer to Pfaff (2008) for further explanation.
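The rank decision above is a sequential rule: starting from H0: r = 0, reject while the trace statistic exceeds the critical value and stop at the first null that is not rejected. The Python sketch below (hypothetical helper `johansen_rank`) applies that rule to the test-1 statistics and 10% critical values in Table 2; it encodes only the decision step, not the Johansen estimation performed by ca.jo().

```python
def johansen_rank(trace_stats, crit_values):
    """Co-integration rank from a sequential Johansen trace test.

    Both lists are ordered from H0: r = 0 upward (r <= 1, r <= 2, ...).
    Reject while the statistic exceeds the critical value; the rank is the
    first r whose null hypothesis is not rejected.
    """
    for r, (stat, crit) in enumerate(zip(trace_stats, crit_values)):
        if stat <= crit:
            return r
    return len(trace_stats)  # every null rejected: full rank

# Table 2 (test 1), ordered r = 0, r <= 1, ..., r <= 5; 10% critical values
crit_10 = [85.18, 66.49, 45.23, 28.71, 15.66, 6.50]
us_male = [148.64, 89.29, 45.31, 23.44, 7.23, 0.08]
us_female = [103.46, 68.05, 35.72, 15.09, 2.31, 0.03]
kor_male = [157.62, 101.97, 54.11, 22.00, 12.90, 4.95]
kor_female = [160.04, 79.48, 46.20, 23.45, 6.82, 1.45]

print([johansen_rank(s, crit_10) for s in (us_male, us_female, kor_male, kor_female)])  # [3, 2, 3, 3]
```

The output reproduces the test-1 column of Table 3. Note how close the U.S. male decision is: 45.31 only just exceeds 45.23.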
The co-integration results for all 5 accuracy tests are given in Table 3, where each entry is the number of co-integrating relations. U.S. males and both genders in Korea have 2~3 co-integrating relations, and U.S. females have 1~2. Since co-integration exists in all tests, a vector error correction model (VECM) should be used; for each test, the function vec2var() in the R package vars converts the VECM object produced by ca.jo() into its VAR representation. In Section 4.3, this model is referred to as ‘6-PFM’ instead of ‘6-PFM with VAR’. Likewise, 4-PFM and 5-PFM, which also use VAR, are referred to as ‘4-PFM’ and ‘5-PFM’, respectively.
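The conversion performed by vec2var() can be illustrated with the underlying algebra. Assuming a VECM in the "transitory" form Δy_t = Π y_{t-1} + Σ_{i=1}^{p-1} Γ_i Δy_{t-i}, with deterministic terms omitted, expanding the differences gives a level VAR(p) with A_1 = I + Π + Γ_1, A_i = Γ_i − Γ_{i−1} for 1 < i < p, and A_p = −Γ_{p−1}. The NumPy sketch below (hypothetical `vecm_to_var`) computes these matrices; as far as we understand, the "longrun" specification of ca.jo() places Π at lag p instead, which would change the formulas.

```python
import numpy as np

def vecm_to_var(Pi, Gammas):
    """Level-VAR matrices A_1..A_p implied by a VECM (transitory form, no deterministic terms).

    VECM:  dy_t = Pi @ y_{t-1} + sum_{i=1}^{p-1} Gamma_i @ dy_{t-i}
    VAR:   y_t  = sum_{j=1}^{p} A_j @ y_{t-j}
    """
    k = Pi.shape[0]
    p = len(Gammas) + 1
    zero = np.zeros((k, k))
    G = [zero] + list(Gammas) + [zero]     # pad so that G[0] = G[p] = 0
    A = [G[j] - G[j - 1] for j in range(1, p + 1)]
    A[0] = A[0] + np.eye(k) + Pi           # A_1 = I + Pi + Gamma_1
    return A

# Hypothetical 6-factor system with co-integration rank r = 3 and p = 2
rng = np.random.default_rng(1)
k, r = 6, 3
alpha, beta = rng.normal(scale=0.1, size=(k, r)), rng.normal(size=(k, r))
Pi = alpha @ beta.T                        # rank-r long-run matrix
Gammas = [rng.normal(scale=0.1, size=(k, k))]
A = vecm_to_var(Pi, Gammas)                # [A_1, A_2]
```

The co-integration rank enters only through Π = αβ′, which is why ca.jo() must determine r before the VAR representation can be formed.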
For LSTM, the algorithm must be set up to learn the mapping function from the input (training dataset) to the output (test dataset). For example, the time series 1, 2, 3, 4, 5, . . ., can be transformed into two separate sets
where
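As a minimal illustration of this sliding-window split, the Python sketch below (hypothetical `sliding_window`) turns the series 1, 2, ..., 6 into input windows of width 3 and one-step-ahead targets; the width of 3 is chosen only for readability (the paper uses 7).

```python
import numpy as np

def sliding_window(series, n_in, n_out=1):
    """Split a 1-D series into supervised (X, y) pairs.

    Each row of X holds n_in consecutive values; the matching row of y
    holds the n_out values that follow them.
    """
    series = np.asarray(series)
    n_samples = len(series) - n_in - n_out + 1
    X = np.array([series[i:i + n_in] for i in range(n_samples)])
    y = np.array([series[i + n_in:i + n_in + n_out] for i in range(n_samples)])
    return X, y

X, y = sliding_window([1, 2, 3, 4, 5, 6], n_in=3)
# X = [[1, 2, 3], [2, 3, 4], [3, 4, 5]],  y = [[4], [5], [6]]
```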
To forecast the central death rates with LSTM, the time dependence in the data must first be removed by subtracting each previous observation from the current one. Without differencing, the forecast results may be overly sensitive to the number of epochs (the number of complete passes the learning algorithm makes through the entire training dataset) or to the value of the regularizing parameter. Differencing reduces the instability of the time series data and thus leads to stable results. After differencing, the data are transformed into a supervised learning format (a sliding window transformation) so that the model can learn to fit the target values. After the forecast is completed, the transformed data are inverted back to the original scale, and the RMSE is computed to measure the accuracy. There are 6 factors to be forecast, and for each factor the difference between consecutive observations (the lag-1 difference) is taken. These lag-1 differences are then transformed into the supervised learning format. We use a window width of 7 time steps in the training dataset.
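The preprocessing just described (lag-1 differencing of each factor, a sliding window of 7 time steps, and inversion back to levels after forecasting) can be sketched in NumPy as follows. The function names and the random-walk factor paths are hypothetical; the array layout (samples, time steps, factors) is chosen to match `input_shape=(n_periods_in, n_features)` in the appendix code, and the 35-year length with a 2-year output mirrors Korea's test 2 (fitting period 1983~2017).

```python
import numpy as np

def difference(F):
    """Lag-1 differences of each factor; F has shape (n_years, n_factors)."""
    return np.diff(F, axis=0)

def to_supervised(D, n_in=7, n_out=1):
    """Sliding-window transform: X (samples, n_in, n_factors), Y (samples, n_out, n_factors)."""
    n = D.shape[0] - n_in - n_out + 1
    X = np.stack([D[i:i + n_in] for i in range(n)])
    Y = np.stack([D[i + n_in:i + n_in + n_out] for i in range(n)])
    return X, Y

def invert_difference(last_level, diff_forecast):
    """Undo differencing: cumulate forecast differences from the last observed level."""
    return last_level + np.cumsum(diff_forecast, axis=0)

rng = np.random.default_rng(0)
F = np.cumsum(rng.normal(size=(35, 6)), axis=0)   # hypothetical 6 factors over 35 years
D = difference(F)                                 # (34, 6)
X, Y = to_supervised(D, n_in=7, n_out=2)          # X: (26, 7, 6), Y: (26, 2, 6)

# A forecast of the next 2 differences would be inverted like this;
# feeding in the observed differences recovers the observed levels exactly.
levels = invert_difference(F[-3], D[-2:])
```

Each input window holds 7 × 6 = 42 values, matching the 42 input units stated in the text.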
As an example for test 2, the time series of the 6 factors are transformed into two separate sets, training set (
where
The number of epochs is set to 10,000 and the number of input units to 42 (= 6 × 7) for all tests, while the number of output units is 6 (= 6 × 1) for test 1, 12 (= 6 × 2) for test 2, . . ., and 30 (= 6 × 5) for test 5. For the regularized LSTM, the regularizing parameter
The results of the accuracy tests are given in Table 4. The upper two parts show the RMSEs for U.S. males and females, and the lower two parts show those for Korean males and females.
For males in the U.S., 6-PFM performs best with an average RMSE of 0.0847, followed by 6-PFM_reg-LSTM, 6-PFM_LSTM, the LC model, 5-PFM, and 4-PFM. For females, 6-PFM_reg-LSTM performs best, followed by 6-PFM_LSTM, 6-PFM, 5-PFM, 4-PFM, and the LC model, in this order. Comparing genders, the LC model and 6-PFM perform better for males, whereas 4-PFM, 5-PFM, 6-PFM_LSTM, and 6-PFM_reg-LSTM perform better for females. The differences in average RMSE between genders are 0.014, 0.076, 0.068, 0.004, 0.007, and 0.006 for the LC model, 4-PFM, 5-PFM, 6-PFM, 6-PFM_LSTM, and 6-PFM_reg-LSTM, respectively. Thus, the 6-PFMs give more stable results across genders than the LC model, 4-PFM, and 5-PFM.
For males in Korea, 6-PFM_reg-LSTM performs best with an average RMSE of 0.1026, followed by 6-PFM_LSTM, the LC model, 6-PFM, 5-PFM, and 4-PFM, in this order. For females, 6-PFM_reg-LSTM is again the best model, 6-PFM is the next best, and 4-PFM performs worst. All the models work better for males than for females. The differences in average RMSE between genders are 0.061, 0.064, 0.063, 0.040, 0.046, and 0.041 for the LC model, 4-PFM, 5-PFM, 6-PFM, 6-PFM_LSTM, and 6-PFM_reg-LSTM, respectively. Thus, as in the U.S., the 6-PFMs give more stable results across genders than the other models.
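The rankings and gender gaps quoted above follow directly from the "Average" rows of Table 4; the short Python check below recomputes them (helper names are hypothetical). Because the table values are rounded to four decimals, a recomputed gap can differ from the text in the last quoted digit.

```python
models = ["LC", "4-PFM", "5-PFM", "6-PFM", "6-PFM_LSTM", "6-PFM_reg-LSTM"]
avg_rmse = {  # 'Average' rows of Table 4, in the order of `models`
    ("U.S.", "Male"): [0.1585, 0.1922, 0.1737, 0.0847, 0.0913, 0.0896],
    ("U.S.", "Female"): [0.1727, 0.1159, 0.1060, 0.0887, 0.0839, 0.0838],
    ("Korea", "Male"): [0.1113, 0.1249, 0.1228, 0.1116, 0.1058, 0.1026],
    ("Korea", "Female"): [0.1727, 0.1884, 0.1857, 0.1518, 0.1521, 0.1438],
}

def ranking(country, gender):
    """Models ordered from lowest (best) to highest average RMSE."""
    scores = avg_rmse[(country, gender)]
    return [name for _, name in sorted(zip(scores, models))]

def gender_gaps(country):
    """Absolute male-female difference in average RMSE for each model."""
    male = avg_rmse[(country, "Male")]
    female = avg_rmse[(country, "Female")]
    return {name: abs(f - m) for name, m, f in zip(models, male, female)}

for country in ("U.S.", "Korea"):
    print(country, ranking(country, "Male"))
    print(country, {k: round(v, 3) for k, v in gender_gaps(country).items()})
```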
For both countries, the 6-PFMs perform better than the LC model, 4-PFM, and 5-PFM. Among the three 6-PFMs, 6-PFM_reg-LSTM is the best for both countries except for U.S. males. Comparing the two countries, the LC model, 4-PFM, and 5-PFM show no consistent pattern, whereas all three 6-PFMs consistently work better for the U.S. life tables than for the Korean life tables.
One anomaly is found in the Korean female group: the RMSEs decrease as the forecasting period increases, which is unexpected because longer-term forecasting should be more difficult than short-term forecasting. This may reflect the chance effect mentioned in Section 4.1, with unluckily large errors in the short-term forecasts and luckily small errors in the longer-term forecasts for Korean females.
The accuracy test results show that 6-PFM_reg-LSTM was the most accurate model for both the United States and Korea. Mortality forecasts were therefore produced with 6-PFM_reg-LSTM and compared with those of the LC model. Forecasts are made for 2030 and 2040; the horizon is limited to 2040 because the past mortality data for Korea cover a relatively short period. The number of epochs is set to 10,000 and the regularizing parameter
The results are shown in terms of life expectancy in Table 5. Future life expectancies in Korea are 4.16~6.34 years longer than those in the United States. Comparing the two models, 6-PFM_reg-LSTM forecasts shorter life expectancies for males, but longer life expectancies for females, than the LC model in both countries. Specifically, for the United States, 6-PFM_reg-LSTM projects life expectancies in 2030 of 76.51 years for males and 81.59 years for females; the male figure is shorter than the LC projection (77.35), while the female figure is longer (80.78). For 2040, it projects 76.55 and 81.63 years for males and females, respectively, again shorter than the LC model for males and longer for females.
Note that, compared with Korea, U.S. life expectancies improve only slightly from 2030 to 2040. For Korea, 6-PFM_reg-LSTM projects a male life expectancy in 2030 of 80.67 years, shorter than the LC projection, and a female life expectancy of 86.81 years, longer than the LC projection. For 2040, it projects 81.04 years for males, considerably shorter than the LC projection, and 87.37 years for females, longer than the LC projection. Since the LC model is the standard forecasting model in both countries, these results suggest that future life expectancies may be shorter for males, and longer for females, than currently expected.
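The results are reported as life expectancies, but the conversion from forecast central death rates is not spelled out in this section. A common approach is a period life table; the NumPy sketch below assumes that deaths occur on average mid-year (a = 0.5), converts m to q with q = m / (1 + (1 − a) m), and closes the table at the last retained age (e.g. 109 for the U.S. after age 110 is removed) with L = l / m. These are our assumptions, not necessarily the authors' method, and the mortality schedule in the example is hypothetical.

```python
import numpy as np

def life_expectancy_at_birth(m, a=0.5):
    """Period life expectancy at birth from single-year central death rates m[0..omega].

    Assumes deaths occur on average `a` years into each age interval and closes
    the table at the last (open) age with L = l / m (requires m[-1] > 0).
    """
    m = np.asarray(m, dtype=float)
    q = m / (1.0 + (1.0 - a) * m)     # convert central death rates to death probabilities
    q[-1] = 1.0                       # everyone alive at the last age dies there
    l = np.ones_like(m)               # survivors, radix l[0] = 1
    for x in range(1, len(m)):
        l[x] = l[x - 1] * (1.0 - q[x - 1])
    d = l * q                         # deaths in each age interval
    L = l - (1.0 - a) * d             # person-years lived in [x, x+1)
    L[-1] = l[-1] / m[-1]             # open-ended last interval
    return L.sum()                    # T_0 / l_0 with l_0 = 1

# Hypothetical Gompertz-like schedule for ages 0..109 (illustrative, not the paper's data);
# in practice m would be np.exp() of the forecast log central death rates.
ages = np.arange(110)
m = np.exp(-9.5 + 0.09 * ages)
print(round(life_expectancy_at_birth(m), 2))
```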
6-PFM was developed to increase the accuracy of mortality forecasts by adding two factors to 4-PFM. With these two added factors, 6-PFM performed better than 4-PFM and 5-PFM, as expected, and it also outperformed the LC model in most of the accuracy tests. Among the three forecasting methods for 6-PFM, regularized LSTM performed best for both countries except for U.S. males. Therefore, LSTM is strongly recommended for adoption when developing forecasting models.
In this paper, a fixed regularizing parameter of 0.05 was used for the regularized LSTM; a method for determining an appropriate value of this parameter was not developed. The model might perform better if the parameter were fine-tuned. Developing such fine-tuning methods is proposed for future research.
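```python
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, RepeatVector, TimeDistributed, Dense

# n_periods_in, n_periods_out, n_features, lam (regularizing parameter), X, Y, x_input are defined in the text
model = Sequential()
# Encoder: LSTM with an L1 activity penalty on its output
model.add(LSTM(n_periods_in * n_features, activation='relu',
               input_shape=(n_periods_in, n_features),
               activity_regularizer=tf.keras.regularizers.L1(l1=lam)))
model.add(RepeatVector(n_periods_out))  # repeat the encoding once per output period
# Decoder: LSTM plus a Dense layer applied at every output time step
model.add(LSTM(n_periods_out * n_features, activation='relu', return_sequences=True))
model.add(TimeDistributed(Dense(n_features)))
model.compile(optimizer='adam', loss='mse')
hist = model.fit(X, Y, epochs=10000, verbose=0)
yhat = model.predict(x_input, verbose=0)
```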
where
The periods for model input and output for 5 tests
Country | Test | Input periods | Output periods |
---|---|---|---|
U.S. | test1 | 1933~2017 | 2018 |
U.S. | test2 | 1933~2016 | 2017~2018 |
U.S. | test3 | 1933~2015 | 2016~2018 |
U.S. | test4 | 1933~2014 | 2015~2018 |
U.S. | test5 | 1933~2013 | 2014~2018 |
Korea | test1 | 1983~2018 | 2019 |
Korea | test2 | 1983~2017 | 2018~2019 |
Korea | test3 | 1983~2016 | 2017~2019 |
Korea | test4 | 1983~2015 | 2016~2019 |
Korea | test5 | 1983~2014 | 2015~2019 |
Results of Johansen test of 6-PFM for test 1
H0 (r = number of co-integrations) | Test statistic, U.S. male | Test statistic, U.S. female | Test statistic, Korea male | Test statistic, Korea female | Critical value, 10% | Critical value, 5% | Critical value, 1% |
---|---|---|---|---|---|---|---|
r ≤ 5 | 0.08 | 0.03 | 4.95 | 1.45 | 6.50 | 8.18 | 11.65 |
r ≤ 4 | 7.23 | 2.31 | 12.90 | 6.82 | 15.66 | 17.95 | 23.52 |
r ≤ 3 | 23.44 | 15.09 | 22.00 | 23.45 | 28.71 | 31.52 | 37.22 |
r ≤ 2 | 45.31 | 35.72 | 54.11 | 46.20 | 45.23 | 48.28 | 55.43 |
r ≤ 1 | 89.29 | 68.05 | 101.97 | 79.48 | 66.49 | 70.60 | 78.87 |
r = 0 | 148.64 | 103.46 | 157.62 | 160.04 | 85.18 | 90.39 | 104.20 |
The co-integration test results of 6-PFM for all 5 accuracy tests
Country | Gender | test 1 | test 2 | test 3 | test 4 | test 5 |
---|---|---|---|---|---|---|
U.S. | Male | 3 | 2 | 3 | 3 | 3 |
U.S. | Female | 2 | 1 | 1 | 1 | 1 |
Korea | Male | 3 | 2 | 2 | 2 | 3 |
Korea | Female | 3 | 3 | 3 | 2 | 2 |
The results of accuracy tests
Country | Gender | Forecasting period | LC | 4-PFM | 5-PFM | 6-PFM | 6-PFM_LSTM | 6-PFM_reg-LSTM |
---|---|---|---|---|---|---|---|---|
U.S. | Male | 1 year | 0.1487 | 0.1764 | 0.1511 | 0.0703 | 0.0778 | 0.0652 |
U.S. | Male | 2 years | 0.1530 | 0.1817 | 0.1554 | 0.0829 | 0.0733 | 0.0707 |
U.S. | Male | 3 years | 0.1584 | 0.1921 | 0.1859 | 0.0800 | 0.0854 | 0.0929 |
U.S. | Male | 4 years | 0.1646 | 0.2010 | 0.1818 | 0.0857 | 0.1022 | 0.1070 |
U.S. | Male | 5 years | 0.1675 | 0.2100 | 0.1942 | 0.1049 | 0.1179 | 0.1122 |
U.S. | Male | Average | 0.1585 | 0.1922 | 0.1737 | 0.0847 | 0.0913 | 0.0896 |
U.S. | Female | 1 year | 0.1654 | 0.1006 | 0.0899 | 0.0654 | 0.0656 | 0.0612 |
U.S. | Female | 2 years | 0.1689 | 0.1069 | 0.0918 | 0.0693 | 0.0630 | 0.0640 |
U.S. | Female | 3 years | 0.1710 | 0.1120 | 0.0936 | 0.0747 | 0.0886 | 0.0874 |
U.S. | Female | 4 years | 0.1810 | 0.1230 | 0.1223 | 0.1079 | 0.1065 | 0.1058 |
U.S. | Female | 5 years | 0.1773 | 0.1367 | 0.1324 | 0.1260 | 0.0956 | 0.1005 |
U.S. | Female | Average | 0.1727 | 0.1159 | 0.1060 | 0.0887 | 0.0839 | 0.0838 |
Korea | Male | 1 year | 0.1070 | 0.1095 | 0.1028 | 0.0966 | 0.0993 | 0.0983 |
Korea | Male | 2 years | 0.1003 | 0.1133 | 0.1028 | 0.0894 | 0.0929 | 0.0918 |
Korea | Male | 3 years | 0.1100 | 0.1173 | 0.1141 | 0.1000 | 0.0997 | 0.0963 |
Korea | Male | 4 years | 0.1143 | 0.1219 | 0.1218 | 0.1218 | 0.1039 | 0.1018 |
Korea | Male | 5 years | 0.1250 | 0.1625 | 0.1727 | 0.1506 | 0.1332 | 0.1248 |
Korea | Male | Average | 0.1113 | 0.1249 | 0.1228 | 0.1116 | 0.1058 | 0.1026 |
Korea | Female | 1 year | 0.1795 | 0.1923 | 0.1858 | 0.1620 | 0.1892 | 0.1593 |
Korea | Female | 2 years | 0.1776 | 0.1822 | 0.1792 | 0.1520 | 0.1674 | 0.1611 |
Korea | Female | 3 years | 0.1671 | 0.1900 | 0.1865 | 0.1546 | 0.1406 | 0.1373 |
Korea | Female | 4 years | 0.1742 | 0.1998 | 0.1960 | 0.1429 | 0.1370 | 0.1365 |
Korea | Female | 5 years | 0.1651 | 0.1775 | 0.1811 | 0.1473 | 0.1262 | 0.1246 |
Korea | Female | Average | 0.1727 | 0.1884 | 0.1857 | 0.1518 | 0.1521 | 0.1438 |
The projected life expectancies
Country | Gender | Year | LC model | 6-PFM_reg-LSTM |
---|---|---|---|---|
U.S. | Male | 2030 | 77.35 | 76.51 |
U.S. | Male | 2040 | 78.61 | 76.55 |
U.S. | Female | 2030 | 80.78 | 81.59 |
U.S. | Female | 2040 | 80.88 | 81.63 |
Korea | Male | 2030 | 82.61 | 80.67 |
Korea | Male | 2040 | 84.95 | 81.04 |
Korea | Female | 2030 | 85.73 | 86.81 |
Korea | Female | 2040 | 86.28 | 87.37 |