TEXT SIZE

search for



CrossRef (0)
6-Parametric factor model with long short-term memory
Communications for Statistical Applications and Methods 2021;28:521-536
Published online September 30, 2021
© 2021 Korean Statistical Society.

Janghoon Choi1,a

aKorea Insurance Research Institute, Korea
Correspondence to: 1 Korea Insurance Research Institute, 38 Gukjeguemyung-ro 6-gil, Youngdeungpo-gu, Seoul, Korea. E-mail: james021@gmail.com
Received April 15, 2021; Revised June 28, 2021; Accepted August 27, 2021.
 Abstract
As life expectancies increase continuously over the world, the accuracy of forecasting mortality is more and more important to maintain social systems in the aging era. Currently, the most popular model used is the Lee-Carter model but various studies have been conducted to improve this model with one of them being 6-parametric factor model (6-PFM) which is introduced in this paper. To this new model, long short-term memory (LSTM) and regularized LSTM are applied in addition to vector autoregression (VAR), which is a traditional time-series method. Forecasting accuracies of several models, including the LC model, 4-PFM, 5-PFM, and 3 6-PFM’s, are compared by using the U.S. and Korea life-tables. The results show that 6-PFM forecasts better than the other models (LC model, 4-PFM, and 5-PFM). Among the three 6-PFMs studied, regularized LSTM performs better than the other two methods for most of the tests.
Keywords : Lee Carter model, 6-PFM, accuracy test, vector autoregression, long short-term memory, regularized long short-term memory
1. Introduction

Population aging is a global trend. According to the UN (2019), the life expectancy in the United States at birth is 78.8 years in 2020 and is expected to be 79.8 years in 2030 and 84.4 years in 2060. The life expectancy of Korea is 82.8 years and is expected to be 84.2 years, and 87.9 years for the same periods, respectively.

The prospect of an increase in life expectancy is a desirable phenomenon from a personal perspective, but it also becomes a concern when considering social costs such as pension funds and health insurance that need to be paid to the elderly. In addition, the stability of the social security system may be undermined as economic growth slows due to a shortage of labor and declines in savings, consumption, and investment.

Forecasting mortality is essential in understanding the future aging level and maintaining financial stability of social insurance systems such as the national pension and health insurance systems. Government policies related to such social systems must rely on a forecast to reduce and effectively manage costs incurred by the rapidly increasing elderly population.

The most commonly used model for forecasting mortality is the Lee-Carter model (Lee and Carter, 1992) due to its easy application. However, various studies have been conducted to improve the Lee-Carter model. Some of these studies are Li and Lee (2005), Renshaw and Haberman (2006), Booth and Tickle (2008), Booth et al. (2002, 2006), Hyndman and Ullah (2007), Cairns et al. (2006), and Wi힄niowski et al. (2015). More recently, efforts to apply deep learning algorithms have been made by Perla et al. (2020), Richma and Wüthrich (2019), and Nigri et al. (2019). In this paper, the 6-parametric factor model (6-PFM) is introduced. 6-PFM is an improved version of 4-parametric factor model (4-PFM) (Haldrup and Rosenskjold, 2019). Also, forecast methods using deep-learning algorithms, which included LSTM and regularized LSTM were applied in addition to the traditional time series method to 6-PFM. The investigation will determine if the newly designed 6-PFM has a better performance than the existing LC model and 4-PFM as well as 5-parametric factor model (5-PFM) which is also newly designed in this paper. The rest of this paper is organized as follows: Section 2 introduces 6-PFM and section 3 reviews the deep-learning algorithms of LSTM and regularized LSTM. Section 4 fits the models with the data from the U.S. and Korea life-tables and performs accuracy tests. After the tests, real forecasts are carried out and the forecast results are shown. Finally, Section 5 presents the conclusion.

2. The 6-parametric factor model

Haldrup and Rosenskjold (2019) designed 4-PFM by applying the dynamic Nelson-Siegel model (Diebold and Li, 2006), which has been very popular for forecasting interest rates in the financial market, to central death rates. 4-PFM can work reliably even when the mortality structure changes. Choi (2021) showed that 4-PFM was more reliable than the LC model when the mortality structure changed due to COVID-19. However, forecast accuracy was not as good as the LC model in normal circumstances. Thus, 5-PFM and 6-PFM were developed to improve this accuracy. 5-PFM is obtained by adding one more factor and 6-PFM is obtained by adding two more factors to 4-PFM. Mathematical expressions of 4-PFM, 5-PFM and 6-PFM are in equation (2.1), equation (2.2), and equation (2.3), respectively.

ln(mx,t)=β0t+β1te-λ1x+β2te-λ2(ln(x)-ln(k1))2+β4t(xω)λ4+x,t,ln(mx,t)=β0t+β1te-λ1x+β2te-λ2(ln(x)-ln(k1))2+β4t(xω)λ4+β5t(xω)λ5+x,t,ln(mx,t)=β0t+β1te-λ1x+β2te-λ2(ln(x)-ln(k1))2+β3te-λ3(ln(x)-ln(k2))2+β4t(xω)λ4+β5t(xω)λ5+x,t.

where x is age, t is year, and ω is limiting age. The log central death rate (ln(mx,t)) of 4-PFM is fitted by 4 factors which are β0t, β1t, β2t, β4t, and corresponding factor loadings which are 1, eλ1x, eλ2 (ln(x)−ln(k1))2, and (x/ω)λ4. The factor loadings are estimated by parameters λ1, λ2, λ4, and k1. β0t is a factor applying to all ages in common (Siler, 1979) so that the corresponding loading is 1. β1t is a factor showing death rates of infants (Rogers and Little 1994), and its loading function corresponding to β1t has an L-shape, which means that the function decreases very rapidly as age increases from 0 but the function decreases very slowly after some increase away from 0. The bigger the λ1 is, the more rapidly decreasing the function is. β2t is a factor expressing death rates from the accident humps in young ages (Heligman and Pollard, 1980), and its loading function has a bell shape, which means the function increases toward a peak age k1 and decreases thereafter. β4t is a factor expressing death rate increase as age increases (Gompertz, 1825). The loading function corresponding to β4t has a linear shape when λ4 equals 1, is concave when less than 1, and is convex when greater than 1. εx,t is an error term and is assumed to follow iid N(0,σ2).

For 5-PFM, β5t(x/ω)λ5is added to 4-PFM, and for 6-PFM, β3teλ3(ln(x)−ln(k2))2 is added to 5-PFM to increase the forecasting accuracy. Higher factor models such as 7-PFM and 8-PFM could be designed, but there would be too many factors and loadings to fit in those models, so factor models higher than 6-PFM are not considered.

In this paper, 6-PFM is selected as a new representative model because 6-PFM performs better than 5-PFM as seen in the accuracy-test results in Section 4.3. Residual analysis also shows that 6-PFM is appropriate. The residual analysis on the error term in equation (2.3), using the 1933~2018 U.S. and 1983~2019 Korea life tables, shows that the errors are normally distributed for 6-PFM as you can see in Figure 1.

6-PFM consists of 6 factors and corresponding 6 factor loadings as in equation (2.3). The factors, which reveal some trends as time passes, are the objects for the forecast. The factor loadings do not have trends as time passes but display patterns according to ages. They are explained as follows:

The two graphs in Figure 2 are the shapes of the loading functions of males and females of the United States fitted by the U.S. life tables from 1933 to 2018. The parameters λ’s and k’s in the loading functions are estimated by using the 2018 U.S. life-table. The figure above is the shape of a male and the one below is the shape of a female. The common loading functions (black) are the same for both genders because they are fixed at 1. The infant loading functions (red) show infant death rates which have L-shapes as expected. The shapes look similar for both genders but the curve of a female is a little sharper than the curve of a male because λ1 for a female is greater than that for a male (2.6 vs. 2.2). This means that the female infant death rate is reduced more sharply than the male infant rate as the age grows near 0. The accident hump loading functions show somewhat different shapes between males and females. Two accident peaks occur at around ages 15(blue) and 20(green) for males and occur at around ages 12(green) and 17(blue) for females. The width of the bell shape for the hump where the accident peak occurs at age 15(blue for male) is quite wide compared with the other 3 curves, which means the accident does not occur narrowly around age 15 compared to the other peak ages at 20, 12, or 17. The aging loading functions (sky-blue and purple) also look a little bit different between the two genders. For males, the sky-blue line is almost linear and the purple line is fairly curved, while the sky-blue line is relatively curved for females. A linear line means that the death rate increases linearly by age, and a curved line means that the death rate increases much faster in later ages. The next two figures in Figure 3 are those of Korean males and females fitted by the Korea life-tables from 1983 to 2019. Like Figure 2, the parameters in the loading functions are estimated by using the 2019 Korea life-table. The figure above is for males and the one below is for females, and the common loading functions (black) are fixed at 1. The shapes of the infant loading functions also look similar for both genders but the curve of females is a little sharper than that of males as in the United States because λ1 for a female is greater than that for a male(2.6 vs. 1.7). The accident peak occurs at around ages 13(blue) and 18(green) for males and occurs at around ages 11(green) and 25(blue) for females. Compared with those of the United States, the two accident humps occur earlier for the Korean male, and the second accident (blue) occurs later for the Korean female, though the first accident hump occurs at similar ages in both countries. Also, the distance between two accident peak positions for the Korean female is wider than those for the U.S. female. The aging loading functions (sky-blue and purple) also look different between the two genders, which show similar shapes to the U.S. aging loading functions.

Next, let’s move onto the factors. The trends of the factors are important because a forecast is conducted based on these trends. While the loadings do not change over time, the factors do. Thus we can forecast the central death rates by analyzing the trends of the factors. The past trends of the 6 factors for the United States and Korea are shown in Figures 4 and Figure 5, respectively. For the U.S., the common factors (black) show some decreasing trends for both male (left) and female (right), but, since they have negative signs the effects of the common factors on the central death rates increase. The infant factors (red) do not show conspicuous trends but steady decreasing trends for both genders. The accident factor 1(green) and the accident factor 2(blue) move in opposite directions for both genders. For males, the accident factor 1 increases, and the accident factor 2 decreases, which means that the accidents of 15-year-old males increase and those of 20-year-old males decrease as time passes. For females, the accident factor 1 decreases, and the accident factor 2 increases, which means that the accidents of 12-year-old females decrease and those of 17-year-old females increase. The aging factor 1’s(sky-blue) have the strongest effect on the central death rates among all the factors for both genders because they are at the highest level. However, the aging factor 2’s (purple) rarely affect the central death rates for both genders since they are located around 0. Thus, the aging factor2’s have the role of fine-tuning the fitting accuracy for the model.

For Korea, the common factors (black) reduce for both males and females as for the U.S. The infant factors rarely change as time goes by, so their effects on the central death rates do not change for both genders. The accident factors do not show conspicuous trends, though the accident factor 2 of females shows a steady increasing trend. Also, the accident factors are located around 0, so the effects of these factors on the central death rates are very small. The aging factor 1 for males (sky-blue) increases towards 2000 and since that year shows some fluctuation; and the aging factor 2 of males (purple) moves horizontally until around 2000 and shows some increasing trend with fluctuation since the same year. Thus the combined effects of these two factors on the central death rates increase. The aging factor 1 of females (sky-blue) has a strong effect and increases until around 2005, thereafter, it then shows some reduction. The aging factor 2(purple) moves horizontally and moves upwards since around 2005. The aging factors of females are quite differently located compared to those of males. It is not easy to find out why they are so differently located. It might be that the accident occurs more frequently at younger ages (aging factor 1) in Korea.

When we compare the strength of the effects on the central death rates among factors, the infant factor and the aging factors have strong positive effects and the common factors have negative effects for both males and females.

When compared between the two countries, the common factors show negative effects and their strengths increase for both countries. The infant factors also show similar effects for both countries though the trends decrease a little bit for the U.S. The shapes of the accident factor 1 and the accident factor 2 look different between the two countries, but the combined effects are small for both countries since the adding of two factors are near zero. For the aging factors, the linear effect is stronger than the convex effect for both genders in the U.S. since the aging factor 1’s for males and females are at higher levels. However, the convex effect is stronger than the linear effect for males in Korea, though the two aging effects for females are similar for both countries.

3. Deep-learning method: Long short-term memory and regularized long short-term memory

3.1. Long short-term memory (LSTM)

Long short-term memory (LSTM) is one of the deep learning algorithm methods appropriate for time-series forecasting. LSTM can learn long-term dependencies by learning sequences of observations so that it can forecast a future output based on past features of the data. Raschka and Mirjalili (2019) has shown the structure of LSTM in Figure 6. Ct is a cell state at time t, xt is input at time t, and ht is a hidden layer at time t. 뒜 is element-wise multiplication and ⊕ is element-wise addition. An arrow shows current data streams. 4 boxes contain an activation function (σ) or a hyperbolic tangent (tanh), and weights (e.g. wxf ) and bias (e.g. bf ). Each box carries out matrix-vector multiplication with data and linear addition. Gate is a unit computed with an activation function leading to output through 뒜. LSTM has 3 kinds of gates, forget gate, input gate, and output gate. Forget gate (f) controls how much data to abandon by using the activation function. The activation function has values between 0 and 1 for sigmoid and values ≥ 0 for ReLu. For the sigmoid function, the value near 1 means the gate contains most of the past information, and the value near 0 means it abandons most of them. The ReLu function is similar to the sigmoid but it has a faster learning time. This gate performs element-wise multiplications between the results from the activation function and the past data from a memory cell to determine the level of maintaining past data. Input gate (i) controls the level of maintaining previous output data, and input node (g) updates the cell state with the tanh function to produce new data and perform element-wise multiplications with data from the input gate. The tanh function transfers output results between −1 and 1. Output gate (o) controls the level of reflecting a hidden layer at time t (ht) by performing element-wise multiplications with data from the memory cell through the tanh function. The mathematical expression of the LSTM is as follows.

ft=σ(Wxfxt+Whfht-1+bf)it=σ(Wxixt+Whiht-1+bi)gt=tanh(Wxgxt+Whght-1+bg)Ct=(Ct-1ft)(itgt)ot=σ(Wxoxt+Whoht-1+bo)ht=ottanh (ct)

where ft, it, ot and Ct are the same size as the hidden layer vector h. For more explanation, refer to Hochreiter and Schmidhuber (1997) and Graves et al. (2013). The readers who would like to know more about this subject may also refer to Gers et al. (2000), Jozefowicz et al. (2015), Azuma (2020), and Chung (2014).

3.2. Regularized long short-term memory

One common problem for fitting models is overfitting. An overfitted model reflects many details and noises of the data, so its forecasting performance is reduced. To improve the model’s performance, regularization is adopted to the model (Merity et al., 2017).

There are several regularization methods as L1 and L2, and dropout. L1 and L2 regularizations are the methods to reduce or delete weights, and dropout is a technique that randomly drops units from the neural network (Srivastava et al. 2014). These regularizations reduce some units and their connections in LSTM so they prevent overfitting. Dropout might delete the long-term memory connections in LSTM due to its randomness so it may harm forecasting performance if the deleted connections contain necessary information. Both L1 and L2 regularizations can reduce overfitting, but L1 regularization can make the model simpler than L2 regularization because the weights in L1 regularization reduces toward zero’s and could be exactly zero’s as a regularizing parameter increases, but the weights in L2 regularization could not be exactly zero’s (Hastie et al. 2009). Since L1 regularization makes the model simpler, L1 regularization is applied to input connections on each LSTM unit by using Keras in Python. L1 regularization can be expressed as in equation (3.1).

(b*,w*)=argmin(b,w)j(t(xj)-(b+iwihi(xj)))2+λ(b+iwi)

where b is a bias, w is a weight, xj is jth input, hi is ith hidden layer, t(xj) is a target function, and λ is a regularization parameter whose value determines the optimization of the results. As λ moves toward zero, the regularization effect is reduced and vice versa.

4. Numerical analysis

4.1. Data and procedure of accuracy test

The U.S. life-tables (human mortality database) from 1933 to 2018 and the Korea life-tables (Statistics of Korea) from 1983 to 2019 are used for the accuracy tests. The Korea life tables before 1983 were not used because the data was not relatively reliable.

The accuracy tests are performed for ages 0 to 109 and 0 to 99 for the U.S. and Korea, respectively. Both mortality rates of the maximum ages of 110(U.S.) and 100(Korea) are fixed at 1, so they are deleted from the life tables. The tests are carried out for 5 different forecasting periods from 1 year to 5 years as in Table 1. Five different forecasting periods are tried for the tests to avoid the coincidence of the test results (i.e. lucky small errors or unlucky large errors). For example, to test for forecasting 5-year log central death rates for the U.S., central death rates from 1933 to 2013 are used to fit the model, and forecasting results are produced for 2014~2018.

After the forecasted log central death rates are produced, the tests are performed by measuring the root mean square error (RMSE) as in equation (4.1).

RMSE=x=0ω(Yxm-Yxr)2ω+1

where Yxm is a forecasted value of the log central death rate at age x, Yxr is a real value of the log central death rate at age x, and ω is the limiting age. Since the future central death rates are not known at some point of the time t0 in the past, it is assumed to be a current point, and the forecast is carried out based on the assumed current point of the time t0. After RMSE is obtained by computing the differences between the forecasting results and the real log central death rates for each of the output periods, the average RMSE is finally obtained by averaging the RMSEs for the whole output periods.

4.2. Forecasting process

The process of forecasting 6-PFM is as follows:

  • Step 1: fit the model to the latest-year mortality data from the life-tables to determine the parameters as in equation (4.2)

    (β0T,β1T,β2T,β3T,β4T,β5T,λ1,λ2,λ3,λ4,k1,k2)=argminx=0ω(ln(mx,T)-ln(m^x,T))2

    where ln(mx,T ) is a real log central death rate, ln(mx,T ) is an estimated log central death rate, x is age, ω is the limiting age, and T is the latest year available.

  • Step 2: estimate 6 factors year by year using the fixed parameters (λ1, λ2, λ3, λ4, k1, k2) estimated in year T in Step 1. It is expressed in equation (4.3).

    (β0t,β1t,β2t,β3t,β4t,β5t)=argminx=0ω(ln(mx,t)-ln(m^x,t))2

    where t (≤ T) is year.

  • Step 3: forecast 6 factors by applying the vector autoregressive model (VAR) to capture the relationship between multiple quantities as it changes over time, LSTM, or regularized LSTM.

  • Step 4: compute the log central death rates by entering the 6 factors forecasted in Step 3 and the parameters estimated in year T in Step 1 into the model.

The accuracies are compared among the 4 models: the LC model, 4-PFM, 5-PFM, and 6-PFM. The 3 forecasting methods of 6-PFM are applied. Thus a total of 6 models are compared: the LC model, 4-PFM with VAR, 5-PFM with VAR, 6-PFM with VAR, 6-PFM with LSTM, and 6-PFM with the regularized LSTM. Forecasting process is excluded and the results are only shown for the LC model, 4-PFM, and 5-PFM. For more details about the LC model and 4-PFM, refer to Lee and Carter (1992) and Haldrup and Rosenskjold (2019).

Forecasting requires time-series analysis. For 6-PFM, 3 different analyses, the VAR, LSTM, and the regularized LSTM are carried out. To forecast from the VAR, the Johansen test, which uses a function ca.jo() in R package is performed for co-integration analysis. For example, the co-integration analysis results for test1 are in Table 2. For the U.S., there are 3 co-integration effects for males and 2 co-integration effects for females because the test value of 45.31, when r (the number of co-integration) ≤ 2 is rejected, but the value of 23.44, when r ≤ 3 is not rejected at the level of significance of 10% for male, and the value of 68.05, when r ≤ 1 is rejected but the value of 35.72, when r ≤ 2 is not rejected at the same level of significance for female. Similarly, we can determine that there are 3 co-integration effects for both males and females in Korea. Refer to Pfaff (2008) for more explanation.

The co-integration results for the 5 accuracy tests are in Table 3. The number in Table 3 is the number of co-integrations. Male in the U.S. and both genders in Korea have 2~3 co-integrations and female in the U.S. has 1~2 co-integrations. Since co-integrations exist for all the tests, the vector error correction model (VECM) should be used and the function vec2var() in R package is used for each test to transform VECM which is the object of the formal class generated by the function ca.jo() into the VAR representation. In Section 4.3 this model is expressed as ‘6-PFM’ instead of ‘6-PFM with VAR’. In the same manner, 4-PFM and 5-PFM, which use VAR, are expressed as ‘4-PFM’ and ‘5-PFM’, respectively.

For LSTM, an algorithm should be set to learn the mapping function from the input (training dataset) to the output (test dataset). For example, a time series data 1, 2, 3, 4, 5, . . ., can be transformed into two separate sets X and Y.

X:(1,2,3,4),(2,3,4,5),(3,4,5,6),,Y: 듼 듼 듼(5,6), 듼 듼 듼(6,7), 듼 듼 듼(7,8),,

where X is a set of input components, and Y is a set of output components. An algorithm learns the output pattern Y from the input pattern X. This is called a sliding window transformation as it is just like sliding a window across the prior observations that are used as inputs to the model in order to forecast the next value in the series. In this case, the window width is 4 time steps.

To forecast the central death rates from LSTM, the time dependence removal of the data is needed by subtracting the previous observations from the current one. Without differentiating , the forecast results may be too sensitively changed to the number of epochs (the number of passes of the entire training dataset the machine learning algorithm completes) or to the values of the regularizing parameter. The differentiating can reduce the instability of the time series data, which leads to stable results. After differentiating, the log central death rates are transformed into a supervised learning format (sliding window transformation) so that the model can learn to fit the target values. The transformed data is inverted into the original scale after the forecast is completed and then it computes RMSE to measure the accuracy. There are 6 factors to be forecasted, and the difference is taken between consecutive observations (lag 1 difference) for each factor. Then, lag 1 differences are transformed into the supervised learning format. We use the window width of 7-time steps in the training data set.

As an example for test 2, the time series of the 6 factors are transformed into two separate sets, training set (X) and test set (Y) as follows,

X:(d1,1d1,2····d1,7d2,1d2,2····d2,7d3,1d3,2····d3,7d4,1d4,2····d4,7d5,1d5,2····d5,7d6,1d6,2····d6,7),(d1,2d1,3····d1,8d2,2d2,3····d2,8d3,2d3,3····d3,8d4,2d4,3····d4,8d5,2d5,3····d5,8d6,2d6,3····d6,8),(d1,3d1,4····d1,9d2,3d2,4····d2,9d3,3d3,4····d3,9d4,3d4,4····d4,9d5,3d5,4····d5,9d6,3d6,4····d6,9),,Y: 듼 듼 듼(d1,8d1,9d2,8d2,9d3,8d3,9d4,8d4,9d5,8d5,9d6,8d6,9), 듼 듼 듼(d1,9d1,10d2,9d2,10d3,9d3,10d4,9d4,10d5,9d5,10d6,9d6,10), 듼 듼 듼(d1,10d1,11d2,10d2,11d3,10d3,11d4,10d4,11d5,10d5,11d6,10d6,11), 듼 듼 듼,

where di, j is a value which subtracts factor i, j−1 from factor i, j, i is the index of the ith factor, and j is the year.

The number of epochs is set to be 10,000 and the input unit is set to be 42 (=6 × 7) for all the tests, and the output unit is set to be 6 (=6 × 1) for test1, 12 (=6 × 2) for test 2, . . ., 30 (=6 × 5) for test 5. For the regularized LSTM, the regularizing parameter λ of 0.05 is used. In Section 4.3 these two models are expressed as ‘6-PFM_LSTM’ and ‘6-PFM_reg-LSTM’, respectively. A main part of the python code is in the Appendix.

4.3. Test results

The results of the accuracy tests are in Table 4. The upper two parts are the average RMSEs of the U. S. male and female groups and the lower two parts are the average RMSEs of the Korean male and female groups.

For males in the U.S., 6-PFM performs best with the average RMSE of 0.0847, followed by 6-PFM_reg-LSTM, 6-PFM_LSTM, the LC model, 5-PFM, and 4-PFM . For females, 6-PFM_reg-LSTM shows the best performance, followed by 6-PFM_LSTM, 6-PFM, 5-PFM, 4-PFM, and the LC model, in this order. In a comparison between males and females, performances for males are better for the LC model and 6-PFM, and performances for females are better for 4-PFM, 5-PFM, 6-PFM_LSTM, and 6-PFM_reg-LSTM. Differences of average RMSEs between genders are 0.014, 0.076, 0.068, 0.004, 0.007, and 0.006 for the LC model, 4-PFM, 5-PFM, 6-PFM, 6-PFM_LSTM, and 6-PFM_reg-LSTM, respectively. Thus the 6-PFMs show more stable results than the LC model, 4-PFM, and 5-PFM.

For males in Korea, 6-PFM_reg-LSTM performs best with the average RMSE of 0.1026, followed by 6-PFM_LSTM, the LC model, 6-PFM, 5-PFM, and 4-PFM, in this order. For females, 6-PFM_reg-LSTM is the best model as in males. 6-PFM is the next best, and 4-PFM shows the worst performance. All the models work better for males than for females. Differences in the average RMSEs between genders are 0.061, 0.064, 0.063, 0.040, 0.046, 0.041 for the LC model, 4-PFM, 5-PFM, 6-PFM, 6-PFM LSTM, and 6-PFM_reg-LSTM, respectively. Thus, 6-PFMs show more stable results than the other models as in the U.S.

For both countries, 6-PFMs performs better than the LC model, 4-PFM, and 5-PFM. Among the three 6-PFMs, 6-PFM_reg-LSTM is the best for both countries except for the U.S. male. When comparing two countries, there is no consistency for the LC model, 4-PFM, and 5-PFM, but there is a consistency for the three 6-PFMs. The 6-PFMs work better for the U.S. life-tables than the Korea life-tables.

One abnormal phenomena is found in the Korean female group. The average RMSEs in the Korean female group show decreasing trends as the forecasting period increases, which is not normal because longer-term forecasting should be more difficult than short-term forecasting. Maybe coincidence of the test results has occurred, as I mentioned in Section 4.1, in which unlucky large errors have occurred in the short-term forecasting and lucky small errors have occurred in the longer-term forecasting in the Korean female group.

4.4. Mortality forecasts

The accuracy test results show that 6-PFM_reg-LSTM was the most accurate model for both the United States and Korea. Thus, mortality forecasts were performed for 6-PFM_reg-LSTM and the results were compared with the LC model. The future years to which the models are forecasted are 2030 and 2040. The future years are restricted to 2040 due to the short periods of the past mortality data in Korea. The number of epochs is set to be 10,000 and the regularizing parameter λ is set to be 0.05 as in the accuracy tests. The input unit is set to be 90 (=6 × 15).

The results are shown in terms of life expectancy as in Table 5. The future life expectancies of Korea are 4.16~6.34 years longer than those of the United States. When the two models are compared, 6-PFM_reg-LSTM forecasts shorter life expectancies for males than the LC model but forecasts those of females longer than the LC model for both countries. Specifically, for the United States, the 6-PFM_reg-LSTM shows that the future life expectancies in 2030 would be 76.51 and 81.59 for males and females, respectively, which are shorter life expectancy results than those of the LC model. The model also shows that the life expectancies in 2040 would be 76.55 and 81.63 for males and females, respectively, which are shorter life expectancy results than those of the LC model for males, but longer than the LC model for females.

It is noted that the life expectancies of the U.S. in 2040 are slightly improved from those in 2030 compared with Korea. For Korea, the 6-PFM_reg-LSTM shows that the life expectancies for males in 2030 would be 80.67, which is a shorter life expectancy result than the results of the LC model, but for females, the life expectancy would be 86.81, which is a longer life expectancy result than the results of the LC model. It also shows that the life expectancy for a male in 2040 would be 81.04, which is a quite shorter life expectancy result than the results of the LC model, but for a female, the life expectancy would be 87.37, which is a longer life expectancy result than the results of the LC model. Therefore, the future life expectancies might be shorter for males but longer for females than we expect since the standard model used for forecasting in both countries is the LC model.

5. Conclusion

6-PFM was developed in an effort to increase the accuracy of the mortality forecast by adding two factors to 4-PFM. With the added two factors, 6-PFM has shown better performance than 4-PFM and 5-PFM as expected. It was also shown that 6-PFM performed better than the LC model in most of the accuracy tests. Among the 3 forecasting methods of 6-PFM, regularized LSTM was shown to have the best performance for both countries except for the U.S. male group. Therefore LSTM is strongly recommended to be adapted when developing forecasting models.

In this paper, the fixed parameter of 0.05 for regularized LSTM was used, but how to determine the appropriate value of the parameter was not developed. The model could perform better if it could be fine-tuned by regularization. Developing fine-tuning methods is proposed for future research.

Appendix
Main part of Python code for LSTM
 뼧 model = Sequential()
뼧 model.add (LSTM(n_periods_in*n_features, activation=’relu’, input_shape=
(n_periods_in, n_features),x activity_regularizer=tf.keras.regularizers.
L1(l1=\lambda))) 뼧 model.add (RepeatVector(n_periods_out))
뼧 model.add (LSTM(n_periods_out*n_features, activation=’relu’, return_sequences=True))
뼧 model.add (TimeDistributed(Dense(n_features)))
뼧 model.compile (optimizer=’adam’, loss=’mse’)
뼧 hist = model.fit(X, Y, epochs=10,000, verbose=0)
뼧 yhat = model.predict(x_input, verbose=0)

where n_periods_in is 7, n_periods_ out is the number of forecasting periods, and n_features is the number of factors which is 6. The regularizing parameter λ is set to be l1=0.0 and l1=0.05 for LSTM and reg_LSTM, respectively, inside of the () in t f.keras.regularizers.L1().

Figures
Fig. 1. Histogram of : above (U.S.) - male(left), female(right) below (Korea) - male(left), female(right).
Fig. 2. Shapes of the loading functions in U.S.: above-male, below-female.
Fig. 3. Shapes of the loading functions in Korea: above-male, below-female.
Fig. 4. Trends of 6 factors from yr1933 to yr2018 in U.S.: left-male, right-female.
Fig. 5. Trends of 6 factors from yr1983 to yr2019 in Korea: left-male, right-female.
Fig. 6. Structure of LSTM.
TABLES

Table 1

The periods for model input and output for 5 tests

CountryTestInput periodsOutput periods
U.S.test11933~20172018
test21933~ 20162017~2018
test31933~ 20152016~2018
test41933~ 20142015~2018
test51933~ 20132014~2018

Kortest11983~20182019
test21983~ 20172018~2019
test31983~ 20162017~2019
test41983~ 20152016~2019
test51983~ 20142015~2019

Table 2

Results of Johansen test of 6-PFM for test 1

# co-integration (r)Test (US)Test (Kor)Significance level

MaleFemaleMaleFemale10%5%1%
r ≤ 50.080.034.951.456.508.1811.65
r ≤ 47.232.3112.906.8215.6617.9523.52
r ≤ 323.4415.0922.0023.4528.7131.5237.22
r ≤ 245.3135.7254.1146.2045.2348.2855.43
r ≤ 189.2968.05101.9779.4866.4970.6078.87
r = 0148.64103.46157.62160.0485.1890.39104.20

Table 3

The co-integration test results of 6-PFM for all 5 accuracy tests

ContryGendertest 1test 2test 3test 4test 5
U.S.Male32333
Female21111

KoreaMale32223
Female33322

Table 4

The results of accuracy tests

CountryGenderForecasting periodLC4-PFM5-PFM6-PFM6-PFM_LSTM6-PFM_reg-LSTM
U.S.Male1 year0.14870.17640.15110.07030.07780.0652
2 years0.15300.18170.15540.08290.07330.0707
3 years0.15840.19210.18590.08000.08540.0929
4 years0.16460.20100.18180.08570.10220.1070
5 years0.16750.21000.19420.10490.11790.1122

FemaleAverage0.15850.19220.17370.08470.09130.0896
1 year0.16540.10060.08990.06540.06560.0612
2 years0.16890.10690.09180.06930.06300.0640
3 years0.17100.11200.09360.07470.08860.0874
4 years0.18100.12300.12230.10790.10650.1058
5 years0.17730.13670.13240.12600.09560.1005
Average0.17270.11590.10600.08870.08390.0838

Rep. of KorMale1 year0.10700.10950.10280.09660.09930.0983
2 years0.10030.11330.10280.08940.09290.0918
3 years0.11000.11730.11410.10000.09970.0963
4 years0.11430.12190.12180.12180.10390.1018
5 years0.12500.16250.17270.15060.13320.1248
Average0.11130.12490.12280.11160.10580.1026

Female1 year0.17950.19230.18580.16200.18920.1593
2 years0.17760.18220.17920.15200.16740.1611
3 years0.16710.19000.18650.15460.14060.1373
4 years0.17420.19980.19600.14290.13700.1365
5 years0.16510.17750.18110.14730.12620.1246
Average0.17270.18840.18570.15180.15210.1438

Table 5

The projected life expectancies

CountryGenderYearLC model6-PFM_reg-LSTM
U.S.Male203077.3576.51
204078.6176.55
Female203080.7881.59
204080.8881.63

KorMale203082.6180.67
204084.9581.04
Female203085.7386.81
204086.2887.37

References
  1. Azuma Y (2020). Core deep-learning introduction, SB Creative Corp.
  2. Booth H, Hyndman RJ, Tickle L, and De Jong P (2006). Lee-Carter mortality forecasting: A multi-country comparison of variants and extensions. Demographic Research, 15, 289-310.
    CrossRef
  3. Booth H, Maindonald J, and Smith L (2002). Applying Lee-Carter under conditions of variable mortality decline. Population Studies, 56, 325-336.
    CrossRef
  4. Booth H and Tickle L (2008). Mortality modelling and forecasting: A review of methods. Annals of Actuarial Science, 3, 3-43.
    CrossRef
  5. Cairns A, Blake D, and Dowd K (2006). A Two-Factor Model for Stochastic Mortality with Parameter Uncertainty: Theory and Calibration. The Journal of Risk and Insurance, 73, 687-718.
    CrossRef
  6. Choi J (2021). Comparison of accuracy between LC model and 4-parametric factor model when COVID-19 impacts mortality structure. Communications for Statistical Applications and Methods, 28, 233-250.
    CrossRef
  7. Chung J, Culcehre C, Cho K, and Bengio Y (2014). Empirical evaluation of gated recurrent neural networks on sequence modeling.
  8. Diebold FX and Li C (2006). Forecasting the term structure of government bond yields. Journal of Econometrics, 130, 337-364.
    CrossRef
  9. Gers F, Schmidhuber J, and Cummins F (2000). Learning to forget: continual prediction with LSTM. Neural Computation, 12, 2451-2471.
    Pubmed CrossRef
  10. Gompertz B (1825). On the nature of the function expressive of the law of human mortality, and on a new mode of determining the value of life contingencies. Philosophical Transactions of the Royal Society of London, 115, 513-583.
    CrossRef
  11. Graves A, Mohamed A-r, and Hinton G (2013). Speech recognition with deep recurrent neural networks. IEEE International Conference on Acoustics, Speech, and Signal Processing. .
  12. Haldrup N and Rosenskjold C (2019). Econometrics.
  13. Heligman L and Pollard JH (1980). The age pattern of mortality. Journal of the Institute of Actuaries, 107, 49-80.
    CrossRef
  14. Hastie T, Tibshirani R, and Friedman J (2009). The elements of statistical learning: data mining, inference, and prediction. Springer Series in Statistics.
  15. Hochreiter Sepp and Schmidhuber J체rgen (1997). long short-term memory. Neural Computation, 9, 1735-1780.
    Pubmed CrossRef
  16. () Human mortality data . Retrieved from Dec 31, 2020https://www.mortality.org/
  17. Hyndman RJ and Ullah MDS (2007). Robust forecasting of mortality and fertility rates: a functional data approach. Computational Statistics and Data Analysis, 51, 4942-4956.
    CrossRef
  18. Jozefowicz R, Zaremba W, and Sutskever L (2015). An empirical exploration of recurrent network architectures. Proceedings of ICML, 2342-2350.
  19. Lee RD and Carter LR (1992). Modeling and forecasting U.S. mortality. Journal of the American Statistical Association, 87, 659-671.
  20. Nan Li and Lee R (2005). Coherent mortality forecasts for a group of populations: An extension of the Lee-Carter method. Demography, 42, 575-594.
    CrossRef
  21. Merity S, Keskar NS, and Socher R (2017). Regularizing and optimizing LSTM language models.
  22. Nigri A, Levantesi S, Marino M, Scognamiglio S, and Perla F (2019). A deep learning integrated Lee-Carter model. Risks, 7.
    CrossRef
  23. Perla F, Richman R, Scognamiglio S, and Wuthrich M (2021). Time-Series Forecasting of Mortality Rates using Deep Learning. Scandinavian Actuarial Journal, 2021, 572-598.
    CrossRef
  24. Pfaff B (2008). VAR, SVAR, and SVEC Models: implementation within R Package vars. Journal of Statistical Software, 27, 1-32.
    CrossRef
  25. Raschka S and Mirjalili V (2017). Python machine learning (2nd ed), Packt Publishing.
  26. Renshaw AE and Haberman S (2006). A cohort-based extension to the Lee-Carter model for mortality reduction factors. Insurance: Mathematics and Economics, 38, 556-570.
  27. Richman R and W체thrich MV (2019). Lee and Carter go machine learning: recurrent neural networks.
  28. Rogers A and Little JS (1994). Parameterizing age patterns of demographic rates with the multiex-ponential model schedule. Mathematical Population Studies, 4, 175-195.
    CrossRef
  29. Siler W (1979). A competing-risk model for animal mortality. Ecology, 60, 750-757.
    CrossRef
  30. Srivastava N, Hinton G, Krizhevsky A, Sutskever I, and Salakhutdinov R (2014). Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Rresearch, 15, 1929-1958.
  31. Statistics Korea () Life-table . Retrieved from Dec 31, 2020https://kosis.kr/statisticsList/statisticsListIndex.do?vwcd=MT_ZTITLE&menuId=M_01_01#content-group
  32. United Nations (2019) World Population Prospects 2019 . Retrieved from Apr 01, 2021http://population.un.org/wpp
  33. Wi힄niowski A, Smith PWF, Bijak J, Raymer J, and Forster JJ (2015). Bayesian population forecasting: extending the Lee Carter method. Demography, 52, 1035-1059.
    CrossRef