
This paper analyzes yearly death counts after World War II for 8 countries in three regions (North America, Europe, and the Asia-Pacific region) to identify and compare the stochastic structures of death counts. The 8 countries are the United States and Canada in North America; the United Kingdom, France, Italy, and Spain in Europe; and Taiwan and Australia in the Asia-Pacific region. The series start in 1946 or 1970 (depending on availability) and are analyzed separately as female, male, and total counts to see whether gender influences the stochastic structure.

Structural time series models (Harvey, 1981, 1989) assume that a time series can be formulated directly in terms of unobserved components such as trend, slope, seasonal, cycle, and daily effect. Each unobserved component is characterized by its stochastic structure and by the distribution of its irregular component. The structural time series models entertained in this paper are a local level with a random walk model, a fixed local linear trend model, and a local linear trend model. These models are sensible choices based on a preliminary examination of the death counts data. Structural time series models use the Kalman filter (Kalman, 1960) to estimate the unknown parameters of the entertained model, to forecast future values of the time series, and to estimate the unobserved components of the model by filtering and smoothing. To apply the Kalman filter, a structural time series model must be converted to a state space form (Durbin and Koopman, 2012), which is the standardized input form for the Kalman filter.

Two diagnostic procedures are used to check the validity of a fitted model. The first checks the normality of the residuals with the Shapiro-Wilk test, whose results are confirmed with a normal QQ plot and a density plot. The second checks the independence of the residuals with the runs test. To find the best-fitted model among the valid models, the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and the sum of squares of one-step-ahead prediction errors (SSPE) are used. The best-fitted valid models of female, male, and total death counts for each country are then examined for differences and similarities among countries and regions.

The organization of this paper is as follows. Section 2 presents the structural time series model and shows how to set up the state space form for each model entertained. Section 3 introduces the Kalman filter and the R packages used to run it. Section 4 presents the results of analyzing the female, male, and total death counts of the 8 countries. Finally, Section 5 concludes the paper.
Structural time series models assume that a time series can be formulated directly in terms of unobserved components, each of which is characterized by its own stochastic structure and an irregular term. By varying the stochastic structure and the distribution of the irregular term of each unobserved component, structural time series models can fit a wide variety of time series in many fields. The outcomes of a structural time series model are estimates of the unknown parameters, forecasts of future values of the time series, and estimates of the unobserved components in the model. The estimates of the unobserved components give insight into the stochastic structure of the time series of interest. There is a large literature on the structural time series model. For example, Harvey and Todd (1983) compared the structural time series model with Box and Jenkins’ ARIMA model, and Harvey and Peters (1990) presented a number of methods for computing the maximum likelihood estimator of the unknown parameters of the structural time series model.
For the death counts data, three structural time series models are entertained: the local level with a random walk model, the fixed local linear trend model, and the local linear trend model. These models were chosen after examining plots of the female, male, and total death counts of the 8 countries. The plots for all 8 countries are presented in Section 4.
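For reference, the three models named above are usually written as follows in the structural time series literature. This is a sketch in textbook notation and is not taken from this paper: the symbols $\mu_t$ (level), $\nu_t$ (slope), $\varepsilon_t$, $\eta_t$, $\zeta_t$ (disturbances), and the Gaussian assumption on the disturbances are all assumptions, and the paper's own symbols may differ. Under this notation, the variances $\sigma^2_\eta$, $\sigma^2_\zeta$, and $\sigma^2_\varepsilon$ would play the roles of the quantities reported later as VarN, VarK, and VarE.

```latex
\begin{align*}
\text{LLRW:}\quad & y_t = \mu_t + \varepsilon_t, &
  \mu_{t+1} &= \mu_t + \eta_t, \\[4pt]
\text{LT:}\quad & y_t = \mu_t + \varepsilon_t, &
  \mu_{t+1} &= \mu_t + \nu_t + \eta_t, \qquad
  \nu_{t+1} = \nu_t + \zeta_t, \\[4pt]
\text{FT:}\quad & y_t = \mu_t + \varepsilon_t, &
  \mu_{t+1} &= \mu_t + \nu + \eta_t,
\end{align*}
\qquad
\varepsilon_t \sim N(0, \sigma^2_\varepsilon),\quad
\eta_t \sim N(0, \sigma^2_\eta),\quad
\zeta_t \sim N(0, \sigma^2_\zeta).
```

In this form the FT model is the LT model with the slope held at a fixed value $\nu$, so it has no slope variance, and the LLRW model is the FT model with $\nu = 0$.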
The first structural time series model entertained in this paper is the local level with a random walk (LLRW) model:
where
The second model entertained is the local linear trend (LT) model:
where
The third model is the fixed local linear trend (FT) model, which is a variation of the LT model:
where
Structural time series models use the Kalman filter to estimate unknown parameters, to forecast future values, and to estimate unobserved components. To use the Kalman filter, a state space form (Durbin and Koopman, 2012) is required. The state space form is a standardized representation of a stochastic model that serves as the input for the Kalman filter. For the LLRW model (
The state space form (
The state space form (
The Kalman filter has been applied in many fields since Kalman’s first paper (Kalman, 1960) was published. Harrison and Stevens (1971) were the first to apply the Kalman filter to time series analysis. Since then, the Kalman filter has been used to analyze several time series models such as ARIMA models (Box and Jenkins, 1976), structural time series models (Harvey, 1989), and ARMAX models (Hannan and Deistler, 1988). Some areas where it has been applied include disease control (Gove and Houston, 1996), actuarial claim reserve forecasting (Chukhrova and Johannssen, 2017), rainfall forecasting (Asemota
The Kalman filter can be applied to either univariate or multivariate time series, and to time series with either a time-variant or a time-invariant structure. Time series data,
where
where
where
The state space form of the LT model (
The state space form of the FT model (
The Kalman filter has three assumptions: A1) the initial state vector,
Given the information of initial state variable,
The information required to start the Kalman filter is the mean and covariance matrix of the initial state vector,
In the state space form in (
To analyze our data with the Kalman filter, an R function,
The yearly death counts of the 8 countries analyzed in this paper are extracted from the Human Mortality Database (HMD). The HMD provides several kinds of data, such as death counts, census counts, birth counts, and population estimates, for the calculation of death rates and life tables. The main goal of the HMD is to document the longevity revolution of the modern era and to facilitate research into its causes and consequences. The HMD covers relatively wealthy and highly industrialized countries because, by design, it is limited to populations where death registration and census data are virtually complete. In this paper, death counts of 8 countries are analyzed: two countries in North America (U.S. and Canada), four countries in Europe (U.K., France, Italy, and Spain), and two countries in Asia-Pacific (Taiwan and Australia). From the HMD death counts, three series are generated for each country: female death counts, male death counts, and total death counts. The annual periods covered are U.S. (1946–2017), Canada (1946–2016), U.K. (1946–2016), France (1946–2017), Spain (1946–2018), Italy (1946–2017), Australia (1946–2018), and Taiwan (1970–2014). The HMD holds death counts for Taiwan only from the year 1970. This paper uses data after World War II (1939–1945) because the war years contain serious outliers, especially for the European countries.
All three models (LLRW, FT, and LT) are fitted to the three types of death counts (female, male, and total) of all 8 countries. For each fitted model, two diagnostic procedures based on residuals are applied to check its validity. The residuals in the Kalman filter are the one-step-ahead prediction errors (also called innovations), which the Kalman filter produces once the unknown parameters of the fitted model have been replaced by their estimates. Standardized residuals are obtained by dividing each one-step-ahead prediction error by its standard deviation. If a fitted model is valid, then the standardized residuals are independent and identically normally distributed (Harvey, 1989). Thus, the first diagnostic procedure checks the normality of the standardized residuals and the second checks their independence. The Shapiro-Wilk test is used to check normality, and normal QQ plots and density plots are used to confirm its conclusions. The runs test is used to check the independence of the standardized residuals. Models that pass both diagnostic procedures are treated as valid. Once the valid models are identified for each type of data, a best-fitted valid model is selected based on AIC (Akaike information criterion), BIC (Bayesian information criterion), and SSPE (sum of squares of one-step-ahead prediction errors). If these three criteria do not recommend a model unanimously, then the model recommended by the most criteria is selected as the best-fitted valid model.
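The residual-based quantities described above can be sketched for the simplest of the three models. The following is a minimal illustration in Python, not the R code used in the paper. It assumes the local level form y_t = mu_t + eps_t, mu_{t+1} = mu_t + eta_t, treats the two variances as already estimated, approximates a diffuse initial state with a large variance `p1`, drops the first prediction error when computing SSPE, AIC, and BIC, and uses the normal approximation of a runs test about the median. All function names are hypothetical, and the Shapiro-Wilk test is left out because it is not in the standard library.

```python
import math
import statistics


def local_level_filter(y, var_eta, var_eps, a1=0.0, p1=1e7):
    """Kalman filter for a local level model (assumed form, see lead-in).

    Returns the one-step-ahead prediction errors and their variances.
    a1 and p1 are the mean and variance of the initial level; the large
    default p1 is an assumed stand-in for a diffuse prior.
    """
    a, p = a1, p1
    errors, variances = [], []
    for obs in y:
        v = obs - a                    # one-step-ahead prediction error
        f = p + var_eps                # variance of that error
        k = p / f                      # Kalman gain
        a = a + k * v                  # predicted level for the next period
        p = p * (1.0 - k) + var_eta    # variance of that prediction
        errors.append(v)
        variances.append(f)
    return errors, variances


def standardized_residuals(errors, variances):
    """Prediction errors divided by their standard deviations."""
    return [v / math.sqrt(f) for v, f in zip(errors, variances)]


def sspe(errors, skip=1):
    """Sum of squares of prediction errors, skipping the start-up terms."""
    return sum(v * v for v in errors[skip:])


def information_criteria(errors, variances, n_params, skip=1):
    """AIC and BIC from the Gaussian prediction-error log-likelihood."""
    pairs = list(zip(errors, variances))[skip:]
    loglik = -0.5 * sum(math.log(2 * math.pi * f) + v * v / f for v, f in pairs)
    aic = -2 * loglik + 2 * n_params
    bic = -2 * loglik + n_params * math.log(len(pairs))
    return aic, bic


def runs_test(residuals):
    """Runs test about the median; returns (number of runs, two-sided p)."""
    med = statistics.median(residuals)
    signs = [r > med for r in residuals if r != med]  # drop ties with median
    n1 = sum(signs)
    n2 = len(signs) - n1
    n = n1 + n2
    if n1 == 0 or n2 == 0 or n < 3:
        raise ValueError("need residuals on both sides of the median")
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n * n * (n - 1))
    z = (runs - mean) / math.sqrt(var)
    return runs, math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    toy = [10.0, 12.0, 11.0, 13.0, 12.5, 14.0, 13.0, 15.0]  # made-up series
    v, f = local_level_filter(toy, var_eta=1.0, var_eps=1.0)
    e = standardized_residuals(v, f)[1:]
    print(sspe(v), information_criteria(v, f, n_params=2), runs_test(e))
```

In practice the variances would first be estimated by maximizing the same log-likelihood, and the three criteria would then be compared across the LLRW, FT, and LT fits of each series.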
Figure 1 shows plots of the female, male, and total death counts for the U.S. and Canada. The U.S. and Canadian data show a similar linear trend for all three series. In both countries, death counts for males are larger than those for females up to the year 2000; since then, female and male death counts have been close to each other. Since both countries’ data show a clear linear trend, it is sensible to fit the LT and FT models. The LT model assumes that the slope of the linear trend is stochastic, and the FT model assumes that it is deterministic. Outcomes of the LLRW model are also provided for comparison. Table 1 shows
Figures 4 and 7 show plots of the female, male, and total death counts for the U.K. and France, and for Italy and Spain, respectively. Deaths in the European countries do not show the clear linear trend seen in the North American countries. Table 5 shows
Table 9 shows
Figure 11 shows plots of the female, male, and total death counts for Taiwan and Australia. Deaths in these two countries show a clear linear trend similar to that of the two North American countries mentioned above. Table 13 shows
One advantage of fitting a structural time series model with the Kalman filter is that the unobserved components of the time series can be estimated. Both the LLRW and the FT model have an unobserved trend component,
This paper analyzes female, male, and total death counts of 8 countries in several regions of the world using three structural time series models with the Kalman filter. The three structural models are a local level with a random walk (LLRW) model, a fixed local linear trend (FT) model, and a local linear trend (LT) model. The LLRW model implies that the level of the data moves stochastically as a random walk, so it is suited to data without a clear linear pattern. The FT model implies that the level of the data moves stochastically with a deterministic slope, so it is suited to data with a clear linear pattern. The LT model implies that the level of the data moves stochastically with a stochastic slope. That is, both level and slope move stochastically, which makes this the most flexible of the three models and suited to data both with and without a clear linear pattern. All three types of death counts in the two North American countries and the two Asia-Pacific countries show similar linear trends. The best-fitted stochastic models for both the North American and the Asia-Pacific countries are either the FT or the LT model. Table 17 shows the signal-to-noise ratios for trend (VarN/VarE) and for slope (VarK/VarE) of the best-fitted stochastic models for the two North American countries. For the U.S., female deaths show a higher signal-to-noise ratio for trend than male deaths. For Canada, however, female deaths show a lower signal-to-noise ratio for trend than male deaths. The signal-to-noise ratios for slope are small for both countries. Note that NA in the table marks the FT model, which has no VarK. Table 18 shows the signal-to-noise ratios for the two Asia-Pacific countries. Neither country has high signal-to-noise ratios for trend. In both countries, male deaths have higher signal-to-noise ratios for trend than female deaths. Taiwan has low signal-to-noise ratios for slope as well.
The European countries do not show any clear linear trend. The best-fitted stochastic models for the European countries are either LLRW or LT. Tables 19 and 20 show that the European countries have low signal-to-noise ratios for both trend and slope.
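The signal-to-noise ratios discussed above are simple quotients of the estimated variances. As a small worked check, the sketch below recomputes a few of the reported ratios from the variance estimates given for the U.S. and Canada. The function name and the use of `None` for the NA entries of the FT model are choices made for this illustration only.

```python
def signal_to_noise(var_component, var_irregular):
    """Ratio of a component variance to the irregular variance (VarE).

    Returns None when the component variance is not available (NA),
    as for the slope variance of the FT model.
    """
    if var_component is None:
        return None
    return var_component / var_irregular


# (VarN, VarK, VarE) as reported for the best-fitted models
estimates = {
    "U.S. Female by FT": (1.922503e08, None, 3.115608e07),
    "U.S. Male by LT": (5.625876e07, 1.459582e07, 7.079134e07),
    "Canada Male by FT": (1.437916e06, None, 4.166670e04),
}

for label, (var_n, var_k, var_e) in estimates.items():
    trend = signal_to_noise(var_n, var_e)
    slope = signal_to_noise(var_k, var_e)
    slope_text = "NA" if slope is None else f"{slope:.3f}"
    print(f"{label}: VarN/VarE = {trend:.3f}, VarK/VarE = {slope_text}")
# U.S. Female by FT: VarN/VarE = 6.171, VarK/VarE = NA
# U.S. Male by LT: VarN/VarE = 0.795, VarK/VarE = 0.206
# Canada Male by FT: VarN/VarE = 34.510, VarK/VarE = NA
```

A ratio well above 1 means the level moves much more than the irregular term, as for male deaths in Canada, while a ratio near 0 means the corresponding component is almost deterministic.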
Tables 4, 8, 12 and 16 show the estimated variances of the irregular components, VarN, VarK and VarE. These variances show how much each component in the structural model moves stochastically up and down. For example, the value of VarN shows how much the level of the trend,
In Tables 18 through 20, the signal-to-noise ratio for trend (VarN/VarE) of the LT model is zero for the female model of Taiwan, the male and total models of the U.K., the female and total models of Italy, and all models of Spain. For these cases, LT model (
This adjusted model implies that the level of the trend at time

Table 1

Death counts | U.S. LLRW | U.S. FT | U.S. LT | Canada LLRW | Canada FT | Canada LT |
---|---|---|---|---|---|---|
Female | 0.936 | 0.666 | 0.717 | 0.724 | 0.610 | 0.723 |
Male | 0.400 | 0.585 | 0.659 | 0.578 | 0.406 | 0.198 |
Total | 0.930 | 0.956 | 0.993 | 0.867 | 0.916 | 0.603 |

Table 2

Death counts | U.S. LLRW | U.S. FT | U.S. LT | Canada LLRW | Canada FT | Canada LT |
---|---|---|---|---|---|---|
Female | 0.042 | 0.549 | 0.631 | 0.809 | 0.336 | 0.543 |
Male | 0.281 | 0.281 | 0.998 | 0.471 | 0.471 | 0.395 |
Total | 0.281 | 0.904 | 0.998 | 0.809 | 0.809 | 0.543 |

Table 3. Best models for death counts of U.S. and Canada

Death counts | U.S. AIC | U.S. BIC | U.S. SSPE | Canada AIC | Canada BIC | Canada SSPE |
---|---|---|---|---|---|---|
Female | FT | FT | FT | LT | LT | LT |
Male | LT | LT | LT | FT | FT | LT |
Total | FT | FT | LT | LT | FT | LT |
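
The selection rule stated earlier (take the model recommended by most of the three criteria) can be applied mechanically to the table above. The sketch below does this in Python. The function name is hypothetical, and returning `None` when all three criteria disagree is an assumption, since the text does not say how such a case would be resolved.

```python
from collections import Counter


def best_model(recommendations):
    """Return the model recommended by most criteria, or None on a tie."""
    counts = Counter(recommendations).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # assumed handling: no clear majority
    return counts[0][0]


# Recommendations by (AIC, BIC, SSPE), copied from the table above
table3 = {
    ("U.S.", "Female"): ("FT", "FT", "FT"),
    ("U.S.", "Male"): ("LT", "LT", "LT"),
    ("U.S.", "Total"): ("FT", "FT", "LT"),
    ("Canada", "Female"): ("LT", "LT", "LT"),
    ("Canada", "Male"): ("FT", "FT", "LT"),
    ("Canada", "Total"): ("LT", "FT", "LT"),
}

for (country, series), recs in table3.items():
    print(country, series, best_model(recs))
# U.S. Female FT
# U.S. Male LT
# U.S. Total FT
# Canada Female LT
# Canada Male FT
# Canada Total LT
```

The selected models agree with the model labels used in the table of parameter estimates for the U.S. and Canada.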

Table 4. Estimates of parameters for death counts of U.S. and Canada

U.S. series and model | U.S. VarN | U.S. VarK | U.S. VarE | Canada series and model | Canada VarN | Canada VarK | Canada VarE |
---|---|---|---|---|---|---|---|
Female by FT | 1.922503E+08 | NA | 3.115608E+07 | Female by LT | 5.618963E+05 | 1.730798E+04 | 3.000166E+05 |
Male by LT | 5.625876E+07 | 1.459582E+07 | 7.079134E+07 | Male by FT | 1.437916E+06 | NA | 4.166670E+04 |
Total by FT | 7.584381E+08 | NA | 1.080131E+08 | Total by LT | 3.156514E+06 | 3.567977E+04 | 7.245420E+05 |

Table 5

Death counts | U.K. LLRW | U.K. FT | U.K. LT | France LLRW | France FT | France LT |
---|---|---|---|---|---|---|
Female | 0.999 | 0.995 | 0.869 | 0.819 | 0.553 | 0.489 |
Male | 0.829 | 0.902 | 0.345 | 0.377 | 0.251 | 0.306 |
Total | 0.996 | 0.969 | 0.807 | 0.802 | 0.448 | 0.314 |

Table 6

Death counts | U.K. LLRW | U.K. FT | U.K. LT | France LLRW | France FT | France LT |
---|---|---|---|---|---|---|
Female | 0.471 | 0.471 | 0.902 | 0.719 | 0.719 | 0.631 |
Male | 0.092 | 0.809 | 0.902 | 0.073 | 0.073 | 0.054 |
Total | 0.809 | 0.809 | 0.543 | 0.401 | 0.188 | 0.631 |

Table 7. Best models for death counts of U.K. and France

Death counts | U.K. AIC | U.K. BIC | U.K. SSPE | France AIC | France BIC | France SSPE |
---|---|---|---|---|---|---|
Female | LT | LT | LLRW | LLRW | LLRW | LLRW |
Male | LT | LT | LLRW | LLRW | LLRW | LLRW |
Total | LT | LT | LLRW | LLRW | LLRW | LLRW |

Table 8. Estimates of parameters for death counts of U.K. and France

U.K. series and model | U.K. VarN | U.K. VarK | U.K. VarE | France series and model | France VarN | France VarK | France VarE |
---|---|---|---|---|---|---|---|
Female by LT | 6.732207E+06 | 2.012691E+05 | 4.868096E+07 | Female by LLRW | 1.462518E+07 | NA | 4.612672E+07 |
Male by LT | 0.000000E+00 | 5.816742E+05 | 3.787161E+07 | Male by LLRW | 9.614802E+06 | NA | 2.995601E+07 |
Total by LT | 0.000000E+00 | 2.140323E+06 | 1.745933E+08 | Total by LLRW | 4.473024E+07 | NA | 1.475258E+08 |

Table 9

Death counts | Italy LLRW | Italy FT | Italy LT | Spain LLRW | Spain FT | Spain LT |
---|---|---|---|---|---|---|
Female | 0.689 | 0.303 | 0.168 | 0.019 | 0.124 | 0.503 |
Male | 0.453 | 0.089 | 0.129 | 0.000* | 0.021 | 0.119 |
Total | 0.519 | 0.215 | 0.098 | 0.002* | 0.066 | 0.278 |
*Significant at the 0.01 level

Table 10

Death counts | Italy LLRW | Italy FT | Italy LT | Spain LLRW | Spain FT | Spain LT |
---|---|---|---|---|---|---|
Female | 0.402 | 0.188 | 0.336 | 0.812 | 0.476 | 0.719 |
Male | 0.719 | 0.719 | 0.631 | 0.476 | 0.476 | 0.719 |
Total | 0.402 | 0.188 | 0.336 | 0.476 | 0.235 | 0.719 |
*Significant at the 0.01 level

Table 11. Best models for death counts of Italy and Spain

Death counts | Italy AIC | Italy BIC | Italy SSPE | Spain AIC | Spain BIC | Spain SSPE |
---|---|---|---|---|---|---|
Female | LT | LT | LLRW | LT | LT | LLRW |
Male | LLRW | LLRW | LLRW | LT | LT | LLRW |
Total | LT | LLRW | LLRW | LT | LT | LLRW |

Table 12. Estimates of parameters for death counts of Italy and Spain

Italy series and model | Italy VarN | Italy VarK | Italy VarE | Spain series and model | Spain VarN | Spain VarK | Spain VarE |
---|---|---|---|---|---|---|---|
Female by LT | 0.000000E+00 | 7.834581E+05 | 4.880840E+07 | Female by LT | 0.000000E+00 | 2.607934E+05 | 1.998147E+07 |
Male by LLRW | 3.068192E+07 | NA | 2.589466E+07 | Male by LT | 0.000000E+00 | 5.494487E+05 | 2.062101E+07 |
Total by LLRW | 1.113686E+08 | NA | 1.315861E+08 | Total by LT | 0.000000E+00 | 1.511941E+06 | 7.905366E+07 |

Table 13

Death counts | Taiwan LLRW | Taiwan FT | Taiwan LT | Australia LLRW | Australia FT | Australia LT |
---|---|---|---|---|---|---|
Female | 0.623 | 0.570 | 0.939 | 0.243 | 0.243 | 0.358 |
Male | 0.623 | 0.862 | 0.916 | 0.265 | 0.187 | 0.483 |
Total | 0.450 | 0.655 | 0.938 | 0.167 | 0.189 | 0.198 |
*Significant at the 0.01 level

Table 14

Death counts | Taiwan LLRW | Taiwan FT | Taiwan LT | Australia LLRW | Australia FT | Australia LT |
---|---|---|---|---|---|---|
Female | 0.361 | 0.127 | 0.278 | 0.018 | 0.018 | 0.402 |
Male | 0.127 | 0.761 | 0.442 | 0.342 | 0.998 | 0.719 |
Total | 0.761 | 0.361 | 0.641 | 0.154 | 0.342 | 0.402 |
*Significant at the 0.01 level

Table 15. Best models for death counts of Taiwan and Australia

Death counts | Taiwan AIC | Taiwan BIC | Taiwan SSPE | Australia AIC | Australia BIC | Australia SSPE |
---|---|---|---|---|---|---|
Female | LT | LT | FT | FT | FT | LT |
Male | LT | FT | FT | FT | FT | LLRW |
Total | LT | LT | LT | FT | FT | LLRW |

Table 16. Estimates of parameters for death counts of Taiwan and Australia

Taiwan series and model | Taiwan VarN | Taiwan VarK | Taiwan VarE | Australia series and model | Australia VarN | Australia VarK | Australia VarE |
---|---|---|---|---|---|---|---|
Female by LT | 0.000000E+00 | 2.669466E+04 | 3.679238E+05 | Female by FT | 7.231256E+05 | NA | 7.907837E+05 |
Male by FT | 1.282648E+06 | NA | 3.192780E+05 | Male by FT | 9.721123E+05 | NA | 7.088836E+05 |
Total by LT | 1.765875E+05 | 1.316371E+05 | 2.106885E+06 | Total by FT | 3.093034E+06 | NA | 2.932334E+06 |

Table 17. Signal-to-noise ratios for U.S. and Canada

U.S. series and model | U.S. VarN/VarE | U.S. VarK/VarE | Canada series and model | Canada VarN/VarE | Canada VarK/VarE |
---|---|---|---|---|---|
Female by FT | 6.171 | NA | Female by LT | 1.873 | 0.058 |
Male by LT | 0.795 | 0.206 | Male by FT | 34.510 | NA |
Total by FT | 7.022 | NA | Total by LT | 4.357 | 0.049 |

Table 18. Signal-to-noise ratios for Taiwan and Australia

Taiwan series and model | Taiwan VarN/VarE | Taiwan VarK/VarE | Australia series and model | Australia VarN/VarE | Australia VarK/VarE |
---|---|---|---|---|---|
Female by LT | 0.00 | 0.07 | Female by FT | 0.91 | NA |
Male by FT | 4.02 | 0.04 | Male by FT | 1.37 | NA |
Total by LT | 0.08 | 0.06 | Total by FT | 1.05 | NA |

Table 19. Signal-to-noise ratios for U.K. and France

U.K. series and model | U.K. VarN/VarE | U.K. VarK/VarE | France series and model | France VarN/VarE | France VarK/VarE |
---|---|---|---|---|---|
Female by LT | 0.14 | 0.00 | Female by LLRW | 0.32 | NA |
Male by LT | 0.00 | 0.02 | Male by LLRW | 0.32 | NA |
Total by LT | 0.00 | 0.01 | Total by LLRW | 0.30 | NA |

Table 20. Signal-to-noise ratios for Italy and Spain

Italy series and model | Italy VarN/VarE | Italy VarK/VarE | Spain series and model | Spain VarN/VarE | Spain VarK/VarE |
---|---|---|---|---|---|
Female by LT | 0.00 | 0.02 | Female by LT | 0.00 | 0.01 |
Male by LLRW | 1.18 | NA | Male by LT | 0.00 | 0.03 |
Total by LLRW | 0.85 | NA | Total by LT | 0.00 | 0.02 |