Household projection means forecasting future household sizes and types from future population projections, taking recent trends in household change into account. Households are the smallest common living unit in society and fulfill the function of population reproduction. Economically, they are the basic unit of consumption and the foundational unit of supply and demand in the markets for housing and durable goods. Information on both the quantitative and qualitative changes in households is therefore essential for formulating government social and economic policies.
The pace of change in household structures has been accelerating over time and across countries (Hu and Peng, 2015; Jacobsen
However, because of internal and external challenges in the survey environment and the growing difficulty and cost of traditional censuses, several countries, including Denmark, Sweden, Australia, and Singapore, have moved to a register-based census. Unlike a traditional census, in which enumerators visit households in person, a register-based census produces statistics from administrative data sources such as resident registers, building registers, and immigration records (Park and Lee, 2017; Jun, 2020). Korea introduced its first register-based census in 2015, moving from a survey-based census conducted every five years to an annual census based on administrative data.
This change in census methodology not only shortens the data cycle but also provides more observations within the same time span, so that population trends and fluctuations can be captured at shorter intervals. Although the register-based census captures rapid shifts in Korea’s population and household structure, these rapid changes can produce unrealistic projections when the existing two-point modified exponential model is used to project headship rates. In this paper, we introduce a new, robust household projection model that makes full use of register-based census data and remains stable in long-term projections even when trends change rapidly or only temporarily.
Household projections are generally categorized into two types: static methods, which rely on population and household distributions at a specific point in time, and dynamic methods, which account for changes in individuals and households over time. Depending on the data source, these methods can be macro, using census or registration data, or micro, employing samples or individual data (Wilson, 2013; van Imhoff
Because the two-point exponential model relies on two censuses taken far apart, it is not well suited to register-based censuses, which offer many observations at short intervals. We therefore aimed to develop a household headship rate model that accounts for household change and uses all of the accumulated time-series information, which is an advantage of a register-based census that accumulates observations at short, regular intervals. In this paper, we propose the N-point modified exponential method (MEM), which is designed to reflect the dynamics of the household headship rate at multiple time points using register data, and we consider three variants: the weighted N-point MEM, the regression-based N-point MEM, and the rolling weighted N+point MEM.
Using these methods, we projected future household headship rates by number of household members and by household type based on Korea’s 2016–2020 resident registration-based census statistics. First, we conducted long-term projections up to 2051 and compared them with those from the existing two-point exponential model. Over horizons longer than 20 years, the traditional two-point model tends to produce diverging fluctuations and unstable projections, whereas the MEM methods reduce this divergence and instability. In particular, the rolling weighted N+point MEM yields stable projections.
Second, we assessed the accuracy of the short-term projections against the 2021 and 2022 register-based census data released afterwards. The MEM methods generally outperform the two-point exponential model. When the rate of change in the headship rate consistently increases or decreases, the regression-based N-point MEM performs better than both the weighted N-point MEM and the rolling weighted N+point MEM.
The paper is organized as follows. After the introduction in Section 1, Section 2 describes the two-point exponential model and discusses the limitations of its earlier extension to three points. Section 3 introduces the multipoint exponential models proposed in this paper: the weighted N-point MEM, the regression-based N-point MEM, and the rolling weighted N+point MEM. Section 4 summarizes the time series of headship rates in Korea from register-based census data. Section 5 presents the household projection results for Korea, comparing the MEM methods with the traditional two-point exponential model. Section 6 concludes with final remarks.
Many countries, including England, Scotland, France, Canada, Japan, and Korea, produce household projections based on headship rates. There are two main approaches to projecting these rates: one extrapolates past changes linearly or exponentially into the future using a mathematical formula; the other models the relationship between headship rates and variables such as socio-economic factors and government policies, and builds projection scenarios from it (Leiwen and O’Neill, 2004).
Many national statistical offices favor the mathematical approach because of its simplicity, intuitiveness, and convenience, and the two-point exponential model is its most representative form. This model uses the age-specific headship rates observed in the two most recent censuses to project future age-specific rates. England (Nash, 2021), Scotland (Taylor, 2020), Japan (National Institute of Population and Social Security Research, Japan, 2018), and Korea (Statistics Korea, 2019) use this model, while France (INSEE France, 2024) and Canada (Statistics Canada, 2020) base their projections on scenarios. When censuses are far apart, as is usually the case with traditional censuses, it is reasonable to base projections on the change observed between the two most recent censuses.
The two-point (modified) exponential model is based on the headship rates
where,
The two-point exponential model projects future headship rates from the trend observed between the two most recent censuses, assuming that the increase or decrease in these rates will continue. However, if the change between these two points is too large, the projected headship rate for the distant future can diverge to an unrealistic or illogical value. The key parameter in this model,
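The model’s equation is not legible in this copy. As a minimal sketch, assuming one common bounded formulation of a two-point (modified) exponential extrapolation, in which a falling rate decays geometrically toward 0 and a rising rate approaches 1 because its complement 1 - h decays geometrically, the projection could be written as follows. The function name `two_point_mem` and the bounding rule are illustrative assumptions, not the paper’s equation.

```python
def two_point_mem(h1: float, h2: float, t1: float, t2: float, t: float) -> float:
    """Project a headship rate at time t from rates h1 (observed at t1) and h2 (at t2).

    ASSUMED formulation (illustrative, not the paper's exact equation):
    - falling rate: h(t) = h2 * (h2 / h1) ** k
    - rising rate:  h(t) = 1 - (1 - h2) * ((1 - h2) / (1 - h1)) ** k
    where k = (t - t2) / (t2 - t1) counts observation intervals after t2.
    """
    if not (0.0 < h1 < 1.0 and 0.0 < h2 < 1.0):
        raise ValueError("headship rates must lie strictly between 0 and 1")
    if t2 <= t1:
        raise ValueError("t2 must be later than t1")
    k = (t - t2) / (t2 - t1)
    if h2 <= h1:
        # Falling (or flat) rate: repeat the observed ratio h2/h1 each interval.
        return h2 * (h2 / h1) ** k
    # Rising rate: repeat the observed ratio of complements each interval.
    return 1.0 - (1.0 - h2) * ((1.0 - h2) / (1.0 - h1)) ** k


if __name__ == "__main__":
    # Illustrative inputs only: a rate that rose from 0.20 to 0.30 over four years.
    for year in (2020, 2030, 2040, 2051):
        print(year, round(two_point_mem(0.20, 0.30, 2016, 2020, year), 4))
```

Under this assumed form, a large change between the two points (here 0.20 to 0.30) pushes the projected rate to about 0.75 by 2051, which illustrates how strongly long-horizon results depend on the two observed points.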
To address this limitation, an extension to three points has been explored (Statistics Korea, 2019). In the three-point exponential model, which uses headship rates
However, this method can yield unstable estimates if the headship rate does not consistently increase or decrease between
This means that a direct extension of the two-point exponential model to three points has limited applicability: it works only when the direction of change in the headship rate remains the same throughout the entire period. Given the rapid and varied changes currently observed in headship rates, this model is not a suitable alternative for multi-point data.
Register-based census data, which are observed more frequently, can capture the dynamics of household composition more accurately for household projection. Exploiting them requires a model that incorporates headship rates at multiple time points. However, as discussed above, the existing three-point extension suffers from inconsistency and instability when trends change. Moreover, extending this form to four or more points, so that N points are accumulated, requires changing the model structure, which complicates its application. Considering that the two-point exponential model is widely used by many statistical agencies for its modest data requirements and simplicity (Alias
To address this issue, we propose a new household projection model suited to short-cycle register-based censuses. It extends the two-point exponential model with three key considerations: (1) the model structure remains the same regardless of the length of the time series; (2) it avoids the unstable and illogical projections that trend changes can cause; and (3) it prevents divergence in long-term projections.
Focusing on these aspects, we propose three modified N-point exponential models: the weighted N-point MEM, the regression-based N-point MEM, and the rolling weighted N+point MEM.
Suppose the headship rate
where,
and
The most recent headship rate
By incorporating past census information through the weighted average
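The weighting scheme itself is not legible in this copy. As a minimal sketch, assume that the weighted N-point MEM keeps the most recent rate as the latest anchor and collapses all earlier rates into one weighted anchor, placed at the correspondingly weighted mean year, before applying a two-point formula. The recency weights 1, 2, ..., N-1, the bounded two-point form (falling rates decay toward 0, rising rates approach 1), and the function names are hypothetical choices for illustration.

```python
from typing import Optional, Sequence


def two_point_mem(h1: float, h2: float, t1: float, t2: float, t: float) -> float:
    """ASSUMED two-point form: falling rates decay toward 0, rising rates approach 1."""
    k = (t - t2) / (t2 - t1)
    if h2 <= h1:
        return h2 * (h2 / h1) ** k
    return 1.0 - (1.0 - h2) * ((1.0 - h2) / (1.0 - h1)) ** k


def weighted_n_point_mem(
    rates: Sequence[float],
    years: Sequence[float],
    t: float,
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Sketch of a weighted N-point MEM (hypothetical weighting scheme).

    The most recent rate rates[-1] stays the latest anchor; all earlier rates
    are collapsed into one weighted anchor h_bar located at year t_bar.
    """
    if len(rates) != len(years) or len(rates) < 2:
        raise ValueError("need at least two (rate, year) pairs of equal length")
    past_rates, past_years = rates[:-1], years[:-1]
    if weights is None:
        # Hypothetical default: linearly increasing weights favour recent years.
        weights = range(1, len(past_rates) + 1)
    weights = list(weights)
    if len(weights) != len(past_rates):
        raise ValueError("need one weight per earlier observation")
    w_sum = sum(weights)
    h_bar = sum(w * h for w, h in zip(weights, past_rates)) / w_sum
    t_bar = sum(w * y for w, y in zip(weights, past_years)) / w_sum
    return two_point_mem(h_bar, rates[-1], t_bar, years[-1], t)


if __name__ == "__main__":
    # Illustrative input: the single-person series 2016-2020 expressed as fractions.
    rates = [0.2766, 0.2833, 0.2904, 0.2990, 0.3124]
    years = [2016, 2017, 2018, 2019, 2020]
    print(round(weighted_n_point_mem(rates, years, 2030), 4))
```

With only two observations the weighted anchor is the earlier observation itself, so the sketch reduces to a plain two-point projection; that property is a convenient sanity check for any weighting choice.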
Suppose the headship rate
where,
Here,
The most recent headship rate
By reflecting past census information through the fitted value of the regression model
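The regression specification is not legible in this copy. A hedged sketch, assuming an ordinary least-squares line of the headship rate on calendar year fitted to all N observations, uses the fitted value at the earliest year as the earlier anchor and the most recent observed rate as the latest anchor. The linear specification, the choice of anchor year, the bounded two-point form, and the function names are illustrative assumptions.

```python
from typing import Sequence, Tuple


def two_point_mem(h1: float, h2: float, t1: float, t2: float, t: float) -> float:
    """ASSUMED two-point form: falling rates decay toward 0, rising rates approach 1."""
    k = (t - t2) / (t2 - t1)
    if h2 <= h1:
        return h2 * (h2 / h1) ** k
    return 1.0 - (1.0 - h2) * ((1.0 - h2) / (1.0 - h1)) ** k


def ols_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Closed-form simple linear regression; returns (x_mean, y_mean, slope)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return x_mean, y_mean, sxy / sxx


def regression_n_point_mem(rates: Sequence[float], years: Sequence[float], t: float) -> float:
    """Sketch of a regression-based N-point MEM (hypothetical anchor choice)."""
    if len(rates) != len(years) or len(rates) < 2:
        raise ValueError("need at least two (rate, year) pairs of equal length")
    x_mean, y_mean, slope = ols_line(years, rates)
    t_anchor = years[0]
    # Fitted value at the earliest year, centred on the means to limit rounding error.
    h_anchor = y_mean + slope * (t_anchor - x_mean)
    if not 0.0 < h_anchor < 1.0:
        raise ValueError("fitted anchor falls outside (0, 1)")
    return two_point_mem(h_anchor, rates[-1], t_anchor, years[-1], t)


if __name__ == "__main__":
    rates = [0.2766, 0.2833, 0.2904, 0.2990, 0.3124]  # illustrative input
    years = [2016, 2017, 2018, 2019, 2020]
    print(round(regression_n_point_mem(rates, years, 2030), 4))
```

For an exactly linear series the fitted anchor equals the first observation, so the sketch then coincides with a two-point projection from the first and last years; for noisier series the anchor follows the fitted trend rather than any single year.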
Suppose the headship rate
where,
For the initial projection point, the weighted N-point MEM is applied. Beyond this point, the projection uses both the observed headship rates and the previously projected ones; this information is continuously accumulated and updated in a rolling manner for subsequent projections. In other words, the projection at the initial point,
To project
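A minimal sketch of the rolling idea, assuming that each step projects only one year ahead with a weighted N-point rule, appends that projection to the history, and repeats, so that the history grows from N to N+1, N+2, ... points. The one-year step, the linear recency weights, the bounded two-point form (falling rates decay toward 0, rising rates approach 1), and the function names are hypothetical choices for illustration, not the paper’s exact specification.

```python
from typing import List, Sequence


def two_point_mem(h1: float, h2: float, t1: float, t2: float, t: float) -> float:
    """ASSUMED two-point form: falling rates decay toward 0, rising rates approach 1."""
    k = (t - t2) / (t2 - t1)
    if h2 <= h1:
        return h2 * (h2 / h1) ** k
    return 1.0 - (1.0 - h2) * ((1.0 - h2) / (1.0 - h1)) ** k


def weighted_n_point_mem(rates: Sequence[float], years: Sequence[float], t: float) -> float:
    """Weighted anchor of all but the latest point; hypothetical weights 1..N-1."""
    past_rates, past_years = rates[:-1], years[:-1]
    weights = range(1, len(past_rates) + 1)
    w_sum = sum(weights)
    h_bar = sum(w * h for w, h in zip(weights, past_rates)) / w_sum
    t_bar = sum(w * y for w, y in zip(weights, past_years)) / w_sum
    return two_point_mem(h_bar, rates[-1], t_bar, years[-1], t)


def rolling_weighted_mem(rates: Sequence[float], years: Sequence[int], horizon: int) -> List[float]:
    """Project `horizon` annual steps, feeding each projection back into the history."""
    if len(rates) != len(years) or len(rates) < 2:
        raise ValueError("need at least two (rate, year) pairs of equal length")
    hist_rates = list(rates)
    hist_years = list(years)
    projections: List[float] = []
    for _ in range(horizon):
        next_year = hist_years[-1] + 1
        h_next = weighted_n_point_mem(hist_rates, hist_years, next_year)
        hist_rates.append(h_next)  # the projected value joins the history (N -> N+1)
        hist_years.append(next_year)
        projections.append(h_next)
    return projections


if __name__ == "__main__":
    rates = [0.2766, 0.2833, 0.2904, 0.2990, 0.3124]  # illustrative input
    years = [2016, 2017, 2018, 2019, 2020]
    path = rolling_weighted_mem(rates, years, horizon=31)  # 2021-2051
    print(round(path[0], 4), round(path[-1], 4))
```

Because each projected value is fed back into the history, later steps extrapolate from a trend that already includes earlier projections rather than from the original observations alone.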
Headship rates from the annual register-based census data from 2016 to 2020 are used for projection. The projection of household headship rates is segmented by age, while also considering the size and type of the household (Statistics Korea, 2019; Bell and Cooper, 1990; Wilson, 2013; Kajiwara
Single- and two-person households continue to increase, while households with three or more members are decreasing. Single-person households were the most common type in 2020, whereas ‘married-couple+children’ households were the most common through 2019.
Future household projections use age-specific household headship rates, so it is essential to analyze the age distribution of headship rates by number of household members and by household type. Headship rates by number of household members and by household type, in five-year age groups, are shown in Figure 1 and Figure 2, respectively.
As shown in Figure 1, the distribution of household headship rates by age differs according to the number of household members. Notably, in single-person households, younger individuals are more likely to be household heads, while among middle-aged and older groups, the rate of heading households with two or more persons is higher. This pattern likely arises from younger people initially living independently in single-person households due to factors like education and employment, and then transitioning to larger households as they age. Although the age distribution for such households generally remains consistent over time, there is a noticeable decline in households with five or more members across almost all age groups.
Recently, as the single-person headship rate has risen, the headship rate of two- and three-person households has decreased, particularly among people in their 20s. The declining headship rate of people in their 30s in three- and four-person households appears to be linked to the falling birth rate. Four-person households and those with five or more members show similar age-specific trends. A notable time-series anomaly in the headship rate of older household heads is observed in 2016.
Figure 2 presents the household headship rates by age, categorized by household type. Although the distribution of headship rates by age varies across different household types, certain types demonstrate similar patterns. Figures 2(e) and 2(f), representing the distributions for single fathers with children and single mothers with children respectively, exhibit similar age distribution patterns. Moreover, a time-series anomaly in the headship rate among the elderly is observed across nearly all household types in 2016.
To project future households by single year of age, age-specific headship rates at one-year intervals are required. For this purpose, Beers’ formula (Beers, 1945) is used to interpolate headship rates from five-year age groups to single years of age (Park
Because unrealistic long-term forecasts are a key challenge when projecting households from short-cycle register-based censuses, we first examine the long-term projection results. Figures 3 and 4 show the projected household headship rates from 2021 to 2051 for single-person and married-couple households, respectively. Because the projection patterns are similar across all household sizes and types, we present only these two representative cases to save space.
Figures 3(a) and 4(a) present the projections from the two-point exponential model and show that long-term forecasts beyond 20 years depend heavily on the variation between the two observed points. Among the new multi-point modified exponential models (MEM), the regression-based N-point MEM does not reduce these fluctuations as much as anticipated. However, the weighted N-point MEM reduces them considerably compared with the two-point model, and the rolling weighted N+point MEM provides the most stable long-term projections. The effectiveness of the rolling weighted N+point MEM is especially evident in Figure 4.
To evaluate the new N-point MEMs, we compare their short-term predictions with the observed register-based census data for 2021 and 2022, using root mean squared error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and symmetric mean absolute percentage error (SMAPE). These are standard measures of a model’s goodness of fit and predictive accuracy (Makridakis, 1993).
RMSE and MAE are based on the averages of the squared differences (
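The four accuracy measures can be sketched as below. RMSE, MAE, and MAPE follow their standard definitions; SMAPE has several variants in the literature, and the one shown (absolute error divided by the mean of |A| and |F|, times 100) is an assumption, since the paper’s exact variant is not legible in this copy.

```python
import math
from typing import Sequence


def _check(actual: Sequence[float], forecast: Sequence[float]) -> int:
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("actual and forecast must be non-empty and of equal length")
    return len(actual)


def rmse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Root mean squared error."""
    n = _check(actual, forecast)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)


def mae(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute error."""
    n = _check(actual, forecast)
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / n


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error (%); undefined if any actual value is 0."""
    n = _check(actual, forecast)
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / n


def smape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """One common SMAPE variant (%): |F - A| over the mean of |A| and |F| (assumed)."""
    n = _check(actual, forecast)
    return 100.0 * sum(
        2.0 * abs(f - a) / (abs(a) + abs(f)) for a, f in zip(actual, forecast)
    ) / n
```

Because MAPE divides by the actual value, it grows quickly for household types whose headship rates are very small, which is why percentage errors for such types need careful interpretation.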
Table 3 shows the accuracy of short-term headship rate forecasts by number of household members. Shaded cells mark the minimum error in each case. The N-point modified exponential models (MEM) generally outperform the two-point exponential model. Because the dataset comprises only five annual observations (2016–2020), the differences in RMSE and MAE are not substantial. Nevertheless, the newly proposed N-point MEM methods evidently provide more precise forecasts than the previous two-point exponential model.
The regression-based N-point MEM performs best for single-person households, which are increasing at an accelerating rate. It is likewise generally the most accurate method for households with three or more members, which are declining, mostly at an accelerating rate. In contrast, for two-person households, whose growth has recently slowed, the weighted methods perform best.
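The trends cited here can be checked directly against the table of headship rates by number of household members. The short sketch below computes the year-on-year change (in percentage points) for each household size; the dictionary copies the published 2016–2020 values, and the helper name `annual_changes` is illustrative.

```python
# Values (%) by number of household members, 2016-2020, copied from the table
# "Headship rate by number of household members in Korea".
SHARES = {
    "1": [27.66, 28.33, 29.04, 29.90, 31.24],
    "2": [26.15, 26.55, 27.08, 27.65, 27.96],
    "3": [21.44, 21.31, 21.11, 20.84, 20.29],
    "4": [18.48, 17.88, 17.22, 16.48, 15.83],
    "5+": [6.26, 5.94, 5.56, 5.13, 4.68],
}


def annual_changes(series):
    """Year-on-year differences in percentage points, rounded to two decimals."""
    return [round(later - earlier, 2) for earlier, later in zip(series, series[1:])]


for size, series in SHARES.items():
    print(size, annual_changes(series))
```

For example, the single-person increase grows every year (0.67, 0.71, 0.86, and 1.34 points), while the two-person increase drops to 0.31 points in 2020 after three years of larger gains.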
Table 4 presents the accuracy of short-term predictions of the headship rate by household type. As in Table 3, the N-point MEMs generally outperform the two-point method. However, the MAPE and SMAPE values for the ‘married-couple+parents’ and ‘grandparents+grandchildren’ household types should be interpreted with caution, because their shares are very small (0.5%–0.7%). The weighted methods perform better when the growth rate changes, as with ‘married couples’ (whose growth shifted from its steepest to its slowest) and ‘single mothers & children’ (which show a similar shift).
Since Korea introduced its first register-based census in 2015, the census has shifted from a five-year general survey to an annual census based on administrative data, which is now published every year. This change makes it possible to reflect rapid changes in Korea’s household structure, such as the increase in single-person households and the decrease in household size driven by low fertility and population aging. However, register-based census data not only shorten the data cycle but also expose short-term fluctuations and trend changes over time.
These fluctuations in the data necessitate a reevaluation of existing household projection models and the development of new, more appropriate models. The two-point exponential model, a traditional projection model based on censuses with longer cycles, is not suitable for register-based censuses with multiple data points over short cycles. This paper proposes N-point modified exponential methods (MEM) that can capture the dynamics of household headship rates at multiple points using register-based censuses, and considers three modifications: Weighted N-point MEM, regression-based N-point MEM, and rolling weighted N+point MEM.
Using these methods, we projected future households in Korea by household size and type based on register-based census statistics from 2016 to 2020. Long-term projections up to 2051 show that the previous two-point exponential model produces large fluctuations and unstable forecasts over horizons longer than 20 years, whereas the N-point methods reduce these fluctuations. In particular, the rolling weighted N+point MEM provides the most stable long-term projections.
Furthermore, short-term projections evaluated against the updated 2021 and 2022 register-based census data show that the newly proposed N-point methods generally outperform the previous two-point model. When the rate of change in the headship rate consistently increases or decreases, the regression-based N-point MEM performs better; otherwise, the weighted N-point MEM or the rolling weighted N+point MEM is more effective.
Although the short span of register-based census data limits the performance evaluation, the results show that the N-point MEMs are better suited than the previous two-point model to household projection. The N-point MEMs incorporate multi-point register-based census information in a timely manner without requiring additional data or increasing model complexity, which enhances their practical utility. As register-based census data continue to accumulate, we expect them to enable more precise and scientifically robust national statistics, including population and household estimates.
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT)(NRF-2022R1F1A1065520 to Jeon, and NRF-2021R1F1A1059513 to Kwon). This work was supported by Hankuk University of Foreign Studies Research Fund.
Headship rate by number of household members in Korea (%)
Number of household members | 2016 | 2017 | 2018 | 2019 | 2020 |
---|---|---|---|---|---|
1 | 27.66 | 28.33 | 29.04 | 29.90 | 31.24 |
2 | 26.15 | 26.55 | 27.08 | 27.65 | 27.96 |
3 | 21.44 | 21.31 | 21.11 | 20.84 | 20.29 |
4 | 18.48 | 17.88 | 17.22 | 16.48 | 15.83 |
5+ | 6.26 | 5.94 | 5.56 | 5.13 | 4.68 |
Headship rate by household type in Korea (%)
Generations | Household type | 2016 | 2017 | 2018 | 2019 | 2020 |
---|---|---|---|---|---|---|
Single-person | Single-person | 27.66 | 28.33 | 29.04 | 29.90 | 31.24 |
1 Generation | Married-couple | 15.51 | 15.74 | 16.13 | 16.52 | 16.76 |
1 Generation | Others | 1.75 | 1.74 | 1.76 | 1.78 | 1.79 |
2 Generations | Married-couple + Children | 32.03 | 31.53 | 30.87 | 30.08 | 29.32 |
2 Generations | Single father + Children | 2.72 | 2.65 | 2.60 | 2.55 | 2.45 |
2 Generations | Single mother + Children | 7.76 | 7.59 | 7.52 | 7.42 | 7.33 |
2 Generations | Married-couple + Parents | 0.75 | 0.72 | 0.69 | 0.67 | 0.62 |
2 Generations | Grandparents + Unmarried grandchildren | 0.57 | 0.57 | 0.57 | 0.56 | 0.56 |
2 Generations | Others | 4.66 | 4.64 | 4.54 | 4.40 | 4.11 |
3 or More Generations | Married-couple + Unmarried children + Parents | 2.98 | 2.81 | 2.60 | 2.38 | 2.10 |
3 or More Generations | Others | 2.31 | 2.16 | 2.03 | 1.90 | 1.72 |
Others | Non-relatives | 1.30 | 1.51 | 1.66 | 1.84 | 1.98 |
Performance of projection models by number of household members (2021–2022)
Number of household members | Error | 2-point | Weighted N-point | Reg-based N-point | Rolling weighted N+point |
---|---|---|---|---|---|
1 | RMSE | 0.019 | 0.020 | 0.018 | 0.021 |
1 | MAE | 0.021 | 0.024 | 0.020 | 0.024 |
1 | MAPE | 4.974 | 6.082 | 4.898 | 6.224 |
1 | SMAPE | 1.267 | 1.553 | 1.247 | 1.590 |
2 | RMSE | 0.011 | 0.010 | 0.011 | 0.010 |
2 | MAE | 0.011 | 0.009 | 0.011 | 0.009 |
2 | MAPE | 6.036 | 5.324 | 6.051 | 5.283 |
2 | SMAPE | 1.569 | 1.372 | 1.572 | 1.358 |
3 | RMSE | 0.007 | 0.007 | 0.006 | 0.007 |
3 | MAE | 0.008 | 0.008 | 0.007 | 0.008 |
3 | MAPE | 8.963 | 8.296 | 8.485 | 8.447 |
3 | SMAPE | 2.281 | 2.088 | 2.158 | 2.122 |
4 | RMSE | 0.007 | 0.009 | 0.007 | 0.009 |
4 | MAE | 0.007 | 0.009 | 0.007 | 0.009 |
4 | MAPE | 16.067 | 17.834 | 15.938 | 18.183 |
4 | SMAPE | 3.985 | 4.412 | 3.926 | 4.489 |
5+ | RMSE | 0.003 | 0.004 | 0.003 | 0.004 |
5+ | MAE | 0.003 | 0.004 | 0.003 | 0.005 |
5+ | MAPE | 13.722 | 21.837 | 13.476 | 22.691 |
5+ | SMAPE | 3.485 | 5.456 | 3.433 | 5.633 |
*Shaded cells indicate the minimum value in each case.
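Performance of projection models by household type (2021–2022)
Generation | Household type | Error | 2-point | Weighted N-point | Reg-based N-point | Rolling weighted N+point |
---|---|---|---|---|---|---|
One-generation | Married-couple | RMSE | 0.008 | 0.008 | 0.008 | 0.008 |
One-generation | Married-couple | MAE | 0.007 | 0.007 | 0.007 | 0.007 |
One-generation | Married-couple | MAPE | 24.635 | 26.847 | 25.414 | 26.958 |
One-generation | Married-couple | SMAPE | 4.215 | 4.460 | 4.350 | 4.455 |
One-generation | Others | RMSE | 0.004 | 0.003 | 0.004 | 0.003 |
One-generation | Others | MAE | 0.002 | 0.002 | 0.002 | 0.002 |
One-generation | Others | MAPE | 14.676 | 12.055 | 14.162 | 12.104 |
One-generation | Others | SMAPE | 3.794 | 3.046 | 3.655 | 3.052 |
Two-generations | Married-couple & Children | RMSE | 0.009 | 0.010 | 0.008 | 0.010 |
Two-generations | Married-couple & Children | MAE | 0.008 | 0.009 | 0.008 | 0.009 |
Two-generations | Married-couple & Children | MAPE | 9.355 | 8.262 | 8.966 | 8.435 |
Two-generations | Married-couple & Children | SMAPE | 2.363 | 2.057 | 2.263 | 2.097 |
Two-generations | Single father & Children | RMSE | 0.001 | 0.001 | 0.001 | 0.001 |
Two-generations | Single father & Children | MAE | 0.001 | 0.001 | 0.001 | 0.001 |
Two-generations | Single father & Children | MAPE | 13.335 | 10.478 | 11.903 | 10.460 |
Two-generations | Single father & Children | SMAPE | 3.773 | 2.807 | 3.318 | 2.788 |
Two-generations | Single mother & Children | RMSE | 0.003 | 0.003 | 0.003 | 0.003 |
Two-generations | Single mother & Children | MAE | 0.003 | 0.003 | 0.003 | 0.003 |
Two-generations | Single mother & Children | MAPE | 6.686 | 6.173 | 6.312 | 6.420 |
Two-generations | Single mother & Children | SMAPE | 1.688 | 1.539 | 1.587 | 1.600 |
Two-generations | Married-couple & Parents | RMSE | 0.001 | 0.002 | 0.001 | 0.002 |
Two-generations | Married-couple & Parents | MAE | 0.001 | 0.001 | 0.001 | 0.001 |
Two-generations | Married-couple & Parents | MAPE | 25.936 | 30.121 | 26.286 | 30.718 |
Two-generations | Married-couple & Parents | SMAPE | 4.786 | 5.417 | 4.811 | 5.523 |
Two-generations | Grandparents & Grandchildren | RMSE | 0.002 | 0.002 | 0.002 | 0.002 |
Two-generations | Grandparents & Grandchildren | MAE | 0.001 | 0.001 | 0.001 | 0.001 |
Two-generations | Grandparents & Grandchildren | MAPE | 12.846 | 15.012 | 12.682 | 15.144 |
Two-generations | Grandparents & Grandchildren | SMAPE | 3.143 | 3.711 | 3.093 | 3.746 |
Two-generations | Others | RMSE | 0.005 | 0.005 | 0.005 | 0.005 |
Two-generations | Others | MAE | 0.005 | 0.005 | 0.005 | 0.005 |
Two-generations | Others | MAPE | 11.883 | 13.952 | 11.730 | 14.306 |
Two-generations | Others | SMAPE | 3.352 | 3.776 | 3.287 | 3.849 |
Three or more generations | Married-couple & Unmarried children & Parents | RMSE | 0.001 | 0.002 | 0.001 | 0.002 |
Three or more generations | Married-couple & Unmarried children & Parents | MAE | 0.001 | 0.002 | 0.001 | 0.003 |
Three or more generations | Married-couple & Unmarried children & Parents | MAPE | 21.241 | 28.935 | 21.117 | 29.869 |
Three or more generations | Married-couple & Unmarried children & Parents | SMAPE | 7.801 | 9.473 | 7.760 | 9.671 |
Three or more generations | Others | RMSE | 0.002 | 0.002 | 0.001 | 0.002 |
Three or more generations | Others | MAE | 0.002 | 0.002 | 0.001 | 0.002 |
Three or more generations | Others | MAPE | 55.001 | 87.283 | 58.320 | 88.878 |
Three or more generations | Others | SMAPE | 5.634 | 7.428 | 5.619 | 7.613 |
Others | Non-relatives | RMSE | 0.004 | 0.003 | 0.004 | 0.003 |
Others | Non-relatives | MAE | 0.004 | 0.003 | 0.004 | 0.003 |
Others | Non-relatives | MAPE | 13.829 | 11.600 | 14.263 | 11.830 |
Others | Non-relatives | SMAPE | 3.334 | 2.907 | 3.422 | 2.972 |
*Shaded cells indicate the minimum value in each case.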