The inclusion of covariates in the model often affects not only the estimates of meaningful variables of interest but also their statistical significance. Such a gap between statistical and subject-matter significance is a critical issue in huge-sample studies. A widely used huge-sample data set, the sample cohort data from the Korean National Health Insurance Service, shows such a gap in the inference for the effect of obesity on all-cause mortality, which requires careful consideration. In this regard, this paper proposes a sample size calibration method based on a Monte Carlo
The Korean National Health Insurance Service released a sample cohort database Lee
Two natural questions follow. Why do such contradictory results arise in huge samples? And how do we identify which model reveals the correct effect of WC or BMI? In this paper, in answer to the second question, we propose a new testing approach whose results are consistent and not sensitive to the inclusion or omission of some covariates in the model.
One may answer the first question by pointing to omitted variable bias (lurking variable bias) or to multicollinearity. However, we cannot tell whether the bias arises from omitting a significant covariate or from adding an unnecessary variable. Moreover, we cannot filter out these biases with a significance test, especially in a huge sample, because statistical inference based on
The deflated
Some suggestions for the huge sample problem have been to make the alpha level much smaller than the traditional 0.01 or 0.05 (Leamer, 1978; DeGroot and Schervish, 2002; Greene, 2003) and to replace the single value in the null hypothesis by an interval (DeGroot and Schervish, 2002; Hubbard and Armstrong, 2006). A standardized
As a solution for the
The sample size calibration test procedure we propose for subject-matter significance is now applied to five sets of real data. The first data set is the Korea sample cohort data from which we estimate the effects of WC and BMI on mortality to show that the results vary in statistical significance depending on inclusion or omission of some covariates in the model. Three sample size calibrated
For the remaining four data sets, the raw data are not available to us; only the estimation results published in the literature for the effects of BMI on all causes of mortality are known. Their sample sizes range from 0.15 to 1.46 million and the study subjects are quite different: White adults (Berrington de Gonzalez
The rest of the paper is organized as follows. In Section 2, we investigate the reason why contradictory results in a huge sample arise between the regression models that include or exclude relevant covariates and illustrate
Consider the following two regression models to investigate the
where all errors are assumed to be iid. When there are more explanatory variables other than
Denote
Lemma 1 shows the well-known fact that the OLS coefficient of
We generate
Figure 1 shows that
All of the discussions related to the deflated
The deflated
The likelihood function for the proportional hazard model and the partial likelihood function for the Cox’s proportional hazard model are the same as the Poisson regression model by letting
We propose the following Monte Carlo approach to avoid the deflated
The same Monte Carlo approach for the parameter estimate of interest and its
First of all, the Monte Carlo method for
where
We call such
This states that the expected
CF(
For example, one may take three sample sizes,
When the variable of interest is categorical, however, the sample size should be chosen based on the size of the cell with the smallest cell probability. For example, in the Korean National Health Insurance Sample, the death rate is 1.37% (i.e., 6,378 deaths among 466,345 people during 5 years), and the risk of death associated with the three BMI groups is one of our interests. In this case, we take
It is similar to Good’s
Two regression models,
Table 1 shows that Good’s
Obesity and overweight are two of the most important risk factors related to health problems and mortality (Renehan
Obesity (a BMI of 30 or more) and overweight (a BMI of 25–29.9) are associated with an increased risk of death among several populations (Katzmarzyk
We use three sample size calibrated
National health insurance in Korea is mandatory for all citizens and contains all data related to the history of individual medical treatments including health screening data. Since WC has been included in health screening since 2008 and the people who had cancer or CVD in 2006 or 2007 are excluded from the data, our data analysis for the effects of WC and BMI on mortality includes 466,345 Korean people who received a health screening between 2008 and 2013, during which 6,378 (1.37%) deaths were observed.
WC and BMI are each divided into three groups: low (WC < 73.9 cm, BMI < 21.5), middle (73.9 cm ≤ WC < 92.2 cm, 21.5 ≤ BMI < 28), and high (WC ≥ 92.2 cm, BMI ≥ 28). Five Cox proportional hazard regression models are considered to investigate the effects of WC and BMI on the risk of death. Model 1 includes WC without age, and Model 2 includes WC with age. Both models are adjusted for gender, with males as the reference. Models 3 and 4 include BMI without age and with age, respectively, and both are also adjusted for gender. Model 5 includes both WC and BMI together with gender and age; note, however, that WC and BMI are highly correlated, with a correlation coefficient of 0.8 for both genders. In all models, the effects of WC and BMI are estimated with the middle groups as references.
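The grouping described above can be sketched in a few lines. This is only an illustration of the stated cut points with the middle group as the reference level; the function names `three_group` and `dummy_code` are hypothetical and are not taken from the paper.

```python
from bisect import bisect_right

# Cut points quoted in the text (half-open bins, lower bound inclusive):
# WC:  low < 73.9 cm <= middle < 92.2 cm <= high
# BMI: low < 21.5    <= middle < 28      <= high
WC_CUTS = (73.9, 92.2)
BMI_CUTS = (21.5, 28.0)
LABELS = ("low", "middle", "high")


def three_group(value, cuts):
    """Return 'low', 'middle' or 'high' for the given pair of cut points."""
    return LABELS[bisect_right(cuts, value)]


def dummy_code(group):
    """Two indicator variables, with 'middle' as the reference level."""
    return {"low": int(group == "low"), "high": int(group == "high")}


# Hypothetical individuals, for illustration only.
for wc, bmi in [(70.0, 20.1), (80.5, 24.3), (95.0, 29.2)]:
    wc_group = three_group(wc, WC_CUTS)
    bmi_group = three_group(bmi, BMI_CUTS)
    print(wc, wc_group, dummy_code(wc_group), bmi, bmi_group, dummy_code(bmi_group))
```

With this coding, the coefficients labelled low and high in a fitted model are contrasts against the middle group, which is how the estimates in the five models are to be read.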
Table 2 presents five Cox regression models with
However, based on
We note that an obesity paradox exists when overweight (a BMI of 25–27.9) or moderately obese (29–29.9) is associated with a lower risk of death than normal (a BMI of 22.5–24.9). The study investigated BMI (Berrington de Gonzalez
Using 527,265 Americans (313,047 men and 214,218 women) who were 50 to 71 years old in 1995–1996 and among whom 61,317 (42,173 men and 19,144 women) died during a maximum follow-up of 10 years through 2005 (Adams
A total of 19 cohorts from the Asia Cohort Consortium BMI project in 2008 were used to investigate the association of BMI and mortality among 1,141,609 Asian people, yielding 120,700 deaths (Zheng
The Korean National Health Insurance Service released a national sample cohort (2002–2010) consisting of 1,025,340 Koreans, among whom 153,484 received a health screening in 2002 or 2003. During a mean follow-up of 7.91±0.59 years until 2010, 3,937 deaths occurred among the 153,484 Koreans. The study based on this data (Kim
For the sample size calibrated test, we use representative symbols as given by a + sign if |
Note that the sample sizes to calculate
In the populations of 1.46 million white adults and 0.53 million Americans aged 50–71 years, the sample size calibrated test reveals a wide U-shape for men, with the lowest risk of death at a BMI of 22.5–29.9 or 23.5–34.9, respectively. For women, the wide U-shape is more apparent: the lowest risk of death is at a BMI of 20–29.9 or 21–29.9, respectively, and the hazard ratios are high for the underweight (a BMI of 15–19.9 or less than 20.9) and for the obese (35–39.9) or morbidly obese (40–49.9).
However, in the populations of Asian origin (Zheng
There is no subject-matter significance of the obesity paradox regardless of race, sex, and age groups; however, the existence of a statistically significant obesity paradox and the range of BMI associated with the statistically lowest risk of death depend on study populations.
The deflated
The sample size calibration method can be easily extended to an
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (NRF-2018R1C1B5043739). This work was supported by Hankuk University of Foreign Studies Research Fund.
Comparison of statistical and subject-matter significances
| Parameter | Sample size | | | | | | |
|---|---|---|---|---|---|---|---|
| | 1,000 | 3,000 | 5,000 | 10,000 | 40,000 | 50,000 | 100,000 |
| Estimate | 0.050 | 0.052 | 0.050 | 0.051 | 0.051 | 0.051 | 0.050 |
| | 1.510 | 2.688 | 3.395 | 4.821 | 9.710 | 10.781 | 15.218 |
| CF( | 0.048 | 0.049 | 0.048 | 0.048 | 0.048 | 0.048 | 0.048 |
| | 0.338 | 0.347 | 0.339 | 0.341 | 0.341 | 0.341 | 0.340 |
| | 0.478 | 0.491 | 0.480 | 0.482 | 0.482 | 0.482 | 0.481 |
| | 0.827 | 0.850 | 0.831 | 0.835 | 0.835 | 0.835 | 0.834 |
| Good_{50} | 0.500 | 0.426 | 0.172 | 0.009 | <0.001 | <0.001 | <0.001 |
| Good_{100} | 0.500 | 0.301 | 0.121 | 0.007 | <0.001 | <0.001 | <0.001 |
| Good_{300} | 0.427 | 0.174 | 0.070 | 0.004 | <0.001 | <0.001 | <0.001 |
| Estimate | −0.132 | −0.132 | −0.133 | −0.133 | −0.133 | −0.133 | −0.133 |
| | −3.606 | −6.272 | −8.154 | −11.529 | −22.946 | −25.756 | −36.446 |
| CF( | −0.114 | −0.115 | −0.115 | −0.115 | −0.115 | −0.115 | −0.115 |
| | −0.806 | −0.810 | −0.815 | −0.815 | −0.813 | −0.814 | −0.815 |
| | −1.140 | −1.145 | −1.153 | −1.153 | −1.150 | −1.152 | −1.153 |
| | −1.975 | −1.983 | −1.997 | −1.997 | −1.992 | −1.995 | −1.996 |
| Good_{50} | 0.051 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 |
| Good_{100} | 0.036 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 |
| Good_{300} | 0.021 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 |
| Estimate | 0.364 | 0.367 | 0.367 | 0.367 | 0.367 | 0.367 | 0.367 |
| | 9.977 | 17.398 | 22.478 | 31.810 | 63.545 | 71.102 | 100.537 |
| CF( | 0.315 | 0.318 | 0.318 | 0.318 | 0.318 | 0.318 | 0.318 |
| | 2.231 | 2.246 | 2.248 | 2.249 | 2.246 | 2.248 | 2.248 |
| | 3.155 | 3.176 | 3.179 | 3.181 | 3.177 | 3.180 | 3.179 |
| | 5.465 | 5.502 | 5.506 | 5.510 | 5.503 | 5.508 | 5.507 |
| Good_{50} | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 |
| Good_{100} | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 |
| Good_{300} | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 |
CF = calibration factor.
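The pattern in this table (a stable estimate, a test statistic that grows with the sample size, and a calibration factor that does not) can be illustrated with a small simulation. The sketch below rests on two assumptions that are not spelled out in the surviving text: that the calibration factor equals the full-sample statistic divided by √n, which agrees with the tabulated values (for example 1.510/√1000 ≈ 0.048), and that the Monte Carlo step averages the statistic over random subsamples of a smaller size m. The data-generating model, the slope 0.3, and the function names are hypothetical.

```python
import math
import random


def slope_t_value(xs, ys):
    """t-statistic of the OLS slope in a simple linear regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return slope / std_err


def monte_carlo_t(xs, ys, m, reps, rng):
    """Average slope t-statistic over `reps` random subsamples of size m."""
    indices = range(len(xs))
    total = 0.0
    for _ in range(reps):
        pick = rng.sample(indices, m)  # subsample without replacement
        total += slope_t_value([xs[i] for i in pick], [ys[i] for i in pick])
    return total / reps


rng = random.Random(2024)
n = 20_000                                     # hypothetical "huge" sample
xs = [rng.gauss(0, 1) for _ in range(n)]
ys = [0.3 * x + rng.gauss(0, 1) for x in xs]   # hypothetical true slope 0.3

t_full = slope_t_value(xs, ys)
cf = t_full / math.sqrt(n)                     # assumed form of the calibration factor
print("full-sample t:", round(t_full, 2), "CF:", round(cf, 3))
for m in (50, 100, 300):
    analytic = cf * math.sqrt(m)
    simulated = monte_carlo_t(xs, ys, m, reps=300, rng=rng)
    print(m, round(analytic, 2), round(simulated, 2))
```

Under these assumptions the subsample average and CF·√m should agree closely, so the calibrated statistic can be read as the test statistic one would expect had the study been run with only m observations.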
Five Cox’s proportional hazard models for the effects of WC and BMI on all causes of mortality
| Models | Variables | Estimates | | | CF( | | | |
|---|---|---|---|---|---|---|---|---|
| Model 1 | WC_{low} | −0.126 | −3.64 | 0.0003 | −0.0053 | −0.550 | −0.780 | −1.360 |
| | WC_{high} | 0.110 | 2.74 | 0.006 | 0.0040 | 0.420 | 0.590 | 1.030 |
| | Sex_{female} | −0.495 | −17.59 | <2e-16 | −0.0258 | −2.700 | −3.820 | −6.610 |
| Model 2 | WC_{low} | 0.414 | 12.53 | <2e-16 | 0.0183 | 1.910 | 2.710 | 4.690 |
| | WC_{high} | −0.078 | −1.93 | 0.053 | −0.0028 | −0.290 | −0.410 | −0.720 |
| | Sex_{female} | −0.862 | −32.39 | <2e-16 | −0.0474 | −4.960 | −7.010 | −12.100 |
| | Age | 0.109 | 106.13 | <2e-16 | 0.1554 | 16.260 | 23.000 | 39.800 |
| Model 3 | BMI_{low} | 0.698 | 26.00 | <2e-16 | 0.0381 | 3.990 | 5.640 | 9.770 |
| | BMI_{high} | −0.204 | −3.98 | 0.000068 | −0.0058 | −0.610 | −0.860 | −1.490 |
| | Sex_{female} | −0.679 | −25.62 | <2e-16 | −0.0375 | −3.920 | −5.550 | −9.610 |
| Model 4 | BMI_{low} | 0.681 | 25.60 | <2e-16 | 0.0375 | 3.920 | 5.550 | 9.610 |
| | BMI_{high} | 0.010 | 0.20 | 0.84 | 0.0003 | 0.031 | 0.044 | 0.077 |
| | Sex_{female} | −0.799 | −30.60 | <2e-16 | −0.0448 | −4.690 | −6.620 | −11.500 |
| | Age | 0.104 | 102.60 | <2e-16 | 0.1502 | 15.700 | 22.200 | 38.500 |
| Model 5 | BMI_{low} | 0.691 | 22.53 | <2e-16 | 0.0330 | 3.450 | 4.880 | 8.460 |
| | BMI_{high} | −0.048 | −0.85 | 0.395 | −0.0012 | −0.126 | −0.180 | −0.310 |
| | WC_{low} | 0.007 | 0.19 | 0.85 | 0.0003 | 0.031 | 0.044 | 0.077 |
| | WC_{high} | 0.113 | 2.47 | 0.013 | 0.0036 | 0.377 | 0.533 | 0.924 |
| | Sex_{female} | −0.792 | −29.39 | <2e-16 | −0.0430 | −4.500 | −6.360 | −11.000 |
| | Age | 0.104 | 101.57 | <2e-16 | 0.1487 | 15.560 | 22.000 | 38.100 |
CF = calibration factor; WC = waist circumference; BMI = body mass index.
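As a worked check of how the CF column relates to the full-sample statistics, the short sketch below recomputes the calibration factor for two rows of the table, assuming CF equals the test statistic divided by √n with n = 466,345 (the cohort size given in the text). It also prints the ratios between the three calibrated columns. The helper name `calibration_factor` is hypothetical.

```python
import math

N_COHORT = 466_345  # cohort size stated in the text


def calibration_factor(stat, n):
    """Assumed form of the calibration factor: statistic divided by sqrt(n)."""
    return stat / math.sqrt(n)


# (test statistic, reported CF, three calibrated statistics), copied from the table.
rows = {
    "Model 1, WC_low": (-3.64, -0.0053, (-0.550, -0.780, -1.360)),
    "Model 2, Age": (106.13, 0.1554, (16.260, 23.000, 39.800)),
}

for name, (stat, cf_reported, calibrated) in rows.items():
    cf = calibration_factor(stat, N_COHORT)
    ratios = [value / calibrated[0] for value in calibrated]
    print(name, round(cf, 4), cf_reported, [round(r, 2) for r in ratios])
```

The recomputed factors agree with the reported ones to four decimals, and the second and third calibrated columns are close to √2 and √6 times the first. That is what one would expect if the three calibration sizes stand in the ratio 1:2:6, although the sizes themselves are not recoverable from the table alone.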
Tests for the effect of BMI on all cause mortality in four populations
| Population | Sex | Hazard ratios (>1: more risky than reference group, <1: less risky than reference group) | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| White-adults | | 15–18.4 | 18.5–19.9 | 20–22.4 | 22.5–24.9 | 25–27.4 | 27.5–29.9 | 30–34.9 | 35–39.9 | 40–49.9 | |
| | F | 2.02* | 1.34* | 1.06* | reference | 1.03* | 1.11* | 1.25* | 1.59* | 1.99* | |
| | M | 1.98* | 1.6* | 1.18* | reference | 0.97* | 1.03* | 1.16* | 1.44* | 1.93* | |
| 50–71 years old Americans | | < 18.5 | 18.5–20.9 | 21–23.4 | 23.5–24.9 | 25–26.4 | 26.5–27.9 | 28–29.9 | 30–34.9 | 35–39.9 | 40– |
| | F | 2.03* | 1.3* | 1.07* | reference | 1 | 1.06 | 1.07* | 1.18* | 1.49* | 1.94* |
| | M | 1.97* | 1.54* | 1.14* | reference | 0.95* | 0.95* | 1 | 1.1* | 1.35* | 1.83* |
| East Asian, Indian&Bangla | | ≤ 15 | 15.1–17.5 | 17.6–20.0 | 20.1–22.5 | 22.6–25.0 | 25.1–27.5 | 27.6–30.0 | 30.1–32.5 | 32.6–35.0 | 35.1–50 |
| East Asian | | 2.76* | 1.84* | 1.35* | 1.09* | reference | 0.98 | 1.07* | 1.2* | 1.5* | 1.49* |
| Indian&Bangla | | 2.14* | 1.59* | 1.26* | 1.09 | reference | 0.98 | 0.94 | 1.03 | 0.86 | 1.27 |
| Koreans | | < 18.5 | 18.5–19.9 | 20–21.4 | 21.5–22.9 | 23–24.9 | 25–26.4 | 26.5–27.9 | 28–29.9 | 30–32.4 | 32.5– |
| | | 2.31* | 1.73* | 1.25* | 1.23* | reference | 0.86* | 0.88 | 0.95 | 1.12 | 1.25 |

| Population | Sex | Sample size calibration test significance of BMI hazard ratio | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| White-adults | | 15–18.4 | 18.5–19.9 | 20–22.4 | 22.5–24.9 | 25–27.4 | 27.5–29.9 | 30–34.9 | 35–39.9 | 40–49.9 | |
| | F | + + + | − + + | − − − | reference | − − − | − − − | − − + | + + + | + + + | |
| | M | − − + | − − + | − − + | reference | − − − | − − − | − − + | − − + | − + + | |
| 50–71 years old Americans | | < 18.5 | 18.5–20.9 | 21–23.4 | 23.5–24.9 | 25–26.4 | 26.5–27.9 | 28–29.9 | 30–34.9 | 35–39.9 | 40– |
| | F | + + + | − + + | − − − | reference | − − − | − − − | − − − | − − + | − + + | + + + |
| | M | − − + | − + + | − − + | reference | − − − | − − − | − − − | − − − | − − + | − + + |
| East Asian, Indian&Bangla | | ≤ 15 | 15.1–17.5 | 17.6–20.0 | 20.1–22.5 | 22.6–25.0 | 25.1–27.5 | 27.6–30.0 | 30.1–32.5 | 32.6–35.0 | 35.1–50 |
| East Asian | | − − − | − − + | − − − | − − − | reference | − − − | − − − | − − − | − − − | − − − |
| Indian&Bangla | | − + + | − − + | − − − | − − − | reference | − − − | − − − | − − − | − − − | − − − |
| Koreans | | < 18.5 | 18.5–19.9 | 20–21.4 | 21.5–22.9 | 23–24.9 | 25–26.4 | 26.5–27.9 | 28–29.9 | 30–32.4 | 32.5– |
| | | + + + | + + + | − − + | − + + | reference | − − + | − − − | − − − | − − − | − − − |
For sample size calibration test, strong significance is represented by + + +, weak significance by − + +, weak insignificance by − − +, and strong insignificance by − − −.
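The mapping from three calibrated statistics to the symbols in this note can be sketched as follows. The rule used here is an assumption: a + is given when the absolute calibrated statistic exceeds a critical value, with the three statistics ordered from the smallest to the largest calibration sample size. The cutoff of 1.96 is a hypothetical default, since the exact threshold is not recoverable from the text, and the function name `calibration_symbol` is illustrative only.

```python
# Wording taken from the note under the table.
MEANING = {
    "+ + +": "strong significance",
    "− + +": "weak significance",
    "− − +": "weak insignificance",
    "− − −": "strong insignificance",
}


def calibration_symbol(calibrated_stats, critical=1.96):
    """Build a symbol such as '− + +' from three calibrated statistics.

    `critical` is an assumed two-sided 5% normal cutoff, not a value
    confirmed by the source.
    """
    signs = ("+" if abs(stat) > critical else "−" for stat in calibrated_stats)
    return " ".join(signs)


# Illustrative inputs: calibrated statistics for WC_low in Models 1 and 2.
for stats in [(-0.550, -0.780, -1.360), (1.910, 2.710, 4.690)]:
    symbol = calibration_symbol(stats)
    print(stats, symbol, MEANING.get(symbol, "pattern not listed in the note"))
```

Because the calibrated statistic grows with the calibration size, a + at a smaller size implies a + at every larger size under this rule, which is why only the four listed patterns are expected to occur.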