
In assessing the potential risk of exposure to ionizing radiation, the excess relative risk (ERR) is a common measure of the relationship between radiation exposure and the risk of cancer incidence or mortality. Key research on radiation-associated cancer risk assessment using the ERR has been conducted with the Life Span Study (LSS) cohort data of Japanese atomic bomb survivors in Hiroshima and Nagasaki (Grant
Understanding the dose–response relationship is a main goal of research on radiation-associated health effects. The dose–response function represents this relationship; thus, estimating this function has been a primary concern. In radiation epidemiological studies, parsimonious models with simple parametric forms, such as linear non-threshold, linear-quadratic, and quadratic functions, are generally preferred. However, the estimates from these models are sometimes unable to reflect the uncertainty at low doses. In addition, the risk at higher doses is more influential than that at lower doses (Furukawa
One alternative for understanding the dose–response relationship is to use a piecewise function, which divides the domain into several equally spaced intervals and applies a different subfunction to each interval. Furukawa
Splines, defined as sums of piecewise polynomials, are more flexible. The important tuning parameters for splines are the number and location of knots, which should be specified in advance. Kauermann and Opsomer (2011) proposed data-driven selection of the number of spline basis functions in penalized spline regression based on a likelihood criterion. Dung and Tjahjowidodo (2017) investigated the identification of the number and location of knots in non-uniform space using B-spline functions. In addition, Dimatteo
We assume that the dose distribution follows an infinite mixture of normal distributions; thus, we can select knots as the overlapping points of two normal distributions. The DPMM is an infinite mixture model with a countably infinite number of clusters inferred from data (Teh
The remainder of this paper is organized as follows. Section 2 describes the ERR model and the properties of the radiation dose distribution. In Section 3, we explain the knot selection method based on the DPMM and then the estimation of the spline function in a Bayesian framework. Section 4 presents an example using real data to illustrate the proposed knot selection method based on the Bayesian approach, and Section 5 provides concluding remarks.
The ERR describes the proportional risk increase above the baseline rate
where RR is the relative risk,
The ERR model consists of the dose-response function
where
To obtain a more stable dose–response relationship than the parametric dose–response functions above, we consider a spline function, that is, a smooth piecewise polynomial. This allows us to fit a polynomial on each subinterval and connect the pieces to construct a smooth function. The piecewise quadratic spline function is defined as
However, it is difficult to define the correct number and location of knots,
The goal of this subsection is to divide a spline domain by clustering dose observations. The DPMM and infinite mixture of Gaussian models are widely used to estimate a density function (Escobar and West, 1995; Müller
Let the density function be
By properties of the beta distribution, as
Let
Note that the finite number
To estimate the parameters, we must obtain values from the posterior distribution of (
In each iteration,
Sample
(a) Sample
(b) For each
More specifically, we can rewrite this as an explicit formula. Let
1) Draw
2) Draw
3) For each
Draw
Draw
Sample
where
Sample
where
After choosing the knots, we estimate the coefficients
where
where
The LSS cohort pertains to atomic bomb survivors in Hiroshima and Nagasaki, Japan, and provides the risk estimates of cancers related to radiation exposure. The LSS cohort is a main source of radiation-associated risk assessment for humans. The report in Grant
Table 1 presents the estimated number of cluster components. The mode was taken as the number of clusters: 7 for males and 6 for females. The estimates of the means and variances of each cluster are presented in Table 2. As illustrated in Figure 2, each knot was selected as the red point with the maximum likelihood in the overlapping region of two normal distributions, and the results are presented in Table 3. The selected knots in Figure 2 appear to reflect the highly skewed dose distribution in Figure 1. Thus, as expected, knot selection using the DPMM can be an effective way to obtain data-driven knots.
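As a rough illustration of choosing a knot in the overlapping region of two normal components, the sketch below finds the point between two cluster means where the two normal densities are equal. This is only one possible reading of the selection rule described above. It assumes equal mixture weights, and the function name `density_crossing` is hypothetical. The inputs are the first three male cluster means and variances from Table 2.

```python
import math


def density_crossing(m1, v1, m2, v2):
    """Point between m1 and m2 where N(m1, v1) and N(m2, v2) have equal density.

    Assumption of this sketch: both components carry equal mixture weight.
    """
    if not m1 < m2:
        raise ValueError("expected m1 < m2")
    # Equating the two log-densities gives a*x^2 + b*x + c = 0.
    a = 1.0 / (2.0 * v1) - 1.0 / (2.0 * v2)
    b = m2 / v2 - m1 / v1
    c = m1 ** 2 / (2.0 * v1) - m2 ** 2 / (2.0 * v2) + 0.5 * math.log(v1 / v2)
    if abs(a) < 1e-12:
        # Equal variances: the crossing is the midpoint of the two means.
        roots = [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0:
            raise ValueError("densities do not cross")
        s = math.sqrt(disc)
        roots = sorted([(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)])
    inside = [x for x in roots if m1 < x < m2]
    if not inside:
        raise ValueError("no crossing between the two means")
    return inside[0]  # smallest crossing if both roots fall between the means


# First three male clusters from Table 2 (posterior means and variances).
male_means = [0.006, 0.072, 0.193]
male_vars = [0.0002, 0.0017, 0.0066]
knots = [
    density_crossing(male_means[i], male_vars[i], male_means[i + 1], male_vars[i + 1])
    for i in range(2)
]
print([round(k, 3) for k in knots])
```

Under these assumptions the two crossings should land near the first two male knots reported in Table 3 (0.03 and 0.13). Weighting the components by their mixture proportions would shift the crossings somewhat.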
Since the concentration parameter
The chains converged because the Gelman-Rubin statistics were close to 1 in Table 4. There was also the estimated
To demonstrate the importance of the knot selection procedure, we also compared two spline curves using DPMM-based knots (black) and equally spaced knots (red) in Figures 5 and 6, respectively. In this case, the number of knots remained unchanged, and only the location of the knots was altered. Over the entire dose range, there appeared to be little difference between the two curves. At low doses, however, the red curve estimated the ERR to be smaller than the black curve. The clearest distinction was that the interval estimates of the DPMM-based spline contained zero at doses of
In this paper, we proposed spline curve fitting of the dose–response function for the excess relative risk (ERR) model commonly used in radiation epidemiology. Since the choice of the number and location of knots is crucial for splines, we proposed using the Dirichlet process mixture model (DPMM) to select data-driven knots, treating knot selection as a clustering problem. This approach is particularly effective when the dose distribution is highly skewed, because the number of clusters is flexible. The chosen knots appeared to reflect the dose distribution successfully. Based on these knots, we then estimated the ERR model in the Bayesian framework using the Life Span Study (LSS) cohort and compared the spline curve with other parametric dose–response functions. For both males and females, the spline curves gave smaller ERR estimates than the other curves; however, the estimates became similar to those of the other curves as the dose increased. The estimation of dose–response curves is greatly affected by observations at higher doses, but the use of a spline can alleviate this problem. Although we assumed a noninformative prior for the spline coefficients, a Gaussian process prior with a covariance matrix depending on the knots could be used as an alternative. A limitation of this research is that the Bayesian knot selection is conducted separately from fitting the Bayesian ERR model.
Number (
| # of clusters | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|
| Males | | 3464 | 1290 | 225 | 21 |
| % | | 69.28 | 25.80 | 4.50 | 0.42 |
| Females | 3437 | 1310 | 223 | 27 | 3 |
| % | 68.74 | 26.20 | 4.46 | 0.54 | 0.06 |
Posterior means and variances of each cluster for males and females
| Cluster | | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| Males | Mean | 0.006 | 0.072 | 0.193 | 0.430 | 0.784 | 1.314 | 2.394 |
| | Variance | 0.0002 | 0.0017 | 0.0066 | 0.0261 | 0.0654 | 0.1402 | 0.1135 |
| Females | Mean | 0.008 | 0.096 | 0.238 | 0.593 | 1.194 | 2.314 | |
| | Variance | 0.0002 | 0.0017 | 0.0261 | 0.0654 | 0.0006 | 0.1402 | |
Knot selection for males and females
| Knot | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Males | 0.03 | 0.13 | 0.31 | 0.60 | 1.04 | 1.85 |
| Females | 0.05 | 0.16 | 0.41 | 0.89 | 1.75 | |
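To show how a set of knots such as those in Table 3 enters a piecewise quadratic spline, the following minimal sketch builds one row of a truncated-power basis. The paper's own spline formula is not reproduced in this text, so the truncated-power form, the absence of an intercept (so that the curve is zero at zero dose), and the names `quadratic_spline_basis` and `spline_value` are all assumptions made for illustration.

```python
def quadratic_spline_basis(dose, knots):
    """One basis row: [d, d^2, (d - k_1)_+^2, ..., (d - k_K)_+^2].

    Assumed truncated-power form with no intercept term.
    """
    row = [dose, dose ** 2]
    for k in knots:
        # Each knot contributes a quadratic term that switches on above the knot.
        row.append(max(dose - k, 0.0) ** 2)
    return row


def spline_value(dose, knots, coefs):
    """Evaluate the spline for hypothetical coefficients (one per basis term)."""
    basis = quadratic_spline_basis(dose, knots)
    if len(coefs) != len(basis):
        raise ValueError("need one coefficient per basis term")
    return sum(c * b for c, b in zip(coefs, basis))


male_knots = [0.03, 0.13, 0.31, 0.60, 1.04, 1.85]  # male knots from Table 3
row = quadratic_spline_basis(0.5, male_knots)
print(len(row), row)
```

At a dose of 0.5, only the first three knot terms are active because the remaining knots lie above that dose. Both the function and its first derivative stay continuous at each knot, which is what makes the pieces join smoothly.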
Parameter estimates and Gelman-Rubin statistics for males and females
Posterior mean | |||||||||
---|---|---|---|---|---|---|---|---|---|
Sex | GR | ||||||||
Males | 0.053 | 0.028 | 0.030 | 0.025 | 0.021 | 0.016 | 0.023 | 0.107 | 1.05 |
Females | 0.362 | 0.027 | 0.028 | 0.023 | 0.019 | 0.022 | 0.071 | 1.02 |
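The GR column in Table 4 reports Gelman-Rubin statistics. For readers who want to see what such a diagnostic computes, here is a minimal sketch of the basic potential scale reduction factor for one scalar parameter. The text does not say which variant of the statistic was used, so this plain form (without a degrees-of-freedom correction) is an assumption, and the simulated chains are hypothetical stand-ins for MCMC output.

```python
import math
import random
import statistics


def gelman_rubin(chains):
    """Basic potential scale reduction factor for equal-length chains."""
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    chain_means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean([statistics.variance(c) for c in chains])  # W
    between = n * statistics.variance(chain_means)  # B
    pooled = (n - 1) / n * within + between / n  # pooled variance estimate
    return math.sqrt(pooled / within)


rng = random.Random(0)
# Four hypothetical chains sampling the same distribution: value close to 1.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(4)]
# Two hypothetical chains stuck in different regions: value well above 1.
stuck = [[rng.gauss(mu, 1.0) for _ in range(2000)] for mu in (0.0, 5.0)]
print(gelman_rubin(mixed), gelman_rubin(stuck))
```

Values close to 1 indicate that the between-chain variability is small relative to the within-chain variability, which is the sense in which the statistics in Table 4 support convergence.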