Regression discontinuity for survival data
Communications for Statistical Applications and Methods 2024;31:155-178
Published online January 31, 2024
© 2024 Korean Statistical Society.

Youngjoo Cho1,a

aDepartment of Applied Statistics, Konkuk University, Korea
Correspondence to: 1 Department of Applied Statistics, Konkuk University, 120 Neungdong-ro, Gwangjin-gu, Seoul 07341, Korea. E-mail: yvc5154@konkuk.ac.kr
This work was supported by Konkuk University (2021-A019-0164).
Received July 4, 2023; Revised November 8, 2023; Accepted December 8, 2023.
 Abstract
Regression discontinuity (RD) design is one of the most widely used methods in causal inference for estimating a treatment effect when the treatment is determined by a cutpoint of a covariate of interest. Although RD design provides a very useful tool for analyzing treatment effects with censored data, it has received little attention in that setting. In this paper, we define the causal effect on the survival function in the RD design when the treatment is assigned deterministically by the covariate of interest. We propose estimators of this causal effect for survival data by using transformations that lead to unbiased estimation of the survival function with local linear regression. Simulation studies show the validity of our approach. We also illustrate the proposed method using the prostate, lung, colorectal and ovarian (PLCO) dataset.
Keywords : survival analysis, causal inference, local linear regression, regression discontinuity, doubly robust
1. Introduction

Regression discontinuity (RD) design is one of the most widely used methods in causal inference for estimating a treatment effect created by a discontinuity in a covariate of interest. In this design, the treatment assignment is determined, deterministically or probabilistically, by a covariate of interest, the so-called running variable, together with a pre-determined threshold. When the treatment assignment is a deterministic function of the running variable, the design is called a sharp RD design. If the treatment assignment is a function of the running variable with randomness, it is called a fuzzy RD design. By construction, in sharp RD the treatment assignment is as good as random at the threshold of the running variable. In other words, although the study is observational, the sharp RD design provides the same environment as randomized treatment assignment at the threshold of the running variable. This property allows us to avoid the no-unmeasured-confounders assumption, which is fundamental in causal inference but may be unreasonable in practice.

The RD design was first proposed by Thistlethwaite and Campbell (1960), and it has received much attention in social science and economics. For example, Ludwig and Miller (2007) study the effect of funding in education. Lee (2008) uses the RD design to study the effect of party affiliation on the probability of Democrats winning the next election. On the theoretical side, Hahn et al. (1999) and Hahn et al. (2001) prove nonparametric identification of the treatment effect and asymptotic results for its estimator in the RD design. To estimate the treatment effect, they use local linear regression. Local linear or polynomial regression is widely used in RD designs because it handles the discontinuity effectively and is theoretically well supported.

Since choosing the bandwidth in local regression is a crucial issue, many researchers have proposed methods for it. Ludwig and Miller (2007) propose choosing the bandwidth of the local linear regression by cross-validation. Imbens and Kalyanaraman (2012) propose choosing the bandwidth by optimizing the mean squared error. More recently, Calonico et al. (2014) propose a method that corrects the bias from local regression and provides a robust confidence interval. Calonico et al. (2020) propose a bandwidth choice that achieves a smaller coverage error than the bandwidth that optimizes the mean squared error.

However, the aforementioned works focus on uncensored data. The RD design with censored data can provide a useful answer to clinicians' research questions. As discussed by Shoag et al. (2015) and Cho et al. (2021), in prostate cancer research the usefulness of patient screening based on prostate-specific antigen (PSA) for survival or prostate cancer-specific incidence is still an open question. In this setup, the outcome of interest is time to death or prostate cancer incidence, and it is subject to censoring, i.e., patients may not experience death or prostate cancer by the end of their observed follow-up. In practice, a patient is considered high risk if his PSA level is greater than or equal to 4.0 ng/ml. Men in the screening group with a PSA of 4.0 ng/ml or higher at any time were advised to receive further checkup and biopsy. This is clearly a sharp RD setting: Shoag et al. (2015) use the RD design with a binary outcome to answer this question using the prostate, lung, colorectal, and ovarian (PLCO) data on prostate cancer. The objective of the PLCO trial is to investigate whether screening for prostate, lung, colorectal, and ovarian cancers is effective in reducing mortality. It is a multi-center, two-armed, randomized trial (Prorok et al., 2000). Men and women aged 55 to 74 were enrolled at 10 centers in the United States from 1993 to 2001. Men in the screening group received PSA testing annually for 6 years and digital rectal examination annually for 4 years (Andriole et al., 2009).

There is little research on using the RD design with censored data. Recently, Bor et al. (2014) and Moscoe et al. (2015) use the RD design to study the effect of early versus late treatment initiation on the survival of HIV patients. Cho et al. (2021) extend Steingrimsson et al. (2019) to the RD design and employ the local linear regression of Fan and Gijbels (1996) to estimate treatment effects under the RD design. They focus on survival time and analyze the aforementioned PLCO data. However, no research has discussed modeling the survival probability, an essential quantity for censored data, in the RD design. Moreover, our proposed method allows different treatment effects at each time point, unlike the method of Cho et al. (2021).

In this paper, we discuss modeling the survival probability in the RD design. Our approach builds on Steingrimsson et al. (2019) and Cho et al. (2021), and it enjoys double robustness, as do those methods. Moreover, we also investigate the application of the pseudo-value approach (Andersen et al., 2003) to the RD design, which provides an unbiased estimator of the survival function.

The paper is organized as follows. In the next section, we discuss the estimation of the treatment effect with regard to the survival probability in the RD design for uncensored and censored data. In Section 3, we present asymptotic theory and statistical inference for the proposed method, as well as the extension to the pseudo-value approach of Andersen et al. (2003). We present simulation studies with various settings in Section 4. The real data analysis is shown in Section 5. Conclusions and future research are discussed in Section 6.

2. Regression discontinuity with regard to survival probability

Following the framework of Rubin (1974), we define the potential outcomes for survival data. Let T(1) and T(0) be the potential times to the event under treatment and control, respectively. Let W be the running variable. In the RD design, there are two designs: sharp and fuzzy. In this paper, we only discuss the sharp design. In the sharp design, the treatment is assigned deterministically by the threshold. Let w0 be the threshold on W. Our treatment variable Z is defined by

$$Z = I(W \geq w_0).$$

We are interested in the average treatment effect (ATE) with respect to the survival function,

$$\mathrm{ATE} = P(T(1) > t) - P(T(0) > t).$$

From the potential outcome structure, our observable outcome is T = ZT(1) + (1 − Z)T(0). From the structure of the sharp RD design, the ATE can be expressed as the difference of survival functions in a neighborhood of w0. In a neighborhood of w0, the limits of the two survival functions above and below w0 equal P(T(1) > t) and P(T(0) > t), respectively, with the treatment status differing across the threshold, and they coincide with the observable survival function P(T > t) on each side (Zajonc, 2012). To identify the causal effect, assumptions similar to those of Cho et al. (2021) are required.

  • C.1 Participants cannot perfectly manipulate the running variable around the cutoff.

  • C.2 P(T(1) > t|W = w) and P(T(0) > t|W = w) are continuous in w at w = w0 for all t.

Condition C.1 is required to create a “randomized environment” at the cutoff. Condition C.2 states that the potential survival functions for treatment and control given W are continuous at W = w0. With these conditions, we can identify the average causal effect on the survival function:

$$
\begin{aligned}
\tau(t) &= P(T(1) > t \mid W = w_0) - P(T(0) > t \mid W = w_0) \\
&= \lim_{w \downarrow w_0} P(T(1) > t \mid W = w) - \lim_{w \uparrow w_0} P(T(0) > t \mid W = w) \\
&= \lim_{w \downarrow w_0} P(T(1) > t \mid W = w, Z = 1) - \lim_{w \uparrow w_0} P(T(0) > t \mid W = w, Z = 0) \\
&= \lim_{w \downarrow w_0} P(T > t \mid W = w) - \lim_{w \uparrow w_0} P(T > t \mid W = w) \\
&= \lim_{w \downarrow w_0} E\{I(T > t) \mid W = w\} - \lim_{w \uparrow w_0} E\{I(T > t) \mid W = w\}.
\end{aligned}
$$

The full data are (T_i, W_i, Z_i), i = 1, ..., n, i.i.d. copies of (T, W, Z). As Hahn et al. (1999) point out, boundary observations lead to poor numerical behavior when computing the estimator and performing inference. To address this issue, the local linear regression method (Fan and Gijbels, 1996) is widely used in the RD design. We apply the Brier score (Brier, 1950), a squared error loss for probabilities, to local linear regression. Let V(t) = I(T > t), K(·) be a kernel function and h be a bandwidth. Define {α_R(t), β_R(t)} and {α_L(t), β_L(t)} as the regression parameters corresponding to the local linear regressions for the right and left limits. Then the proposed loss functions are

$$
\begin{aligned}
U_R(\alpha_R(t), \beta_R(t)) &= \sum_{i=1}^n I(W_i \geq w_0)\,\{V_i(t) - \alpha_R(t) - \beta_R(t)(W_i - w_0)\}^2\, K\!\left(\frac{W_i - w_0}{h}\right), \\
U_L(\alpha_L(t), \beta_L(t)) &= \sum_{i=1}^n I(W_i < w_0)\,\{V_i(t) - \alpha_L(t) - \beta_L(t)(W_i - w_0)\}^2\, K\!\left(\frac{W_i - w_0}{h}\right),
\end{aligned}
$$

where V_i(t) = I(T_i > t). Minimizing U_R and U_L yields {α̂_R(t), β̂_R(t), α̂_L(t), β̂_L(t)}. The corresponding sharp RD estimator is

$$\hat\tau(t) = \hat\alpha_R(t) - \hat\alpha_L(t).$$

For this approach, we can use the standard RD methodology proposed in the aforementioned literature: our outcome is I(T > t), and we apply local linear regression to it.
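To make the uncensored case concrete, the following is a minimal sketch (not the authors' code; the helper name and the triangular kernel are our choices) of the weighted least squares fits that minimize U_R and U_L and of the resulting estimator τ̂(t) = α̂_R(t) − α̂_L(t).

```r
## Sketch: sharp RD estimate of tau(t) for uncensored survival data,
## obtained by local linear fits on each side of the cutoff.
rd_survprob_uncensored <- function(time, W, w0, t0, h) {
  V  <- as.numeric(time > t0)                 # V_i(t) = I(T_i > t0)
  u  <- (W - w0) / h
  kw <- pmax(1 - abs(u), 0)                   # triangular kernel weights
  right <- W >= w0 & kw > 0
  left  <- W <  w0 & kw > 0
  # local linear fits; the intercepts estimate the right and left limits
  fit_R <- lm(V ~ I(W - w0), weights = kw, subset = right)
  fit_L <- lm(V ~ I(W - w0), weights = kw, subset = left)
  unname(coef(fit_R)[1] - coef(fit_L)[1])     # tau-hat(t) = alpha_R - alpha_L
}

## toy example with exponential failure times
set.seed(1)
W <- runif(500); Tsim <- rexp(500, rate = exp(-1 * (W >= 0.5)))
rd_survprob_uncensored(Tsim, W, w0 = 0.5, t0 = median(Tsim), h = 0.2)
```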

Now we introduce censoring into the data. Let C be the time to censoring and let a ∧ b denote the minimum of a and b for a, b ∈ ℝ. The observed data are i.i.d. copies of (T̃_i, Δ_i, W_i, Z_i), where T̃_i = T_i ∧ C_i and Δ_i = I(T_i ≤ C_i). In the censored data case, it is very difficult to use the response directly because P(T̃ > t) ≠ P(T > t). Adapting the idea of Cho et al. (2021), our goal is to find a time-dependent transformation q_t of the observed data such that E{q_t | W} = E{I(T > t) | W} = P(T > t | W). Such a transformation q_t is called a censoring unbiased transformation (Fan and Gijbels, 1994; Rubin and van der Laan, 2007; Cho et al., 2021). A widely used choice of q_t is the inverse probability of censoring weighted (IPCW) transformation, which satisfies

$$E\left\{\frac{\Delta\, I(T>t)}{G(\tilde T \mid W)} \,\middle|\, W\right\} = E\left\{\frac{P(C \geq T \mid T, W)}{G(T \mid W)}\, I(T>t) \,\middle|\, W\right\} = P(T>t \mid W), \qquad (2.1)$$

where G(· | W) denotes the conditional survival function of the censoring time C given W. Hence we obtain an unbiased estimator of P(T > t | W). The transformation based on (2.1) is

$$V_{\mathrm{IPCW1}}(t) = \frac{\Delta\, V(t)}{G(\tilde T \mid W)}. \qquad (2.2)$$

However, this approach only uses uncensored observations (i.e., Δ = 1). To use more information than V_IPCW1(t), Graf et al. (1999) and Lostritto et al. (2012) truncate T and Δ. We consider T(t) = T ∧ t, T̃(t) = T(t) ∧ C and Δ(t) = I(T(t) ≤ C). Through this truncation, Δ = 1 implies Δ(t) = 1, but Δ = 0 may still yield Δ(t) = 1. Following the idea of Graf et al. (1999) and Lostritto et al. (2012), we propose

$$V_{\mathrm{IPCW2}}(t) = \frac{\Delta(t)\, V(t)}{G(\tilde T(t) \mid W)}. \qquad (2.3)$$

By the above reasoning, V_IPCW2(t) has smaller variance than V_IPCW1(t) because it uses more information. These approaches are easy to implement and are supported by asymptotic theory (e.g., Strawderman, 2000). For example, V_IPCW2(t) is uniformly consistent with respect to the mean squared error (Gerds and Schumacher, 2006) and performs well compared to V_IPCW1(t) (Cho et al., 2022).
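The two IPCW transformations are easy to compute from observed data. Below is a hedged sketch (our own helper; not from the paper's software) that estimates G by the Kaplan-Meier estimator of the censoring distribution, as the authors do later in Section 3 under random censoring.

```r
## Sketch of the IPCW transformations (2.2) and (2.3) at a fixed time t0.
library(survival)

ipcw_transform <- function(obs_time, delta, t0) {
  # Kaplan-Meier estimate of G(u) = P(C > u): treat censoring as the "event"
  Gfit <- survfit(Surv(obs_time, 1 - delta) ~ 1)
  Ghat <- stepfun(Gfit$time, c(1, Gfit$surv))          # right-continuous step function
  # IPCW1: only uncensored subjects contribute
  V1 <- delta * (obs_time > t0) / pmax(Ghat(obs_time), 1e-10)
  # IPCW2: truncate at t0, so subjects still at risk at t0 also contribute
  time_t  <- pmin(obs_time, t0)                        # T-tilde(t) = (T ^ t) ^ C
  delta_t <- as.numeric(delta == 1 | obs_time >= t0)   # Delta(t) = I(T ^ t <= C)
  V2 <- delta_t * (obs_time > t0) / pmax(Ghat(time_t), 1e-10)
  data.frame(V_IPCW1 = V1, V_IPCW2 = V2)               # ties at exactly t0 ignored
}
```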

However, these approaches require that the censoring distribution model be correctly specified, and they yield inefficient estimators because only observations with Δ = 1 or Δ(t) = 1 are used. Motivated by semiparametric efficiency theory (Robins et al., 1994; Tsiatis, 2007), we obtain the transformation

$$V_{\mathrm{DR}}(t) = \frac{\Delta\, V(t)}{G(\tilde T \mid W)} + \int_0^{t} \frac{Q(t, u \mid W)}{G(u \mid W)}\, dM_G(u \mid W) = \frac{\Delta(t)\, V(t)}{G(\tilde T(t) \mid W)} + \int_0^{\tilde T(t)} \frac{Q(t, u \mid W)}{G(u \mid W)}\, dM_G(u \mid W), \qquad (2.4)$$

where λ_G(s|W) is the true hazard function corresponding to G given W, and

$$M_G(u \mid W) = I(\tilde T \leq u,\, \Delta = 0) - \int_0^u I(\tilde T \geq s)\, \lambda_G(s \mid W)\, ds, \qquad Q(t, u \mid W) = \frac{P(T > t \mid W)}{P(T > u \mid W)}, \quad t \geq u.$$

The equality of the two expressions in (2.4) is proved in Cho et al. (2020). Note that Q involves modeling the survival function. The transformation (2.4) is doubly robust: it requires either the model for G(·) or the survival model to be correct, but not necessarily both. It guarantees that E(V_DR(t) | W = w) = P(T > t | W = w) for any given w, so it is a censoring unbiased transformation. This doubly robust transformation combines the IPCW term and a mean-zero martingale transformation term. The martingale term utilizes the censored observations, which yields more efficiency than V_IPCW1(t) and V_IPCW2(t), although in practice its performance is fairly similar to, or only slightly better than, that of V_IPCW2(t) (Cho et al., 2022).

3. Proposed method

3.1. Causal effect estimation and inference with IPCW and DR estimators

Let V̂_IPCW1,i(t), V̂_IPCW2,i(t) and V̂_DR,i(t) be the transformed responses for the ith observation based on (2.2), (2.3) and (2.4). Denote N_{i,G}(u) = I(T̃_i ≤ u, Δ_i = 0). Then we can express V̂_IPCW1,i(t), V̂_IPCW2,i(t) and V̂_DR,i(t) as

$$\hat V_{\mathrm{IPCW1},i}(t) = \frac{\Delta_i\, I(\tilde T_i > t)}{\hat G(\tilde T_i)}, \qquad \hat V_{\mathrm{IPCW2},i}(t) = \frac{\Delta_i(t)\, I(\tilde T_i > t)}{\hat G(\tilde T_i(t))}, \qquad \hat V_{\mathrm{DR},i}(t) = \frac{\Delta_i(t)\, I(\tilde T_i > t)}{\hat G(\tilde T_i(t))} + \int_0^{\tilde T_i(t)} \frac{\hat Q(t, u \mid W_i)}{\hat G(u)}\, d\hat M_{i,G}(u),$$

where

$$\hat M_{i,G}(u) = N_{i,G}(u) - \int_0^u I(\tilde T_i \geq s)\, d\hat\Lambda_G(s),$$

where Λ̂_G(s) is the estimated cumulative hazard function of the censoring time C, and Q̂(t, u | W_i) is an estimator of Q(t, u | W_i). For the calculation of Ĝ, due to the random censoring assumption, we use the Kaplan-Meier estimator; accordingly, we compute Λ̂_G by the Nelson-Aalen estimator with respect to C. For Q̂ we use various survival models, such as the Cox model (Cox, 1972) and parametric accelerated failure time (AFT) models.
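The following is a sketch (our own code, under the assumptions just stated) of the estimated DR transformation: Ĝ from the Kaplan-Meier estimator of censoring, Λ̂_G from the Nelson-Aalen estimator, and Q̂(t, u | W) = Ŝ(t | W)/Ŝ(u | W) from a working Cox model. The martingale integral is evaluated as a sum over the censoring jump times.

```r
## Sketch of V-hat_DR,i(t0) in (2.4) with a Cox working model for Q.
library(survival)

dr_transform <- function(obs_time, delta, W, t0) {
  n   <- length(obs_time)
  dat <- data.frame(obs_time, delta, W)
  ## censoring distribution: KM for G-hat, Nelson-Aalen increments for Lambda_G-hat
  cens <- survfit(Surv(obs_time, 1 - delta) ~ 1)
  Ghat <- stepfun(cens$time, c(1, cens$surv))
  dLam <- diff(c(0, cens$cumhaz))
  u    <- cens$time
  ## working Cox model for S(t | W)
  coxfit <- coxph(Surv(obs_time, delta) ~ W, data = dat)
  Vdr <- numeric(n)
  for (i in 1:n) {
    sfit <- survfit(coxfit, newdata = dat[i, ])
    Shat <- stepfun(sfit$time, c(1, sfit$surv))
    Qhat <- function(s) Shat(t0) / pmax(Shat(s), 1e-10)  # Q(t0, s | W_i)
    tt_i <- min(obs_time[i], t0)                         # T-tilde_i(t0)
    ## IPCW2 part
    d_t  <- as.numeric(delta[i] == 1 | obs_time[i] >= t0)
    ipcw <- d_t * (obs_time[i] > t0) / pmax(Ghat(tt_i), 1e-10)
    ## martingale augmentation: counting-process jump minus compensator
    jump <- if (delta[i] == 0 && obs_time[i] <= t0)
              Qhat(obs_time[i]) / pmax(Ghat(obs_time[i]), 1e-10) else 0
    idx  <- u <= tt_i
    comp <- sum(Qhat(u[idx]) / pmax(Ghat(u[idx]), 1e-10) * dLam[idx])
    Vdr[i] <- ipcw + jump - comp
  }
  Vdr
}
```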

After computing V̂_IPCW1,i(t), V̂_IPCW2,i(t) or V̂_DR,i(t), we use the local linear approach of Fan and Gijbels (1996) on the transformed response, as in Cho et al. (2021). Let V̂_CUT,i(t) be one of V̂_IPCW1,i(t), V̂_IPCW2,i(t) or V̂_DR,i(t), K(·) be a kernel function and h be a bandwidth. Suppose for now that h is given; bandwidth estimation is explained later in this section. Following Imbens and Lemieux (2008), we build two loss functions

$$
\begin{aligned}
U_R(\alpha_R(t), \beta_R(t); \hat G, \hat S) &= \sum_{i=1}^n I(W_i \geq w_0)\,\{\hat V_{\mathrm{CUT},i}(t) - \alpha_R(t) - \beta_R(t)(W_i - w_0)\}^2\, K\!\left(\frac{W_i - w_0}{h}\right), \qquad (3.1) \\
U_L(\alpha_L(t), \beta_L(t); \hat G, \hat S) &= \sum_{i=1}^n I(W_i < w_0)\,\{\hat V_{\mathrm{CUT},i}(t) - \alpha_L(t) - \beta_L(t)(W_i - w_0)\}^2\, K\!\left(\frac{W_i - w_0}{h}\right). \qquad (3.2)
\end{aligned}
$$

We then calculate {α̂_R(t), β̂_R(t), α̂_L(t), β̂_L(t)}, which minimize {U_R(α_R(t), β_R(t); Ĝ, Ŝ), U_L(α_L(t), β_L(t); Ĝ, Ŝ)}, by weighted least squares. Let {α̂_R,DR(t), β̂_R,DR(t), α̂_L,DR(t), β̂_L,DR(t)} be the minimizer of (3.1) and (3.2) with V̂_CUT,i(t) = V̂_DR,i(t). We can similarly define

$$\{\hat\alpha_{R,\mathrm{IPCW1}}(t), \hat\beta_{R,\mathrm{IPCW1}}(t), \hat\alpha_{L,\mathrm{IPCW1}}(t), \hat\beta_{L,\mathrm{IPCW1}}(t)\} \quad \text{and} \quad \{\hat\alpha_{R,\mathrm{IPCW2}}(t), \hat\beta_{R,\mathrm{IPCW2}}(t), \hat\alpha_{L,\mathrm{IPCW2}}(t), \hat\beta_{L,\mathrm{IPCW2}}(t)\}.$$

In this case, for simplicity, we suppress the notations Ĝ and Ŝ. Our sharp RD estimators are then

$$\hat\tau_{\mathrm{IPCW1}}(t) = \hat\alpha_{R,\mathrm{IPCW1}}(t) - \hat\alpha_{L,\mathrm{IPCW1}}(t), \qquad \hat\tau_{\mathrm{IPCW2}}(t) = \hat\alpha_{R,\mathrm{IPCW2}}(t) - \hat\alpha_{L,\mathrm{IPCW2}}(t), \qquad \hat\tau_{\mathrm{DR}}(t) = \hat\alpha_{R,\mathrm{DR}}(t) - \hat\alpha_{L,\mathrm{DR}}(t).$$

We adapt the method of Cho et al. (2021) to survival functions: V̂_CUT,i(t) is the new response for the regression, and we adopt the idea of the Brier score for estimation. When Q = 0, V̂_DR,i(t) reduces to either V̂_IPCW1,i(t) or V̂_IPCW2,i(t). Furthermore, when there is no censoring, V̂_IPCW1,i(t) and V̂_IPCW2,i(t) reduce to V_i(t). This is an advantage of our method: it is not restricted to uncensored data but can also be applied to censored data. Our method thus bridges censored and uncensored data, which does not happen in typical survival analysis modeling.
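Putting the pieces together, a minimal sketch of the workflow follows: transform the censored outcome at each time point of interest and then estimate τ(t) by a local linear sharp RD fit. We use the rdrobust package cited by the authors; `transform_fun` stands for a censoring unbiased transformation such as the `dr_transform` sketch above (helper names are ours).

```r
## Sketch of the estimation pipeline in Section 3.1 over a grid of time points.
library(rdrobust)

tau_hat_curve <- function(obs_time, delta, W, w0, tgrid, transform_fun) {
  sapply(tgrid, function(t0) {
    V   <- transform_fun(obs_time, delta, W, t0)   # transformed response at t0
    fit <- rdrobust(y = V, x = W, c = w0)          # local linear sharp RD fit
    fit$coef["Conventional", 1]                    # tau-hat(t0) = alpha_R - alpha_L
  })
}

## e.g., tau_hat_curve(obs_time, delta, W, w0 = 4.0, tgrid = c(5, 10, 15, 20),
##                     transform_fun = dr_transform)
```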

Our approach differs from Cho et al. (2021) in several ways. First, Cho et al. (2021) use log T to estimate causal effects. This is useful, but interpretation on the logarithmic scale may not be easy in practice. We express causal effects in terms of the survival function, which is more relevant in practice. Moreover, since our causal effect depends on the time t, we allow a different causal effect at each time point, whereas the method of Cho et al. (2021) does not. Hence our approach is more flexible.

Remark

We observe the same phenomenon as in Cho et al. (2021): in their paper, when E(log T | T ≥ u, W) is 0, their DR-transformed outcome reduces to an IPCW-transformed outcome, and when in addition there is no censoring, the IPCW-transformed outcome reduces to the usual continuous outcome.

Due to the nature of our estimation procedure, we can adapt the asymptotic results for the mean of the logarithm of survival time in Cho et al. (2021). We show that our estimators are asymptotically normal in Appendix A (see Theorem 1 and its proof there).

Now we discuss the estimation of the bandwidth for τ̂(t). First, we apply the method of Ludwig and Miller (2007), which is also used in Cho et al. (2021). Their criterion uses the empirical distributions of the W_i with W_i < w0 and of the W_i with W_i ≥ w0. Using quantiles of W_i from these empirical distributions, we compute the squared error between the transformed response and the corresponding DR estimator over a trimmed range of W.

This method is simple and does not depend on the variance of the transformed response. First, define â_L(ξ) to be the ξ quantile of the empirical distribution of W using the observations with W_i < w0, and let â_R(1 − ξ) be the 1 − ξ quantile of the empirical distribution of W using the observations with W_i ≥ w0. Then we compute the following quantity:

$$CV_{\mathrm{DR,LM}}(h; \hat G, \hat S) = \frac{1}{n} \sum_{\hat a_L(\xi) \leq W_i \leq \hat a_R(1-\xi)} \left(\hat V_{\mathrm{DR},i}(t) - \hat\gamma_{\mathrm{DR}}(h, t; W_i)\right)^2,$$

where

$$\hat\gamma_{\mathrm{DR}}(h, t; w) = \begin{cases} \hat\alpha_{L,\mathrm{DR}}(h, t; w), & \text{if } w < w_0, \\ \hat\alpha_{R,\mathrm{DR}}(h, t; w), & \text{if } w \geq w_0. \end{cases}$$

In other words, for a given h, we first compute the estimators of α_L,DR(t) and α_R,DR(t) over the trimmed range of W_i, and then calculate the average squared error with respect to α̂_L,DR(h, t; w) and α̂_R,DR(h, t; w), respectively. We then choose

$$\hat h_{\mathrm{DR}}(t, \hat G, \hat S) = \arg\min_h CV_{\mathrm{DR,LM}}(h, t; \hat G, \hat S).$$

We can derive a similar quantity for V̂_IPCW1,i(t) and V̂_IPCW2,i(t), i = 1, ..., n. The second criterion is the mean squared error criterion for the RD estimator of Imbens and Kalyanaraman (2012). Since local linear (or polynomial) regression introduces bias into the estimator, it is desirable to select the bandwidth that minimizes the mean squared error. In this case, the target quantity is the mean squared error of the sharp RD estimator. With uncensored data, for the sharp RD estimator τ̂_h, which is τ̂ with a given h, and the true value τ0, we want to find h to minimize

$$E\{(\hat\tau_h - \tau_0)^2\}. \qquad (3.5)$$

Imbens and Kalyanaraman (2012) provide asymptotic results for the expansion of (3.5). With uncensored data, let Y be the outcome, let σ²₊(w0) and σ²₋(w0) be the right and left limits of Var(Y | W = w) at the threshold w0, and let m⁽²⁾₊(w0) and m⁽²⁾₋(w0) be the right and left limits of the second derivative of E(Y | W = w) at the threshold w0. They propose selecting the bandwidth as

$$h_{\mathrm{MSE,IK}} = C_K \left\{\frac{\sigma^2_+(w_0) + \sigma^2_-(w_0)}{g(w_0)\left(m^{(2)}_+(w_0) - m^{(2)}_-(w_0)\right)^2}\right\}^{1/5} n^{-1/5},$$

where C_K is a constant determined by the kernel function and g is the density function of W. This approach provides an estimator whose mean squared error is asymptotically optimal, and it is widely used in practice (Calonico et al., 2020). However, it requires estimation of g, σ²₊ and σ²₋, which may be undesirable. Calonico et al. (2014) provide an alternative expansion of (3.5),

$$E\{(\hat\tau_h - \tau_0)^2\} = h^4\left(\mathcal{B}^2 + o_p(1)\right) + \frac{1}{nh}\left(\mathcal{V} + o_p(1)\right), \qquad (3.6)$$

where and 꽟 is variance and bias of τh. They suggest that optimal bandwidth based on the mean square error in (3.6) by

$$h_{\mathrm{MSE,C}} = \left(\frac{\mathcal{V}}{4\mathcal{B}^2}\right)^{1/5} n^{-1/5}. \qquad (3.7)$$

We can compute h_MSE,C by the following steps:

  • Step 1: Take initial bandwidths to compute $\mathcal{B}$ and $\mathcal{V}$. For this, one can use Silverman’s rule of thumb (Silverman, 1986).

  • Step 2: Using (3.6) and (3.7), compute the final bandwidth h_MSE,C.

We use the IPCW and DR estimators in the place of τ̂_h to compute h_MSE,C, and use the resulting bandwidths in the inference.
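As a hedged illustration of the two bandwidth choices for a transformed outcome V (e.g., the DR response at a fixed t), the sketch below obtains the MSE-optimal bandwidth from rdbwselect and implements a simple grid-search version of the Ludwig-Miller boundary-mimicking cross-validation; helper names, the trimming quantile `xi`, and the indexing of the `$bws` output (assumed to hold the MSE bandwidth in its first entry; this may differ across package versions) are our own assumptions.

```r
## Sketch of bandwidth selection: MSE criterion and Ludwig-Miller style CV.
library(rdrobust)

## (a) MSE-optimal bandwidth of Calonico et al. (2014)
h_mse <- function(V, W, w0) rdbwselect(y = V, x = W, c = w0, bwselect = "mserd")$bws[1, 1]

## (b) Ludwig-Miller cross-validation over a grid of candidate bandwidths
h_lm <- function(V, W, w0, hgrid, xi = 0.25) {
  aL <- quantile(W[W <  w0], xi)
  aR <- quantile(W[W >= w0], 1 - xi)
  idx <- which(W >= aL & W <= aR)                    # trimmed evaluation points
  cv <- sapply(hgrid, function(h) {
    err <- sapply(idx, function(i) {
      # mimic boundary estimation: use only observations strictly on the
      # cutoff side of W_i and predict at W_i with a local linear fit
      side <- if (W[i] < w0) which(W < W[i]) else which(W > W[i])
      kw <- pmax(1 - abs((W[side] - W[i]) / h), 0)   # triangular kernel
      ok <- kw > 0
      if (sum(ok) < 3) return(NA)
      fit <- lm(V[side] ~ I(W[side] - W[i]), weights = kw, subset = ok)
      (V[i] - coef(fit)[1])^2
    })
    mean(err, na.rm = TRUE)
  })
  hgrid[which.min(cv)]
}
```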

Now we turn to inference for the proposed sharp RD estimators. To compute the variance, we adapt the approaches of Cho et al. (2021). We only present the method based on the DR transformation; the derivations for IPCW1 and IPCW2 are similar. Suppose that h is computed by the method of Ludwig and Miller (2007) or by the mean squared error criterion of Calonico et al. (2014).

For variance estimation, we propose an estimator adapted from Cho et al. (2021). By applying the asymptotic results in Appendix A, for a given h the asymptotic variance of τ̂_DR(t) is

$$\Sigma_{\mathrm{SRD}}^{\mathrm{DR}(G_0, S^*)}(t) = \frac{1}{n}\, e_1^T \left(\Gamma_{h+}^{-1}\, \varphi_{VV+,\mathrm{DR}}(t)\, \Gamma_{h+}^{-1} + \Gamma_{h-}^{-1}\, \varphi_{VV-,\mathrm{DR}}(t)\, \Gamma_{h-}^{-1}\right) e_1,$$

where Γ_{h+}, Γ_{h−}, φ_{VV+,DR}(t) and φ_{VV−,DR}(t) are defined in Appendix B, and e_1 = (1, 0)^T.

It is crucial to estimate φ_{VV+,DR}(t) and φ_{VV−,DR}(t) for variance estimation. As discussed in Cho et al. (2021), we use two estimation methods for them: the plug-in and nearest neighbor (NN) methods (Calonico et al., 2014). In the plug-in approach, we define the usual residuals from the DR-transformed outcome and {α̂_R,DR(t), α̂_L,DR(t)}, use these residuals to estimate φ_{VV+,DR}(t) and φ_{VV−,DR}(t), and compute the empirical version of Σ_{SRD}^{DR(G_0,S*)}(t). The NN method uses residuals based on the distance to each observation i. To reduce the influence of outliers in variance estimation, we define the closest values for each individual DR-transformed outcome, compute residuals based on the average of those closest values and the DR outcome, and then estimate Σ_{SRD}^{DR(G_0,S*)}(t). We can also apply these two approaches to the IPCW1 and IPCW2 estimators. Details of the derivation of the variance are given in Appendix B.

3.2. Causal effect estimation and inference with the pseudo-values

Now we extend this idea to another type of unbiased estimator of P(T > t | W = w) with local linear regression. One method in survival analysis for directly modeling survival quantities is the pseudo-value approach (Andersen et al., 2003). Let θ be a scalar parameter, let X_1, ..., X_n be independent and identically distributed data, and let f be a function such that E{f(X_i)} = θ. Suppose that we also have independent and identically distributed covariates A_1, ..., A_n. We define the conditional expectation θ_i = E{f(X_i) | A_i}, which gives θ when we integrate θ_i over A_i. Let θ̂ be an estimator of θ based on all n observations and θ̂^{−i} the same estimator computed after leaving out the ith observation. Then the pseudo-value for the ith observation is defined by

$$\hat\theta_i = n\hat\theta - (n-1)\hat\theta^{-i}.$$

For the survival function, we take f(T) = I(T > t). Then the pseudo-values for a fixed time t are

$$\hat\theta_i(t) = n\hat\theta(t) - (n-1)\hat\theta^{-i}(t),$$

where θ̂(t) is the Kaplan-Meier estimator at time t and θ̂^{−i}(t) is the leave-one-out Kaplan-Meier estimator.

Since the pseudo-value approach is based on the conditional expectation θ_i, it is sensible to apply it in our RD design. In other words, we compute the pseudo-values and use them as outcomes for each individual. Next, we run the local linear regression by plugging θ̂_i(t) into the place of V̂_CUT,i(t) in (3.1) and (3.2), along with the bandwidth selection methods in (3.8) and (3.9). We then obtain estimates of α_R(t) and α_L(t), say α̂_{R,Pseudo,ĥ}(t) and α̂_{L,Pseudo,ĥ}(t), and our estimator is

$$\hat\tau_{\mathrm{Pseudo}}(t) = \hat\alpha_{R,\mathrm{Pseudo},\hat h}(t) - \hat\alpha_{L,\mathrm{Pseudo},\hat h}(t).$$

Statistical inference for τ̂_Pseudo(t) can be performed similarly to that for the IPCW and DR estimators.
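A short sketch of the jackknife pseudo-values at a fixed time t follows (our own helper, built only from the survival package); the resulting values can then be passed to the same local linear RD fit.

```r
## Sketch: pseudo-values theta_i(t0) = n*theta(t0) - (n-1)*theta^{-i}(t0)
## from (leave-one-out) Kaplan-Meier estimates.
library(survival)

pseudo_surv <- function(obs_time, delta, t0) {
  n <- length(obs_time)
  km_at <- function(idx) {
    fit <- survfit(Surv(obs_time[idx], delta[idx]) ~ 1)
    summary(fit, times = t0, extend = TRUE)$surv     # KM survival at t0
  }
  theta_all <- km_at(seq_len(n))
  sapply(seq_len(n), function(i) n * theta_all - (n - 1) * km_at(-i))
}

## usage: pv <- pseudo_surv(obs_time, delta, t0 = 5)
##        rdrobust(y = pv, x = W, c = w0)
```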

4. Simulation

We conduct various Monte Carlo simulation studies to evaluate the numerical performance of the proposed methods in finite samples. We generate W ~ Unif(0, 1). Our first model is the Cox proportional hazards model (Cox, 1972),

$$\lambda(t \mid W) = e^{\beta I(W \geq 0.5)},$$

where β = −1. The censoring variable C follows Unif(0, b), where b is chosen to achieve a 30% censoring rate. Sample sizes are 500 and 1000. We compute the 25th, 50th and 75th percentiles of the failure times by Monte Carlo simulation. We implement and compare the performance of five methods: the two inverse probability of censoring weighted (IPCW) methods and the doubly robust methods proposed in Section 2 with three outcome regression models (the Cox model, the AFT model with lognormal distribution and the AFT model with log-logistic distribution). We denote the estimators using the transformations in (2.2) and (2.3) by IPCW1 and IPCW2, and the doubly robust transformation (2.4) with Q calculated by the Cox model, the lognormal AFT model and the log-logistic AFT model by DR(Cox), DR(lognorm) and DR(loglog), respectively. We also apply the pseudo-value approach (Pseudo) of Andersen et al. (2003) as described in Section 3.2. For each method, we estimate the bandwidth by the approach of Ludwig and Miller (2007) (denoted LM) and by MSE optimization as in Calonico et al. (2014) (denoted MSE). For the computation of standard errors, we use the plug-in (HC0) and nearest neighbor (NN) methods described in the previous section. We report the bias (Bias), empirical standard deviation (ESD), mean of the standard errors (SEE) and 95% coverage rate (Cover). We use the rdrobust package (Calonico et al., 2015) in R to compute the estimators and standard errors.
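A minimal data-generating sketch for this first design is shown below; under the stated Cox model the failure time given W is exponential with rate exp(β I(W ≥ 0.5)). The value of b is a placeholder found by trial and error to give roughly 30% censoring, not a value reported in the paper.

```r
## Sketch of the Cox-model simulation design.
set.seed(2023)
n <- 500; beta <- -1
W <- runif(n)
rate <- exp(beta * (W >= 0.5))          # lambda(t | W) = exp(beta * I(W >= 0.5))
Tlat <- rexp(n, rate = rate)            # latent failure time
b <- 5.5                                # placeholder giving roughly 30% censoring
C <- runif(n, 0, b)
obs_time <- pmin(Tlat, C)
delta <- as.numeric(Tlat <= C)
mean(1 - delta)                         # empirical censoring rate
```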

The simulation results for these three time points are shown in Table 1. All estimators are nearly unbiased. The DR estimators are more efficient than the IPCW1 and IPCW2 estimators, and the efficiency gain of the DR estimators over IPCW1 is larger than that over IPCW2. It is interesting to note that the pseudo-value approach has performance similar to the DR estimators. All methods have satisfactory large sample properties in general, although all estimators at t3 for n = 500 have coverage rates below the nominal 95% level. However, when the sample size increases, all methods achieve the nominal coverage rate. It is also interesting that both the ESD and SEE from the LM method are smaller than those from the MSE method, although the biases from the two bandwidth choices are not very different. The choice of method for estimating σ²_{DR,+}(t; W_i, G_0, S*) and σ²_{DR,−}(t; W_i, G_0, S*) does not materially influence the variance estimation; both NN and HC0 provide similar values of the SEE.

Since our real data have a high censoring rate, as shown in the data analysis section, we run another simulation with a 60% censoring rate. Table 2 shows the results. In this simulation, we only consider the 25th and 50th percentiles of the failure time due to the high censoring rate. The ESD and SEE are higher than under the 30% censoring rate, which is sensible due to the greater loss of information. The general trends are similar to those under the 30% censoring rate.

Now we consider another simulation setting. Let the hazard function λ(t | W) be

$$\lambda(t \mid W) = 2 + \beta I(W \geq 0.5),$$

where β = −1. This is the additive hazards model (Lin and Ying, 1994), which is widely used in causal inference for survival analysis. As in the Cox model setting, we generate the censoring variable C from Unif(0, b), where b is chosen to achieve a 30% censoring rate, and we calculate the 25th, 50th and 75th percentiles of the failure time. The simulation results are shown in Table 3. We see a similar trend: all methods have good large sample properties. We also run a simulation with a 50% censoring rate. As in the Cox model, we only consider the 25th and 50th percentiles. Table 4 shows the results, which are similar to those with a 30% censoring rate except for higher ESD and SEE.
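For completeness, a sketch of the second design: under this additive hazards model with a constant hazard, the failure time given W is again exponential, now with rate 2 + β I(W ≥ 0.5). The censoring upper limit b is again a placeholder chosen by trial and error, not a value from the paper.

```r
## Sketch of the additive hazards simulation design.
set.seed(2024)
n <- 500; beta <- -1
W <- runif(n)
Tlat <- rexp(n, rate = 2 + beta * (W >= 0.5))   # lambda(t | W) = 2 + beta * I(W >= 0.5)
C <- runif(n, 0, 2.3)                           # placeholder for roughly 30% censoring
obs_time <- pmin(Tlat, C)
delta <- as.numeric(Tlat <= C)
```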

5. Real data analysis

We apply our method to the PLCO dataset, focusing on prostate cancer, to evaluate whether a PSA-based screening threshold is a meaningful tool with respect to survival. In this PLCO trial, from 1993 to 2001, 76,678 men were randomized to receive annual PSA screening for 6 years or no PSA screening at all. Previous studies of the PLCO trial have shown no significant decrease in mortality or prostate cancer-specific incidence, although the detection of prostate cancer increased (Andriole et al., 2009). For further workup in this trial, the standard cutoff is a PSA level of 4.0 ng/ml, which implies that clinicians would expect to find a difference in mortality or cancer incidence at the threshold of 4.0 ng/ml. The question is: “Is there any difference in survival probability between PSA levels greater than or equal to 4.0 ng/ml and less than 4.0 ng/ml?” If the difference exists, then the PSA level of 4.0 ng/ml can be used as a threshold for further checkup and biopsy (Cho et al., 2021). As explained in Cho et al. (2021) and in the Introduction, the motivation and the trial setup naturally create a sharp RD design. Cho et al. (2021) focus on the difference in the mean of the logarithm of the time to the occurrence of any cancer, whichever comes first (first cancer occurrence), or to prostate cancer (prostate cancer occurrence). In this analysis, we want to determine whether there exists a difference in disease-free survival at the PSA level of 4.0 ng/ml.

In this data analysis, we focus on the screening group with the baseline PSA level, which was measured before PSA screening. The censoring rate is 66%, so it is not possible to compute higher percentile times by Kaplan-Meier. Hence we use the observed times t1 = 5, t2 = 10, t3 = 15 and t4 = 20.

We calculate the causal effect and the associated 95% confidence interval. The results are shown in Table 5 and Figure 1. The causal effect estimates are similar between the LM and MSE methods, but the 95% confidence interval from the MSE method is wider than that from the LM method, as observed in the simulation studies. From the 95% confidence intervals, we see that the IPCW1 estimator shows wide variability, as in the simulation studies. All DR estimators and the pseudo-value based estimator have smaller variances than IPCW1 and IPCW2. For all methods, the intervals at later time points are wider than those at earlier time points, which is sensible given the high censoring rate. It is interesting to note that the 95% confidence interval for the IPCW2 estimator from the MSE method is much wider at t4 than at earlier time points. Since all 95% confidence intervals include 0, we conclude that there is no significant difference in survival at the PSA level of 4.0 ng/ml.

6. Conclusion

In this paper, we propose estimators of the treatment effect on the survival function for censored data. This methodology is practically more meaningful than the one in Cho et al. (2021), in that the survival function is a more relevant quantity in applications than the restricted mean survival time.

We use the IPCW and DR transformations to account for censoring. However, the survival function lies between 0 and 1, so directly applying local linear regression may cause an issue because the predicted values may fall outside [0, 1]. Developing a method that models the survival function directly without this issue is an interesting direction for future research.

As discussed in Section 3, there is a bias term involving the second derivatives of lim_{w↓w0} P(T > t | W = w) and lim_{w↑w0} P(T > t | W = w). Let Λ0(t | W) be the covariate-dependent cumulative hazard function of T. If the limit and differentiation can be interchanged, then since P(T > t | W = w) = e^{−Λ0(t|w)}, the difference of these second derivatives can be expressed as

$$\lim_{w \downarrow w_0}\left[\left(\frac{\partial}{\partial t}\lambda_0(t \mid w)\right) + \{\lambda_0(t \mid w)\}^2\right]e^{-\Lambda_0(t \mid w)} - \lim_{w \uparrow w_0}\left[\left(\frac{\partial}{\partial t}\lambda_0(t \mid w)\right) + \{\lambda_0(t \mid w)\}^2\right]e^{-\Lambda_0(t \mid w)}. \qquad (6.1)$$

When (6.1) is 0 at the given t, or when h ∝ n^{−a} with 1/5 < a < 2/5 (Imbens and Lemieux, 2008), i.e., when we undersmooth, the bias term is negligible. However, undersmoothing may increase the variance due to the bias-variance tradeoff. Calonico et al. (2014) propose a bias-corrected estimator for uncensored data. Reducing the bias in the estimation of the RD design for censored data is an interesting direction for future research.

In practice, one may wish to include covariates in the modeling. In a randomized study, including covariates makes inference on the treatment effect more powerful by reducing its variance. In the RD design, the inclusion of covariates has been discussed for uncensored data (Calonico et al., 2019), but it has not been studied deeply for censored data. Adjusting for covariates in the RD design would be interesting research.

In the simulation study, we have seen that the pseudo-value approach has performance similar to the DR method. Moreover, in the PLCO data analysis, the pseudo-value method has almost identical estimates and confidence intervals to DR(Cox). Investigating why the pseudo-value approach shows such similar performance is an interesting topic for future research. In the PLCO dataset, one may also be interested in the analysis of prostate cancer incidence in the presence of other cancers. This is a competing risks setup. Our methodology can be extended to competing risks data, but more assumptions are required to identify treatment effects. One may consider using cause-specific hazards; however, from the viewpoint of practically identifiable quantities, the cumulative incidence function would be a reasonable target for identifying the treatment effect. This is also an interesting direction for future research.

Acknowledgment
This paper was supported by Konkuk University in 2021 (2021-A019-0164).
Appendix A: Proof of Theorem 1

Let O = (T̃, Δ, W) and O_i = (T̃_i, Δ_i, W_i), i = 1, ..., n. Define

$$
\begin{aligned}
\mu_+(t; w_0) &= \lim_{w \downarrow w_0} P(T > t \mid W = w), \qquad \mu_-(t; w_0) = \lim_{w \uparrow w_0} P(T > t \mid W = w), \\
V_{\mathrm{DR}}(t; O, G, S) &= \frac{\Delta(t)\, I(T > t)}{G(\tilde T(t))} + \int_0^{\tilde T} \frac{Q(t, u \mid W, S)}{G(u)}\, dM_G(u), \\
V^*_{\mathrm{DR}}(t; O, G, S) &= V_{\mathrm{DR}}(t; O, G, S) - \mu_+(t; w_0) - \mu'_+(t; w_0)(W - w_0), \\
L_{ih+} &= I(W_i \geq w_0)\, K\!\left(\frac{W_i - w_0}{h}\right), \qquad L_{ih-} = I(W_i < w_0)\, K\!\left(\frac{W_i - w_0}{h}\right), \\
\sigma^2_{\mathrm{DR}}(t; w, G, S) &= \mathrm{Var}\left(V_{\mathrm{DR}}(t; O, G, S) \mid W = w\right), \\
\sigma^{2+}_{\mathrm{DR}}(t; w_0, G, S) &= \lim_{w \downarrow w_0} \mathrm{Var}\left(V_{\mathrm{DR}}(t; O, G, S) \mid W = w\right), \qquad \sigma^{2-}_{\mathrm{DR}}(t; w_0, G, S) = \lim_{w \uparrow w_0} \mathrm{Var}\left(V_{\mathrm{DR}}(t; O, G, S) \mid W = w\right).
\end{aligned}
$$

We further define

$$X_h = \begin{pmatrix} 1 & \frac{W_1 - w_0}{h} \\ 1 & \frac{W_2 - w_0}{h} \\ \vdots & \vdots \\ 1 & \frac{W_n - w_0}{h} \end{pmatrix}, \qquad W_{h+} = \mathrm{diag}\left\{ I(W_i \geq w_0)\, K\!\left(\frac{W_i - w_0}{h}\right) \right\}_{i=1}^n, \qquad W_{h-} = \mathrm{diag}\left\{ I(W_i < w_0)\, K\!\left(\frac{W_i - w_0}{h}\right) \right\}_{i=1}^n.$$

Then E{V_DR(t; O, G_0, S*) | W} = P(T > t | W) ≡ μ(t; W). Similarly to Cho et al. (2021), let h be given and assume the following conditions:

  • (C1) For W ≥ w0 and for W < w0, μ(t; w) is twice continuously differentiable in w. Let μ′(t; w) and μ″(t; w) be the first and second derivatives of μ(t; w). Let μ′₊(t; w) and μ″₊(t; w) be the first and second derivatives of μ₊(t; w); μ′₋(t; w) and μ″₋(t; w) are defined similarly.

  • (C2) There exists a > 0 such that |μ₊(t; w)|, |μ′₊(t; w)| and |μ″₊(t; w)| are uniformly bounded on (w0, w0 + a]. Similarly, |μ₋(t; w)|, |μ′₋(t; w)| and |μ″₋(t; w)| are uniformly bounded on [w0 − a, w0).

  • (C3) μ₊(t; w0), μ′₊(t; w0), μ″₊(t; w0), μ₋(t; w0), μ′₋(t; w0) and μ″₋(t; w0) are finite.

  • (C4) Let g(w) be the common density of the W_i. Assume that g(w) is continuous and bounded away from zero in a neighborhood of w0.

  • (C5) σ²_DR(w; G_0, S*) is uniformly bounded in a neighborhood of w0, and σ^{2+}_DR(w0; G_0, S*) and σ^{2−}_DR(w0; G_0, S*) are finite.

  • (C6) lim_{W_i ↓ w0} E[|V_{DR,i}(t; O_i, G_0, S*) − μ(t; W_i)|^r | W_i], r = 1, 2, 3, are finite. The analogous limits as W_i ↑ w0 are assumed finite.

  • (C7) K is continuous and symmetric. Moreover, the support of K is compact and K(u) ≥ 0 for all u.

  • (C8) The bandwidth satisfies h ~ n^{−1/5}, where ~ indicates “asymptotically equivalent”.

  • (C9) Let H_n = o_p(1). Then
$$E\left[\left(\frac{W_i - w_0}{h}\right)^{j_1} (L_{ih+})^{j_2} H_n\right] = O(1), \qquad j_1 = 0, \ldots, 6, \quad j_2 = 1, 2, 3.$$

  • (C10) Ĝ is uniformly consistent for G_0.

  • (C11) Ŝ is uniformly consistent for S*, where S* is a possibly incorrect model for S.

Let G_0 be the survival function of the censoring time from the true model and let S* be a working model for the survival function of the failure time, which is possibly incorrect. Let

$$
\begin{aligned}
\rho_+ &= \frac{\left(\int_0^\infty u^2 K(u)\,du\right)^2 - \left(\int_0^\infty u^3 K(u)\,du\right)\left(\int_0^\infty u K(u)\,du\right)}{2\left\{\left(\int_0^\infty u^2 K(u)\,du\right)\left(\int_0^\infty K(u)\,du\right) - \left(\int_0^\infty u K(u)\,du\right)^2\right\}}, \\
\rho_- &= \frac{\left(\int_{-\infty}^0 u^2 K(u)\,du\right)^2 - \left(\int_{-\infty}^0 u^3 K(u)\,du\right)\left(\int_{-\infty}^0 u K(u)\,du\right)}{2\left\{\left(\int_{-\infty}^0 u^2 K(u)\,du\right)\left(\int_{-\infty}^0 K(u)\,du\right) - \left(\int_{-\infty}^0 u K(u)\,du\right)^2\right\}}, \\
\upsilon_+ &= \frac{\int_0^\infty \left\{\left(\int_0^\infty s^2 K(s)\,ds\right) - \left(\int_0^\infty s K(s)\,ds\right)u\right\}^2 \{K(u)\}^2\,du}{g(w_0)\left\{\left(\int_0^\infty u^2 K(u)\,du\right)\left(\int_0^\infty K(u)\,du\right) - \left(\int_0^\infty u K(u)\,du\right)^2\right\}^2}, \\
\upsilon_- &= \frac{\int_{-\infty}^0 \left\{\left(\int_{-\infty}^0 s^2 K(s)\,ds\right) - \left(\int_{-\infty}^0 s K(s)\,ds\right)u\right\}^2 \{K(u)\}^2\,du}{g(w_0)\left\{\left(\int_{-\infty}^0 u^2 K(u)\,du\right)\left(\int_{-\infty}^0 K(u)\,du\right) - \left(\int_{-\infty}^0 u K(u)\,du\right)^2\right\}^2}.
\end{aligned}
$$

Moreover, since τ(t) = lim_{w↓w0} P(T > t | W = w) − lim_{w↑w0} P(T > t | W = w), by our definitions τ(t) = μ₊(t; w0) − μ₋(t; w0). Define

$$
\begin{aligned}
\mathcal{B}(t) &= \rho_+ \mu''_+(t; w_0) - \rho_- \mu''_-(t; w_0), \\
\Sigma_{\mathrm{IPCW1}}(t; G) &= \upsilon_+ \sigma^{2+}_{\mathrm{IPCW1}}(t; w_0, G) + \upsilon_- \sigma^{2-}_{\mathrm{IPCW1}}(t; w_0, G), \\
\Sigma_{\mathrm{IPCW2}}(t; G) &= \upsilon_+ \sigma^{2+}_{\mathrm{IPCW2}}(t; w_0, G) + \upsilon_- \sigma^{2-}_{\mathrm{IPCW2}}(t; w_0, G), \\
\Sigma_{\mathrm{DR}}(t; G, S) &= \upsilon_+ \sigma^{2+}_{\mathrm{DR}}(t; w_0, G, S) + \upsilon_- \sigma^{2-}_{\mathrm{DR}}(t; w_0, G, S).
\end{aligned}
$$

We propose the following theorem.

Theorem 1

Suppose that conditions (C1)–(C11) in the Appendix hold. By extending results from Cho et al. (2021),

$$
\begin{aligned}
\sqrt{nh}\left\{\hat\tau_{\mathrm{IPCW1}}(t) - \tau(t) - \mathcal{B}(t)\right\} &\xrightarrow{d} N\left(0, \Sigma_{\mathrm{IPCW1}}(t; G_0)\right), \\
\sqrt{nh}\left\{\hat\tau_{\mathrm{IPCW2}}(t) - \tau(t) - \mathcal{B}(t)\right\} &\xrightarrow{d} N\left(0, \Sigma_{\mathrm{IPCW2}}(t; G_0)\right), \\
\sqrt{nh}\left\{\hat\tau_{\mathrm{DR}}(t) - \tau(t) - \mathcal{B}(t)\right\} &\xrightarrow{d} N\left(0, \Sigma_{\mathrm{DR}}(t; G_0, S^*)\right).
\end{aligned}
$$

In this proof, we only show the result for the DR estimator; the results for the IPCW estimators are similar. Let W_{ih}^r = ((W_i − w_0)/h)^r, r = 0, 1. Consider

$$A_{\mathrm{DR}^*,i,h+}(G, S) = \begin{pmatrix} W_{ih}^0\, V^*_{\mathrm{DR},i}(t; O_i, G, S)\, L_{ih+} \\ W_{ih}^1\, V^*_{\mathrm{DR},i}(t; O_i, G, S)\, L_{ih+} \end{pmatrix}, \qquad A_{\mathrm{DR}^*,i,h-}(G, S) = \begin{pmatrix} W_{ih}^0\, V^*_{\mathrm{DR},i}(t; O_i, G, S)\, L_{ih-} \\ W_{ih}^1\, V^*_{\mathrm{DR},i}(t; O_i, G, S)\, L_{ih-} \end{pmatrix}.$$

Then, by replacing the response in Cho et al. (2021) with V_DR(t; O, G, S), the following lemmas hold.

Lemma 1

Let

$$A_{\mathrm{DR}^*,h+}(t; W_i, G, S) = E\left(A_{\mathrm{DR}^*,i,h+}(t; G, S) \mid W_i\right), \qquad A_{\mathrm{DR}^*,h-}(t; W_i, G, S) = E\left(A_{\mathrm{DR}^*,i,h-}(t; G, S) \mid W_i\right).$$

Then

$$
\begin{aligned}
\frac{1}{nh}\sum_{i=1}^n A_{\mathrm{DR}^*,h+}(t; W_i, \hat G, \hat S) &= E\left\{\frac{1}{nh}\sum_{i=1}^n A_{\mathrm{DR}^*,i,h+}(G_0, S^*)\right\} + o_p(h^2), \\
\frac{1}{nh}\sum_{i=1}^n A_{\mathrm{DR}^*,h-}(t; W_i, \hat G, \hat S) &= E\left\{\frac{1}{nh}\sum_{i=1}^n A_{\mathrm{DR}^*,i,h-}(G_0, S^*)\right\} + o_p(h^2).
\end{aligned}
$$

Lemma 2

Let $d_q = \int_0^\infty u^q \{K(u)\}^2\, du$, $q = 0, 1, 2$, and

$$
\begin{aligned}
\bar A_{\mathrm{DR}^*,h+}(t; O, \hat G, \hat S) &= \frac{1}{nh}\sum_{i=1}^n \left\{A_{\mathrm{DR}^*,i,h+}(t; O_i, \hat G, \hat S) - A_{\mathrm{DR}^*,h+}(t; W_i, \hat G, \hat S)\right\}, \\
\bar A_{\mathrm{DR}^*,h-}(t; O, \hat G, \hat S) &= \frac{1}{nh}\sum_{i=1}^n \left\{A_{\mathrm{DR}^*,i,h-}(t; O_i, \hat G, \hat S) - A_{\mathrm{DR}^*,h-}(t; W_i, \hat G, \hat S)\right\}.
\end{aligned}
$$

Then

$$\mathrm{Var}\left\{\bar A_{\mathrm{DR}^*,h+}(t; O, G, S)\right\} = \frac{1}{nh}\, \sigma^{2+}_{\mathrm{DR}}(t; w_0, G_0, S)\, g(w_0)\, D, \qquad \mathrm{Var}\left\{\bar A_{\mathrm{DR}^*,h-}(t; O, G, S)\right\} = \frac{1}{nh}\, \sigma^{2-}_{\mathrm{DR}}(t; w_0, G_0, S)\, g(w_0)\, D,$$

where

$$D = \begin{pmatrix} d_0 + o(1) & d_1 + o(1) \\ d_1 + o(1) & d_2 + o(1) \end{pmatrix}.$$

Lemma 3

$$\sqrt{nh}\, \bar A_{\mathrm{DR}^*,h+}(t; O, G_0, S) \xrightarrow{d} \{g(w_0)\}^{\frac{1}{2}} N\left(0, \sigma^{2+}_{\mathrm{DR}}(t; w_0, G_0, S)\, D\right), \qquad \sqrt{nh}\, \bar A_{\mathrm{DR}^*,h-}(t; O, G_0, S) \xrightarrow{d} \{g(w_0)\}^{\frac{1}{2}} N\left(0, \sigma^{2-}_{\mathrm{DR}}(t; w_0, G_0, S)\, D\right).$$

Lemma 4

$$
\begin{aligned}
\frac{1}{\sqrt{nh}}\sum_{i=1}^n A_{\mathrm{DR}^*,i,h+}(t; O_i, \hat G, \hat S) - \frac{1}{2} n^{1/2} h^{5/2} g(w_0)\, \mu''_+(t; w_0)\, \delta &\xrightarrow{d} \{g(w_0)\}^{\frac{1}{2}} N\left(0, \sigma^{2+}_{\mathrm{DR}}(t; w_0, G_0, S^*)\, D\right), \\
\frac{1}{\sqrt{nh}}\sum_{i=1}^n A_{\mathrm{DR}^*,i,h-}(t; O_i, \hat G, \hat S) - \frac{1}{2} n^{1/2} h^{5/2} g(w_0)\, \mu''_-(t; w_0)\, \delta &\xrightarrow{d} \{g(w_0)\}^{\frac{1}{2}} N\left(0, \sigma^{2-}_{\mathrm{DR}}(t; w_0, G_0, S^*)\, D\right),
\end{aligned}
$$

where

$$\delta = \begin{pmatrix} \int_0^\infty u^2 K(u)\,du \\ \int_0^\infty u^3 K(u)\,du \end{pmatrix}.$$

Now we prove the main theorem. Define V_DR(t; O, Ĝ, Ŝ) = {V̂_DR,i(t; O_i, Ĝ, Ŝ)}_{i=1}^n. Then, in matrix form, we can express Σ_i A_{DR*,i,h+}(t; O_i, Ĝ, Ŝ) and Σ_i A_{DR*,i,h−}(t; O_i, Ĝ, Ŝ) as

$$\sum_{i=1}^n A_{\mathrm{DR}^*,i,h+}(t; O_i, \hat G, \hat S) = X_h^T W_{h+} V_{\mathrm{DR}}(t; O, \hat G, \hat S), \qquad \sum_{i=1}^n A_{\mathrm{DR}^*,i,h-}(t; O_i, \hat G, \hat S) = X_h^T W_{h-} V_{\mathrm{DR}}(t; O, \hat G, \hat S).$$

Define

$$\Upsilon = \begin{pmatrix} \int_0^\infty K(u)\,du & \int_0^\infty u K(u)\,du \\ \int_0^\infty u K(u)\,du & \int_0^\infty u^2 K(u)\,du \end{pmatrix}.$$

Then

$$\sqrt{nh}\begin{pmatrix} \hat\alpha_{R,\mathrm{DR}}(t) - \mu_+(t; w_0) \\ \hat\beta_{R,\mathrm{DR}}(t) - \mu'_+(t; w_0) \end{pmatrix} = \kappa\left( \left(X_h^T W_{h+} X_h\right)^{-1} X_h^T W_{h+} V_{\mathrm{DR}}(t; O, \hat G, \hat S) \right),$$

where $\kappa = \mathrm{diag}(1, h^{-1})$. Then

$$\sqrt{nh}\begin{pmatrix} \hat\alpha_{R,\mathrm{DR}}(t) - \mu_+(t; w_0) \\ \hat\beta_{R,\mathrm{DR}}(t) - \mu'_+(t; w_0) \end{pmatrix} - \frac{1}{2}\kappa^{-1}\Upsilon^{-1}\mu''_+(t; w_0)\,\delta \xrightarrow{d} N\left(0,\; \sigma^{2+}_{\mathrm{DR}}(w_0; G_0, S^*)\, g(w_0)^{-1}\, \kappa^{-1}\Upsilon^{-1} D\, \Upsilon^{-1}\kappa^{-1}\right).$$

Then

$$\sqrt{nh}\left(\hat\alpha_{R,\mathrm{DR}}(t) - \mu_+(t; w_0) - \rho_+\mu''_+(t; w_0)\right) \xrightarrow{d} N\left(0, \upsilon_+\sigma^{2+}_{\mathrm{DR}}(t; w_0, G_0, S^*)\right).$$

By applying similar arguments to α̂_L,DR(t),

$$\sqrt{nh}\left(\hat\alpha_{L,\mathrm{DR}}(t) - \mu_-(t; w_0) - \rho_-\mu''_-(t; w_0)\right) \xrightarrow{d} N\left(0, \upsilon_-\sigma^{2-}_{\mathrm{DR}}(t; w_0, G_0, S^*)\right).$$

Then, due to the independence of α̂_R,DR(t) and α̂_L,DR(t), it is easy to see that

$$\sqrt{nh}\left(\hat\tau_{\mathrm{DR}}(t) - \tau(t) - \mathcal{B}(t)\right) \xrightarrow{d} N\left(0, \Sigma_{\mathrm{DR}}(t; G_0, S^*)\right).$$

Hence we obtain the result. By defining Σ_IPCW1(t; G_0) = υ₊σ^{2+}_IPCW1(t; w_0, G_0) + υ₋σ^{2−}_IPCW1(t; w_0, G_0) and Σ_IPCW2(t; G_0) = υ₊σ^{2+}_IPCW2(t; w_0, G_0) + υ₋σ^{2−}_IPCW2(t; w_0, G_0), the proofs for τ̂_IPCW1(t) and τ̂_IPCW2(t) are similar.

Appendix B: Details for estimation of variance of the proposed estimators

As in the previous section, let
$$X_h = \begin{pmatrix} 1 & \frac{W_1 - w_0}{h} \\ \vdots & \vdots \\ 1 & \frac{W_n - w_0}{h} \end{pmatrix},$$
and let W_{h+} and W_{h−} be the n × n diagonal matrices with diagonal elements I(W_i ≥ w_0)K((W_i − w_0)/h), i = 1, ..., n, and I(W_i < w_0)K((W_i − w_0)/h), i = 1, ..., n, respectively. Let Γ_{h+} = X_h^T W_{h+} X_h and Γ_{h−} = X_h^T W_{h−} X_h. Moreover, define

$$
\begin{aligned}
\sigma^2_{\mathrm{DR},+}(t, w; G_0, S^*) &= \mathrm{Var}\left\{\frac{\Delta\, I(T(1) > t)}{G_0(\tilde T)} + \int_0^{\tilde T(t)} \frac{Q_{T(1)}(u, W; S^*)}{G_0(u)}\, dM_G(u) \,\middle|\, W = w\right\}, \\
\sigma^2_{\mathrm{DR},-}(t, w; G_0, S^*) &= \mathrm{Var}\left\{\frac{\Delta\, I(T(0) > t)}{G_0(\tilde T)} + \int_0^{\tilde T(t)} \frac{Q_{T(0)}(u, W; S^*)}{G_0(u)}\, dM_G(u) \,\middle|\, W = w\right\},
\end{aligned}
$$

where Q_{T(k)}(u, W; S*) = P_{S*}(T(k) > t | T ≥ u, W), k = 0, 1. Then, by applying the theory of Cho et al. (2021), for a given h the asymptotic variance of τ̂_DR(t) is

$$\Sigma_{\mathrm{SRD}}^{\mathrm{DR}(G_0, S^*)}(t) = \frac{1}{n}\, e_1^T\left(\Gamma_{h+}^{-1}\,\varphi_{VV+,\mathrm{DR}}(t)\,\Gamma_{h+}^{-1} + \Gamma_{h-}^{-1}\,\varphi_{VV-,\mathrm{DR}}(t)\,\Gamma_{h-}^{-1}\right)e_1,$$

where

$$
\begin{aligned}
\varphi_{VV+,\mathrm{DR}}(t) &= \frac{1}{n}\sum_{i=1}^n I(W_i \geq w_0)\left\{K\!\left(\frac{W_i - w_0}{h}\right)\right\}^2 b_i b_i^T\, \sigma^2_{\mathrm{DR},+}(t; W_i, G_0, S^*), \\
\varphi_{VV-,\mathrm{DR}}(t) &= \frac{1}{n}\sum_{i=1}^n I(W_i < w_0)\left\{K\!\left(\frac{W_i - w_0}{h}\right)\right\}^2 b_i b_i^T\, \sigma^2_{\mathrm{DR},-}(t; W_i, G_0, S^*), \qquad (\mathrm{B.1})
\end{aligned}
$$

with b_i = (1, (W_i − w_0)/h)^T. To estimate the variance, it is necessary to estimate σ²_{DR,+}(t; W_i, G_0, S*) and σ²_{DR,−}(t; W_i, G_0, S*). Adapting the approaches of Calonico et al. (2014) and Cho et al. (2021), we compute residuals from the transformed response and the RD estimator. Let ĥ be the bandwidth estimated by the method of Ludwig and Miller (2007) or Calonico et al. (2014). Define

$$\hat e_{i,\mathrm{DR}+,\hat h}(t) = \hat V_{\mathrm{DR},i,\hat h}(t) - \hat\alpha_{R,\mathrm{DR},\hat h}(t), \qquad \hat e_{i,\mathrm{DR}-,\hat h}(t) = \hat V_{\mathrm{DR},i,\hat h}(t) - \hat\alpha_{L,\mathrm{DR},\hat h}(t),$$

where V̂_DR,i,ĥ(t), α̂_R,DR,ĥ(t) and α̂_L,DR,ĥ(t) are V̂_DR,i(t), α̂_R,DR(t) and α̂_L,DR(t) computed with the estimated bandwidth. As can be seen, ê_{i,DR+,ĥ}(t) and ê_{i,DR−,ĥ}(t) are residuals with respect to V̂_DR,i(t). By plugging these two quantities into (B.1), we obtain estimators of φ_{VV+,DR}(t) and φ_{VV−,DR}(t):

$$
\begin{aligned}
\hat\varphi^{pir}_{VV+,\mathrm{DR},\hat h}(t) &= \frac{1}{n}\sum_{i=1}^n I(W_i \geq w_0)\left\{K\!\left(\frac{W_i - w_0}{\hat h}\right)\right\}^2 b_i b_i^T\, \hat e_{i,\mathrm{DR}+,\hat h}(t)^2, \\
\hat\varphi^{pir}_{VV-,\mathrm{DR},\hat h}(t) &= \frac{1}{n}\sum_{i=1}^n I(W_i < w_0)\left\{K\!\left(\frac{W_i - w_0}{\hat h}\right)\right\}^2 b_i b_i^T\, \hat e_{i,\mathrm{DR}-,\hat h}(t)^2.
\end{aligned}
$$

We call this the plug-in approach. Another approach is the nonparametric nearest neighbor (NN) based variance estimator of Calonico et al. (2014). We compute residuals based on “closeness” to each observation i and use them to compute the variance. This approach is more robust to outliers than the plug-in approach. We define “neighbors” of each i for V̂_DR,i,ĥ(t). Let V̂_{l+,k(i),DR,ĥ}(t) be the value for the kth closest unit to unit i among {W_i : W_i ≥ w_0} and V̂_{l−,k(i),DR,ĥ}(t) be the value for the kth closest unit to unit i among {W_i : W_i < w_0}, respectively. Then we compute residuals from the defined neighbors and estimate σ²_{VV+,DR}(t; W_i, G_0, S*) and σ²_{VV−,DR}(t; W_i, G_0, S*) by (here K in the display below denotes the number of neighbors)

$$
\begin{aligned}
\hat\sigma^2_{VV+,\mathrm{DR},\hat h}(t; W_i) &= I(W_i \geq w_0)\,\frac{K}{K+1}\left(\hat V_{\mathrm{DR},i,\hat h}(t) - \frac{1}{K}\sum_{k=1}^K \hat V_{l+,k(i),\mathrm{DR},\hat h}(t)\right)^2, \\
\hat\sigma^2_{VV-,\mathrm{DR},\hat h}(t; W_i) &= I(W_i < w_0)\,\frac{K}{K+1}\left(\hat V_{\mathrm{DR},i,\hat h}(t) - \frac{1}{K}\sum_{k=1}^K \hat V_{l-,k(i),\mathrm{DR},\hat h}(t)\right)^2.
\end{aligned}
$$

From these estimators, the second method to estimate φVV+,DR(t) and φVV−,DR(t) is

$$
\begin{aligned}
\hat\varphi^{nn}_{VV+,\mathrm{DR},\hat h}(t) &= \frac{1}{n}\sum_{i=1}^n I(W_i \geq w_0)\left\{K\!\left(\frac{W_i - w_0}{\hat h}\right)\right\}^2 b_i b_i^T\, \hat\sigma^2_{VV+,\mathrm{DR},\hat h}(t; W_i), \\
\hat\varphi^{nn}_{VV-,\mathrm{DR},\hat h}(t) &= \frac{1}{n}\sum_{i=1}^n I(W_i < w_0)\left\{K\!\left(\frac{W_i - w_0}{\hat h}\right)\right\}^2 b_i b_i^T\, \hat\sigma^2_{VV-,\mathrm{DR},\hat h}(t; W_i).
\end{aligned}
$$

We call this the nearest neighbor method. With either of these two methods, we can finally derive the variance estimator of our RD estimator. Let X_ĥ, W_{ĥ+} and W_{ĥ−} be X_h, W_{h+} and W_{h−} with the estimated bandwidth. Then the variance estimator is

$$\frac{1}{n}\, e_1^T\left(\hat\Gamma_{\hat h+}^{-1}\,\hat\varphi_{VV+,\mathrm{DR},\hat h}(t)\,\hat\Gamma_{\hat h+}^{-1} + \hat\Gamma_{\hat h-}^{-1}\,\hat\varphi_{VV-,\mathrm{DR},\hat h}(t)\,\hat\Gamma_{\hat h-}^{-1}\right)e_1,$$

where

$$\hat\Gamma_{\hat h+} = X_{\hat h}^T W_{\hat h+} X_{\hat h}, \qquad \hat\Gamma_{\hat h-} = X_{\hat h}^T W_{\hat h-} X_{\hat h},$$

and φ̂_{VV+,DR,ĥ}(t) is either φ̂^{pir}_{VV+,DR,ĥ}(t) or φ̂^{nn}_{VV+,DR,ĥ}(t); the definition of φ̂_{VV−,DR,ĥ}(t) is similar. By the same process, we can also estimate the variances of the IPCW1 and IPCW2 estimators.
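As a small illustration of the nearest neighbor residual variance above, the following sketch (our own code, with K = 3 neighbors) computes σ̂²_{VV+,DR,ĥ}(t; W_i) for the right side of the cutoff from a transformed outcome V at a fixed t; the left side is analogous.

```r
## Sketch of the NN variance sigma^2_{VV+,DR} for observations with W_i >= w0.
nn_var_right <- function(V, W, w0, K = 3) {
  idxR <- which(W >= w0)
  sig2 <- numeric(length(W))
  for (i in idxR) {
    others <- setdiff(idxR, i)
    nb <- others[order(abs(W[others] - W[i]))][seq_len(min(K, length(others)))]
    sig2[i] <- (K / (K + 1)) * (V[i] - mean(V[nb]))^2
  }
  sig2                                   # zero for observations left of the cutoff
}
```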

Figures
Fig. 1. Estimated treatment effects (solid lines) and their 95% confidence intervals (nearest neighbor method: dashed lines; plug-in method: dotted lines). The left and right plots correspond to the LM and MSE bandwidths, respectively. In each plot, the colors indicate: black: IPCW1, red: IPCW2, blue: DR(Cox), gold: DR(lognorm), brown: DR(loglog), dark green: Pseudo.
TABLES

Table 1

Simulation results with data generated from Cox model with 30% censoring rate

Columns (left to right): Bias (LM, MSE); ESD (LM, MSE); SEE (LM: NN, HC0; MSE: NN, HC0); Cover (LM: NN, HC0; MSE: NN, HC0)
n = 500t1IPCW10.001−0.0040.1810.3340.1730.1710.3080.3000.9460.9400.9240.930
IPCW20.0030.0080.1050.1780.1000.0990.1790.1740.9280.9240.9440.940
DR(Cox)0.0030.0060.0960.1600.0870.0860.1560.1510.9200.9160.9400.928
DR(lognorm)0.0030.0060.0960.1600.0870.0860.1560.1510.9160.9160.9400.926
DR(loglog)0.0030.0060.0960.1600.0870.0860.1560.1510.9200.9160.9400.928
Pseudo0.0020.0080.0960.1560.0870.0860.1560.1520.9160.9160.9440.928

t2IPCW10.0080.0010.1760.3230.1730.1720.3060.2990.9500.9500.9220.928
IPCW20.0040.0130.1240.2240.1190.1180.2130.2080.9500.9360.9280.924
DR(Cox)0.0040.0150.1050.1910.1020.1010.1820.1770.9520.9460.9380.930
DR(lognorm)0.0040.0150.1050.1910.1020.1010.1820.1780.9560.9520.9380.932
DR(loglog)0.0040.0150.1050.1910.1020.1010.1820.1780.9540.9460.9400.930
Pseudo0.0040.0130.1050.1860.1020.1010.1830.1780.9540.9440.9460.936

t3IPCW1−0.007−0.0200.1630.2970.1510.1500.2630.2570.9360.9320.8840.884
IPCW2−0.007−0.0150.1290.2390.1220.1200.2140.2080.9300.9380.8940.892
DR(Cox)−0.006−0.0080.1040.1990.0970.0970.1740.1680.9380.9300.9180.900
DR(lognorm)−0.006−0.0080.1050.2000.0980.0970.1740.1690.9320.9300.9200.900
DR(loglog)−0.006−0.0080.1050.1990.0980.0970.1740.1690.9360.9320.9180.902
Pseudo−0.005−0.0110.1050.1910.0980.0970.1750.1700.9340.9320.9300.916

n = 1000t1IPCW1−0.0020.0060.1260.2380.1220.1210.2170.2150.9400.9440.9340.932
IPCW2−0.004−0.0000.0730.1300.0700.0700.1240.1230.9440.9480.9400.940
DR(Cox)−0.001−0.0020.0690.1200.0610.0610.1090.1080.9240.9260.9320.934
DR(lognorm)−0.001−0.0010.0690.1200.0610.0610.1090.1080.9220.9260.9360.936
DR(loglog)−0.001−0.0010.0690.1200.0610.0610.1090.1080.9220.9260.9340.936
Pseudo−0.001−0.0020.0690.1200.0610.0610.1090.1080.9240.9240.9320.934

t2IPCW10.0000.0030.1260.2500.1220.1210.2190.2160.9480.9440.9100.908
IPCW20.0020.0010.0820.1610.0840.0830.1500.1480.9660.9600.9380.942
DR(Cox)0.001−0.0020.0750.1360.0710.0710.1270.1250.9400.9400.9400.940
DR(lognorm)0.001−0.0020.0750.1360.0710.0710.1270.1260.9400.9380.9400.940
DR(loglog)0.001−0.0020.0740.1360.0710.0710.1270.1260.9400.9380.9420.940
Pseudo0.001−0.0020.0740.1360.0710.0710.1270.1260.9400.9380.9420.940

t3IPCW10.0030.0040.1150.2160.1080.1080.1900.1880.9420.9460.9320.920
IPCW20.0010.0020.0860.1570.0860.0850.1510.1500.9520.9440.9500.938
DR(Cox)0.000−0.0010.0720.1280.0680.0680.1210.1200.9500.9500.9360.938
DR(lognorm)0.001−0.0010.0720.1290.0690.0680.1220.1210.9440.9460.9380.942
DR(loglog)0.001−0.0010.0720.1290.0680.0680.1210.1210.9500.9460.9380.938
Pseudo0.001−0.0010.0720.1290.0690.0680.1220.1210.9440.9500.9380.934

IPCW1: (2.2), IPCW2: (2.3), DR(Cox): (2.4) with Q calculated by the Cox model, DR(lognorm): (2.4) with Q calculated by the AFT model with lognormal distribution, DR(loglog): (2.4) with Q calculated by the AFT model with log-logistic distribution, Pseudo: pseudo-value approach by Andersen et al. (2003), HC0: plug-in, NN: nearest neighbor, LM: bandwidth selection by Ludwig and Miller (2007), MSE: bandwidth selection by MSE optimization in Calonico et al. (2014).


Table 2

Simulation results with data generated from Cox model with 60% censoring rate

Columns (left to right): Bias (LM, MSE); ESD (LM, MSE); SEE (LM: NN, HC0; MSE: NN, HC0); Cover (LM: NN, HC0; MSE: NN, HC0)
n = 500t1IPCW10.002−0.0080.3710.6980.3580.3550.6220.6010.9380.9360.9360.924
IPCW20.0020.0160.1410.2450.1310.1290.2340.2270.9340.9340.9340.932
DR(Cox)0.0050.0100.0980.1720.0920.0910.1650.1600.9220.9260.9320.930
DR(lognorm)0.0050.0100.0980.1720.0920.0910.1650.1600.9240.9280.9320.930
DR(loglog)0.0050.0100.0980.1720.0920.0910.1650.1600.9240.9280.9320.932
Pseudo0.0050.0100.0980.1720.0920.0910.1650.1600.9220.9260.9340.926

t2IPCW10.001−0.0070.4000.6830.3510.3470.5950.5760.9480.9480.9100.904
IPCW2−0.0050.0080.2170.4110.2020.2010.3580.3490.9460.9360.9460.944
DR(Cox)−0.0010.0080.1280.2380.1230.1220.2200.2140.9340.9320.9140.914
DR(lognorm)−0.0000.0090.1280.2380.1230.1220.2210.2140.9360.9340.9180.916
DR(loglog)−0.0010.0090.1280.2380.1230.1220.2200.2140.9380.9320.9100.916
Pseudo0.0000.0090.1300.2380.1240.1230.2210.2150.9340.9320.9220.920

n = 1000t1IPCW1−0.004−0.0000.2780.4910.2570.2560.4490.4450.9400.9380.9340.928
IPCW2−0.0010.0050.0950.1760.0920.0910.1630.1610.9380.9400.9440.942
DR(Cox)0.0000.0020.0700.1230.0640.0640.1140.1130.9400.9320.9480.946
DR(lognorm)0.0000.0020.0700.1230.0640.0640.1140.1130.9420.9320.9480.950
DR(loglog)0.0000.0020.0700.1230.0640.0640.1140.1130.9400.9320.9500.948
Pseudo0.0010.0020.0700.1230.0640.0640.1140.1130.9380.9320.9480.948

t2IPCW1−0.004−0.0030.2650.4750.2510.2490.4370.4320.9580.9600.9360.936
IPCW2−0.004−0.0030.1490.2810.1440.1430.2550.2520.9480.9480.9240.928
DR(Cox)−0.001−0.0040.0870.1640.0870.0860.1550.1530.9500.9500.9460.942
DR(lognorm)0.000−0.0030.0870.1640.0870.0870.1550.1530.9480.9460.9440.942
DR(loglog)−0.000−0.0030.0870.1640.0870.0860.1550.1530.9480.9500.9460.942
Pseudo−0.000−0.0040.0870.1650.0870.0870.1560.1530.9500.9500.9440.942

IPCW1: (2.2), IPCW2: (2.3), DR(Cox): (2.4) with Q calculated by the Cox model, DR(lognorm): (2.4) with Q calculated by the AFT model with lognormal distribution, DR(loglog): (2.4) with Q calculated by the AFT model with log-logistic distribution, Pseudo: pseudo-value approach by Andersen et al. (2003), HC0: plug-in, NN: nearest neighbor, LM: bandwidth selection by Ludwig and Miller (2007), MSE: bandwidth selection by MSE optimization in Calonico et al. (2014).


Table 3

Simulation results with data generated from additive hazard model with 30% censoring rate

Columns (left to right): Bias (LM, MSE); ESD (LM, MSE); SEE (LM: NN, HC0; MSE: NN, HC0); Cover (LM: NN, HC0; MSE: NN, HC0)
n = 500t1IPCW10.002−0.0010.1750.3300.1700.1690.3030.2950.9480.9480.9220.922
IPCW20.0040.0050.1070.1790.1020.1010.1830.1770.9340.9320.9520.948
DR(Cox)0.0030.0030.0970.1590.0890.0880.1590.1540.9360.9300.9500.942
DR(lognorm)0.0030.0040.0970.1580.0890.0880.1590.1540.9360.9300.9500.944
DR(loglog)0.0030.0040.0970.1580.0890.0880.1590.1540.9360.9300.9500.942
Pseudo0.0030.0040.0970.1580.0890.0880.1590.1540.9360.9300.9500.940

t2IPCW10.0040.0100.1820.3240.1710.1700.3040.2970.9480.9380.9240.924
IPCW20.0050.0170.1320.2310.1220.1210.2190.2130.9300.9300.9300.926
DR(Cox)0.0030.0160.1170.1960.1050.1050.1880.1830.9260.9180.9440.942
DR(lognorm)0.0030.0160.1170.1960.1050.1040.1880.1830.9260.9180.9440.940
DR(loglog)0.0030.0160.1170.1960.1050.1050.1880.1830.9260.9180.9420.942
Pseudo0.0030.0170.1170.1960.1060.1050.1880.1830.9220.9160.9460.944

t3IPCW1−0.004−0.0090.1620.2920.1480.1470.2590.2530.9400.9480.8980.898
IPCW2−0.006−0.0100.1290.2420.1210.1200.2160.2100.9280.9360.9140.910
DR(Cox)−0.003−0.0060.1060.2010.1000.0990.1780.1730.9520.9500.9260.918
DR(lognorm)−0.003−0.0060.1070.2020.1000.0990.1790.1740.9500.9480.9160.920
DR(loglog)−0.002−0.0060.1070.2010.1000.0990.1780.1730.9520.9480.9180.918
Pseudo−0.002−0.0050.1070.2010.1000.0990.1780.1730.9520.9540.9220.920

n = 1000t1IPCW1−0.004−0.0040.1240.2380.1210.1200.2150.2120.9560.9480.9480.938
IPCW2−0.005−0.0050.0750.1340.0710.0710.1270.1250.9240.9240.9500.946
DR(Cox)−0.003−0.0060.0700.1210.0620.0620.1110.1100.9260.9260.9340.932
DR(lognorm)−0.003−0.0060.0700.1220.0620.0620.1110.1100.9280.9220.9380.936
DR(loglog)−0.003−0.0060.0700.1220.0620.0620.1110.1100.9260.9220.9320.934
Pseudo−0.003−0.0060.0700.1210.0620.0620.1110.1100.9260.9260.9340.936

t2IPCW1−0.000−0.0050.1290.2430.1220.1210.2180.2150.9420.9400.9260.920
IPCW20.002−0.0010.0870.1640.0860.0860.1550.1530.9620.9560.9400.942
DR(Cox)0.000−0.0060.0760.1390.0740.0730.1320.1300.9440.9420.9460.946
DR(lognorm)0.000−0.0060.0760.1390.0740.0730.1320.1300.9400.9420.9500.946
DR(loglog)0.000−0.0060.0760.1390.0740.0730.1320.1300.9440.9420.9480.948
Pseudo0.000−0.0060.0760.1390.0740.0740.1320.1300.9420.9420.9480.944

t3IPCW1−0.001−0.0080.1120.2060.1060.1060.1870.1840.9420.9360.9160.918
IPCW2−0.001−0.0060.0900.1610.0860.0860.1520.1500.9380.9420.9600.950
DR(Cox)−0.001−0.0060.0740.1310.0700.0700.1240.1230.9420.9460.9400.940
DR(lognorm)−0.001−0.0060.0760.1320.0710.0700.1250.1240.9340.9380.9420.934
DR(loglog)−0.001−0.0060.0750.1310.0700.0700.1250.1240.9380.9400.9380.936
Pseudo−0.001−0.0060.0750.1320.0700.0700.1250.1240.9400.9460.9360.938

IPCW1: (2.2), IPCW2: (2.3), DR(Cox): (2.4) with Q calculated by the Cox model, DR(lognorm): (2.4) with Q calculated by the AFT model with lognormal distribution, DR(loglog): (2.4) with Q calculated by the AFT model with log-logistic distribution, Pseudo: pseudo-value approach by Andersen et al. (2003), HC0: plug-in, NN: nearest neighbor, LM: bandwidth selection by Ludwig and Miller (2007), MSE: bandwidth selection by MSE optimization in Calonico et al. (2014).


Table 4

Simulation results with data generated from additive hazard model with 50% censoring rate

Columns (left to right): Bias (LM, MSE); ESD (LM, MSE); SEE (LM: NN, HC0; MSE: NN, HC0); Cover (LM: NN, HC0; MSE: NN, HC0)
n = 500t1IPCW1−0.008−0.0320.3010.5350.2820.2790.4930.4770.9400.9380.9360.922
IPCW2−0.0010.0040.1280.2150.1190.1180.2130.2060.9200.9220.9540.942
DR(Cox)0.0040.0050.0980.1640.0910.0900.1630.1580.9240.9280.9380.932
DR(lognorm)0.0050.0050.0970.1630.0910.0900.1630.1580.9300.9300.9400.932
DR(loglog)0.0050.0050.0980.1630.0910.0900.1630.1580.9280.9280.9360.932
Pseudo0.0050.0050.0980.1640.0910.0900.1630.1580.9240.9300.9380.932

t2IPCW1−0.011−0.0220.2940.5220.2760.2740.4800.4660.9500.9440.9140.916
IPCW20.0030.0110.1740.3110.1590.1580.2820.2740.9240.9200.9220.928
DR(Cox)0.0020.0170.1270.2130.1150.1140.2050.2000.9240.9180.9460.942
DR(lognorm)0.0020.0160.1270.2130.1150.1140.2050.2000.9180.9160.9480.942
DR(loglog)0.0020.0170.1270.2130.1150.1140.2050.2000.9220.9200.9480.946
Pseudo0.0020.0170.1270.2130.1160.1150.2050.2000.9200.9160.9460.944

n = 1000t1IPCW1−0.0040.0010.2110.3810.2000.1990.3540.3520.9320.9360.9340.938
IPCW2−0.004−0.0050.0870.1620.0840.0830.1490.1470.9440.9400.9360.938
DR(Cox)−0.003−0.0050.0700.1230.0640.0640.1140.1120.9260.9320.9260.934
DR(lognorm)−0.003−0.0050.0700.1230.0640.0640.1140.1120.9300.9300.9260.934
DR(loglog)−0.003−0.0050.0700.1230.0640.0640.1140.1120.9260.9300.9260.934
Pseudo−0.003−0.0050.0700.1230.0640.0640.1140.1120.9280.9320.9280.934

t2IPCW1−0.001−0.0020.2100.3880.1970.1960.3470.3450.9520.9540.9260.926
IPCW2−0.002−0.0040.1170.2200.1120.1120.2010.1990.9460.9480.9400.936
DR(Cox)−0.001−0.0070.0810.1520.0810.0810.1450.1430.9500.9500.9420.942
DR(lognorm)−0.000−0.0070.0810.1520.0810.0810.1450.1430.9460.9480.9440.940
DR(loglog)−0.000−0.0070.0810.1520.0810.0810.1450.1430.9460.9500.9400.940
Pseudo−0.000−0.0070.0810.1520.0810.0810.1450.1430.9500.9460.9440.942

IPCW1: (2.2), IPCW2: (2.3), DR(Cox): (2.4) with Q calculated by the Cox model, DR(lognorm): (2.4) with Q calculated by the AFT model with lognormal distribution, DR(loglog): (2.4) with Q calculated by the AFT model with log-logistic distribution, Pseudo: pseudo-value approach by Andersen et al. (2003), HC0: plug-in, NN: nearest neighbor, LM: bandwidth selection by Ludwig and Miller (2007), MSE: bandwidth selection by MSE optimization in Calonico et al. (2014).


Table 5

Sharp RD design analysis result for mortality in the PLCO dataset with the screening group only

Columns (left to right): Est (LM, MSE); 95% CI (LM: NN, HC0; MSE: NN, HC0)
t1IPCW1−0.010−0.108(−0.249, 0.228)(−0.254, 0.234)(−0.596, 0.381)(−0.616, 0.400)
IPCW2−0.013−0.019(−0.033, 0.007)(−0.033, 0.007)(−0.049, 0.011)(−0.049, 0.010)
DR(Cox)−0.012−0.021(−0.033, 0.010)(−0.033, 0.010)(−0.051, 0.009)(−0.051, 0.008)
DR(lognorm)−0.012−0.021(−0.033, 0.010)(−0.033, 0.010)(−0.051, 0.009)(−0.051, 0.008)
DR(loglog)−0.012−0.021(−0.033, 0.010)(−0.033, 0.010)(−0.051, 0.009)(−0.051, 0.008)
Pseudo−0.012−0.021(−0.033, 0.010)(−0.033, 0.010)(−0.051, 0.009)(−0.051, 0.008)

t2IPCW1−0.020−0.101(−0.261, 0.220)(−0.267, 0.226)(−0.590, 0.387)(−0.609, 0.406)
IPCW2−0.029−0.032(−0.060, 0.003)(−0.060, 0.003)(−0.086, 0.023)(−0.086, 0.023)
DR(Cox)−0.025−0.022(−0.060, 0.009)(−0.060, 0.009)(−0.077, 0.032)(−0.076, 0.031)
DR(lognorm)−0.025−0.022(−0.060, 0.009)(−0.060, 0.009)(−0.077, 0.032)(−0.076, 0.032)
DR(loglog)−0.025−0.022(−0.060, 0.009)(−0.060, 0.009)(−0.077, 0.032)(−0.076, 0.032)
Pseudo−0.025−0.022(−0.060, 0.009)(−0.060, 0.009)(−0.077, 0.032)(−0.076, 0.031)

t3IPCW1−0.005−0.111(−0.249, 0.238)(−0.253, 0.243)(−0.602, 0.380)(−0.617, 0.396)
IPCW2−0.016−0.049(−0.060, 0.028)(−0.060, 0.028)(−0.124, 0.026)(−0.124, 0.025)
DR(Cox)−0.033−0.031(−0.083, 0.017)(−0.082, 0.016)(−0.103, 0.040)(−0.102, 0.039)
DR(lognorm)−0.033−0.031(−0.083, 0.017)(−0.082, 0.016)(−0.103, 0.040)(−0.102, 0.039)
DR(loglog)−0.033−0.031(−0.083, 0.016)(−0.082, 0.016)(−0.103, 0.040)(−0.102, 0.039)
Pseudo−0.033−0.031(−0.083, 0.017)(−0.082, 0.016)(−0.103, 0.040)(−0.102, 0.039)

t4IPCW1−0.008−0.073(−0.251, 0.235)(−0.256, 0.240)(−0.555, 0.410)(−0.571, 0.426)
IPCW20.010−0.182(−0.125, 0.145)(−0.126, 0.145)(−0.482, 0.117)(−0.484, 0.119)
DR(Cox)−0.014−0.013(−0.123, 0.094)(−0.123, 0.094)(−0.127, 0.101)(−0.127, 0.101)
DR(lognorm)−0.006−0.003(−0.116, 0.104)(−0.116, 0.104)(−0.118, 0.112)(−0.118, 0.112)
DR(loglog)−0.009−0.007(−0.118, 0.100)(−0.118, 0.100)(−0.121, 0.108)(−0.121, 0.108)
Pseudo−0.014−0.013(−0.123, 0.095)(−0.123, 0.095)(−0.127, 0.101)(−0.127, 0.101)

IPCW1: estimator (2.2); IPCW2: estimator (2.3); DR(Cox): estimator (2.4) with Q computed from a Cox model; DR(lognorm): (2.4) with Q computed from a lognormal AFT model; DR(loglog): (2.4) with Q computed from a log-logistic AFT model; Pseudo: pseudo-value approach of Andersen et al. (2003); HC0: plug-in variance estimator; NN: nearest-neighbor variance estimator; LM: bandwidth selection by Ludwig and Miller (2007); MSE: bandwidth selection by MSE optimization as in Calonico et al. (2014).
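
All of the estimates above also depend on how the bandwidth is chosen: LM refers to the cross-validation idea of Ludwig and Miller (2007) and MSE to the MSE-optimal rule of Calonico et al. (2014), the latter implemented in the rdrobust software (Calonico et al., 2015). Purely for orientation, the sketch below gives one simple boundary-mimicking cross-validation criterion in the spirit of the LM rule, applied to a generic transformed response z; the unweighted one-sided fits, the user-supplied candidate grid, and the function names are illustrative assumptions rather than the exact procedure used for these tables.

```python
import numpy as np

def one_sided_prediction_error(x, z, x0, h):
    """Cross-validation error for a candidate bandwidth h: each point is predicted
    from a linear fit using only observations farther from the cutoff than the
    point itself, within distance h, mimicking estimation at the boundary."""
    sq_err = []
    for i in range(len(x)):
        if x[i] < x0:
            keep = (x >= x[i] - h) & (x < x[i])        # points strictly to the left
        else:
            keep = (x > x[i]) & (x <= x[i] + h)        # points strictly to the right
        if keep.sum() < 2:                             # need at least two points for a line
            continue
        u = x[keep] - x[i]
        X = np.column_stack([np.ones(u.size), u])
        beta, *_ = np.linalg.lstsq(X, z[keep], rcond=None)
        sq_err.append((z[i] - beta[0]) ** 2)           # squared error of the prediction at u = 0
    return float(np.mean(sq_err))

def select_bandwidth(x, z, x0, grid):
    """Return the candidate bandwidth with the smallest cross-validation error."""
    return min(grid, key=lambda h: one_sided_prediction_error(x, z, x0, h))
```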


References
  1. Andersen PK, Klein JP, and Rosthøj S (2003). Generalised linear models for correlated pseudo-observations, with applications to multi-state models. Biometrika, 90, 15-27.
  2. Andriole GL, Crawford ED, Grubb RL, et al. (2009). Mortality results from a randomized prostate-cancer screening trial. New England Journal of Medicine, 360, 1310-1319.
  3. Bor J, Moscoe E, Mutevedzi P, Newell ML, and Bärnighausen T (2014). Regression discontinuity designs in epidemiology: Causal inference without randomized trials. Epidemiology, 25, 729-737.
  4. Brier GW (1950). Verification of forecasts expressed in terms of probability. Monthly Weather Review, 78, 1-3.
  5. Calonico S, Cattaneo MD, and Farrell MH (2020). Optimal bandwidth choice for robust bias-corrected inference in regression discontinuity designs. The Econometrics Journal, 23, 192-210.
  6. Calonico S, Cattaneo MD, and Titiunik R (2014). Robust nonparametric confidence intervals for regression-discontinuity designs. Econometrica, 82, 2295-2326.
  7. Calonico S, Cattaneo MD, and Titiunik R (2015). rdrobust: An R package for robust nonparametric inference in regression-discontinuity designs. R Journal, 7, 38-51.
  8. Calonico S, Cattaneo MD, Farrell MH, and Titiunik R (2019). Regression discontinuity designs using covariates. Review of Economics and Statistics, 101, 442-451.
  9. Cho Y, Hu C, and Ghosh D (2021). Analysis of regression discontinuity designs using censored data. Journal of Statistical Research, 55, 225-248.
  10. Cho Y, Molinaro AM, Hu C, and Strawderman RL (2020). Regression trees for cumulative incidence functions.
  11. Cho Y, Molinaro AM, Hu C, and Strawderman RL (2022). Regression trees and ensembles for cumulative incidence functions. The International Journal of Biostatistics, 18, 397-419.
  12. Cox DR (1972). Regression models and life-tables. Journal of the Royal Statistical Society: Series B (Methodological), 34, 187-202.
  13. Fan J and Gijbels I (1994). Censored regression: Local linear approximations and their applications. Journal of the American Statistical Association, 89, 560-570.
  14. Fan J and Gijbels I (1996). Local Polynomial Modelling and Its Applications: Monographs on Statistics and Applied Probability 66, London, CRC Press.
  15. Gerds TA and Schumacher M (2006). Consistent estimation of the expected Brier score in general survival models with right-censored event times. Biometrical Journal, 48, 1029-1040.
  16. Graf E, Schmoor C, Sauerbrei W, and Schumacher M (1999). Assessment and comparison of prognostic classification schemes for survival data. Statistics in Medicine, 18, 2529-2545.
  17. Hahn J, Todd P, and Van der Klaauw W (1999). Evaluating the effect of an antidiscrimination law using a regression-discontinuity design (Technical report), National Bureau of Economic Research. Retrieved Jan. 17, 2024.
  18. Hahn J, Todd P, and Van der Klaauw W (2001). Identification and estimation of treatment effects with a regression-discontinuity design. Econometrica, 69, 201-209.
  19. Imbens G and Kalyanaraman K (2012). Optimal bandwidth choice for the regression discontinuity estimator. The Review of Economic Studies, 79, 933-959.
  20. Imbens G and Lemieux T (2008). Regression discontinuity designs: A guide to practice. Journal of Econometrics, 142, 615-635.
  21. Lee DS (2008). Randomized experiments from non-random selection in U.S. House elections. Journal of Econometrics, 142, 675-697.
  22. Lin DY and Ying Z (1994). Semiparametric analysis of the additive risk model. Biometrika, 81, 61-71.
  23. Lostritto K, Strawderman RL, and Molinaro AM (2012). A partitioning deletion/substitution/addition algorithm for creating survival risk groups. Biometrics, 68, 1146-1156.
  24. Ludwig J and Miller DL (2007). Does Head Start improve children’s life chances? Evidence from a regression discontinuity design. Quarterly Journal of Economics, 122, 159-208.
  25. Moscoe E, Bor J, and Bärnighausen T (2015). Regression discontinuity designs are underutilized in medicine, epidemiology, and public health: A review of current and best practice. Journal of Clinical Epidemiology, 68, 132-143.
  26. Prorok PC, Andriole GL, Bresalier RS, et al. (2000). Design of the prostate, lung, colorectal and ovarian (PLCO) cancer screening trial. Controlled Clinical Trials, 21, 273S-309S.
  27. Robins JM, Rotnitzky A, and Zhao LP (1994). Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association, 89, 846-866.
  28. Rubin DB (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66, 688-701.
  29. Rubin D and van der Laan MJ (2007). A doubly robust censoring unbiased transformation. The International Journal of Biostatistics, 3, 1-19.
  30. Shoag J, Halpern J, Eisner B, Lee R, Mittal S, Barbieri CE, and Shoag D (2015). Efficacy of prostate-specific antigen screening: Use of regression discontinuity in the PLCO cancer screening trial. JAMA Oncology, 1, 984-986.
  31. Silverman BW (1986). Density Estimation for Statistics and Data Analysis, London, CRC Press.
  32. Steingrimsson JA, Diao L, and Strawderman RL (2019). Censoring unbiased regression trees and ensembles. Journal of the American Statistical Association, 114, 370-383.
  33. Strawderman RL (2000). Estimating the mean of an increasing stochastic process at a censored stopping time. Journal of the American Statistical Association, 95, 1192-1208.
  34. Thistlethwaite DL and Campbell DT (1960). Regression-discontinuity analysis: An alternative to the ex post facto experiment. Journal of Educational Psychology, 51, 309-317.
  35. Tsiatis A (2007). Semiparametric Theory and Missing Data, New York, Springer Science & Business Media.
  36. Zajonc T (2012). Essays on Causal Inference for Public Policy (Doctoral dissertation), Harvard University, Cambridge, MA.