TEXT SIZE

CrossRef (0)
Estimating survival distributions for two-stage adaptive treatment strategies: A simulation study

Sifiso Vilakati1,a, Giuliana Corteseb, Thembelihle Dlaminic

aDepartment of Statistics and Demography, University of Eswatini, Eswatini;
bDepartment of Statistical Sciences, University of Padua, Italy;
cDepartment of Electrical and Electronic Engineering, University of Eswatini, Eswatini
Correspondence to: 1 Department of Statistics and Demography, University of Eswatini, Private Bag 4, Kwaluseni, M201, Eswatini. E-mail: sifemman@gmail.com
Received October 10, 2020; Revised July 14, 2021; Accepted July 15, 2021.
Abstract
Inference following two-stage adaptive designs (also known as two-stage randomization designs) with survival endpoints usually focuses on estimating and comparing survival distributions for the different treatment strategies. The aim is to identify the treatment strategy(ies) that leads to better survival of the patients. The objectives of this study were to assess the performance three commonly cited methods for estimating survival distributions in two-stage randomization designs. We review three non-parametric methods for estimating survival distributions in two-stage adaptive designs and compare their performance using simulation studies. The simulation studies show that the method based on the marginal mean model is badly affected by high censoring rates and response rate. The other two methods which are natural extensions of the Nelson-Aalen estimator and the Kaplan-Meier estimator have similar performance. These two methods yield survival estimates which have less bias and more precise than the marginal mean model even in cases of small sample sizes. The weighted versions of the Nelson-Aalen and the Kaplan-Meier estimators are less affected by high censoring rates and low response rates. The bias of the method based on the marginal mean model increases rapidly with increase in censoring rate compared to the other two methods. We apply the three methods to a leukemia clinical trial dataset and also compare the results.
Keywords : non-parametric methods, two-stage adaptive designs, survival distributions
1. Introduction

Diseases such as cancer, HIV, leukemia and depression often require combination or sequence of treatments. The aim of such sequential treatments is to prolong the survival of the patients. In studying combinations of several treatments or interventions, two-stage randomization designs have been utilized. In these designs, patients are first randomized to first stage treatments and thereafter patients are followed up for response, those who respond to the first stage treatments are then randomized to the second stage treatments. These designs are sometimes referred to as adaptive treatment strategies (Kidwell and Wahed, 2013)

Adaptive interventions, also known as dynamic treatment regimes or treatment policies use a sequence of decision rules which recommend when and how the intervention should be modified in order to optimize long term primary outcomes. These recommendations are based on factors such as individual characteristics, intermediate response collected in the course of the intervention such as the individual’s response and adherence. In adaptive treatment strategies, the intervention is personalized depending on the specific needs of the individual, secondly, the intervention is time varying, that is, the intervention is repeatedly adapted overtime in response to the participant’s performance and changing needs. An important aspect of adaptive interventions is the periodic assessment to ascertain whether the treatment selected initially is in fact helpful. If not helpful, adaptation procedures become necessary and sometimes these adjustments are done several times during the course of the treatment (Kidwell and Hyde, 2016).

One way to operationalize adaptive interventions is to use decision rules to link individual’s characteristics and ongoing performance with specific treatment options. Adjustments of intervention options are based on the individual’s values on tailoring variables. Candidates for tailoring variables depend largely on the problem at hand. An individual’s responsiveness to an intervention is considered as an important tailoring variable in many clinical trials. Response is defined based on the disease under study. For an example, in an HIV clinical trial, reaching a certain threshold in CD4 count may be regarded as a response. Alternatively, the choice of intervention options can also be tailored based on treatments already received.

A multistage randomized trial in which each stage corresponds to a decision is referred to as a SMART design. At each stage of the trial, individuals are assigned to one of the several treatment options. Data from SMART designs can be used in addressing questions concerning comparison of different interventions, they can also be used in making comparisons of different treatment regimes embedded in the SMART design (Nahum-Shani et al., 2012).

Over the past two decades, several methods for estimating and comparing survival distributions for two-stage randomization designs have been developed (Kidwell and Wahed, 2013; Guo and Tsiatis, 2005; Wahed and Tsiatis, 2004; Lunceford et al., 2002). In this article, we compare three nonparametric methods for estimating survival distributions of dynamic treatment regimes. To our knowledge, no simulation study has been done to compare the three methods yet the respective papers did not consider varied choices of the simulation parameters. For instance, the estimators described in Lunceford et al. (2002) and Guo and Tsiatis (2005) used only one censoring rate in the simulations.

2. Methods

2.1. Notation

A two-stage randomization design is characterized by two stages of randomization. We consider for simplicity two-stage randomization designs in which there are two first stage and two second stage treatments where only responders are randomized in the second stage. Let A1 and A2 be the first stage treatments, B1 and B2 be the second stage treatments. Initially patients are randomized to either A1 or A2, if the patient responds and consents further randomization is done to either B1 or B2. In cancer clinical trials, the first stage treatment is referred to as an induction therapy and the second stage treatment is referred to as maintenance therapy. The objective in these trials with survival endpoints is to estimate and compare the survival distributions under the different treatment strategies AjBk, j, k = 1, 2. The treatment policy AjBk means treating with Aj followed by Bk if patient is eligible and consents to the maintenance therapy. We define survival time to be the time from initial randomization until death.

To conceptualize the problem, potential outcomes (counterfactuals) are used (Rubin, 1974). Focus here is not on causal inference but we use counterfactuals as a vehicle to conceptualize the problem. We shall focus on data from patients who received induction therapy A1, since data from patients who received different induction therapies are independent. Data from patients who received A2 are analyzed in a similar manner. We assume that associated with patient i is a set of potential outcomes,

${Ri*,(1-Ri*) T0i,Ri*TiR,Ri*T1i*,Ri*T2i*}$

where $Ri*$ is the response status if patient i was assigned to A1. $Ri*=1$ if patient i responds to treatment A1, $Ri*=0$ otherwise. $TiR$ is the time from initial randomization to response for patient i defined only when $Ri*=1$; T0i is the survival time for a patient who do not respond to first stage treatment. $T1i*$ is the time from second randomization to death if patient i receives B1, and similarly $T2i*$ is the time from second randomization to death if patient i receives B2 instead. With these assumptions, if patient i is assigned to A1Bk, his/her survival time would be

$Tki=(1-Ri*) T0i+Ri* (TiR+Tki*), k=1,2.$

We note that we can only observe T1i or T2i hence, Tki are potential outcomes or counterfactuals. If $Ri*=0$ then T1i = T2i = T0i.

Let Tk denote the survival time for the population if all participants were assigned to the treatment strategy A1Bk. Inference on features of these distributions address directly the intent-to-treat question of interest. Using data from two-stage design we estimate the distribution of Tk.

Let Zi be the indicator for the B treatment defined only if $Ri*=1$. We have Zi = 1 if patient i is assigned to B1 and Zi = 0 if assigned to B2. Without censoring, relationship between the potential outcomes and the observed data is as follows

$Ti=(1-Ri*) T0i+Ri* {TiR+ZiT1i*+(1-Zi)T2i*}.$

For patients randomized to A2, the relationship is analogous. The observed data can be represented as a set of identically independently distributed (i.i.d) random vectors (Ri, RiZi, $RiTiR$,Ui) for i = 1, . . . , n. To introduce right censoring, let Ci be the time to censoring. With censoring, the observed data can be written as i.i.d vectors (Ri, RiZi, $RiTiR$,Ui, Δi) where Δi = I(Ti < Ci) is the failure indicator, Ui = min(Ti,Ci) is the observed time to either death or censoring. Ri = 0 if patient i is censored without having had a response to treatment A1 otherwise $Ri=Ri*$.

To estimate and compare survival distributions two assumptions are made. Firstly, we assume that Ci is conditionally independent of the other variables given the induction therapy, secondly, we assume that

$πz=P (Zi=1∣Ri=1,TiR,T1i,T2i,Ci),=P(Zi=1∣Ri=1).$

We note that πz, defined only if Ri = 1, is the probability of being randomized to the B treatment and it is typically fixed by design.

2.2. Non-parametric methods

2.2.1. LDT estimator

The marginal mean model was proposed by Lunceford et al. (2002), henceforth shall be referred to as the LDT estimator. Consider the case of estimating survival distributions for the treatment policy A1Bk, that is, S1k(t) = 1 − P(T1kt) = 1 − F1k, for k = 1, 2. Consider treatment policy A1B1 for simplicity. In two-stage randomization designs, difficulties arise from subjects who are inconsistent with the treatment policy of interest. In this case we treat them as missing. If all the patients are assigned to A1B1 and there is no censoring, meaning Ui = Ti = T1i, the natural estimator for F11(t) is $n-1∑i=1nI(Ui≤t)$. With censoring and second stage randomization upon response, only a subset of patients would have their observed survival time and actual treatment received being consistent with A1B1 since some patients are randomized to A1B2. The estimator proposed by Lunceford et al. (2002) is based on inverse probability weighting (Robins et al., 1994) to weight observations in this subset in such a way that the distribution of the weighted observations mimic that in an ideal case. Let Wki, k = 1, 2 be the weight function and for A1B1 we have W1i = 1 − Ri + RiZiz. When the ith patient is consistent with treatment policy A1B1, W1i acts as a weight. No-responders consistent with A1B1 represent themselves and they get a weight of 1, that is, W1i = 1. Each responder consistent with A1B1 represents 1z remitting or consenting individuals who could have been potentially assigned to B1 and gets a weight of 1z. Responders inconsistent with the policy A1B1 get a weight of 0. For the treatment policy A1B2, an analogous argument can be made with W2i = 1 − Ri + Ri(1 − Zi)/(1 − πz). This weighting scheme motivates the estimator

$F^1k1 (t)=n-1∑i=1nΔiWkiK^ (Ui)I(Ui≤t), k=1,2,$

where (Ui) is the Kaplan-Meier estimator for the censoring distribution given by (Ui) = Πut{1 − dNc(u)/Y(u)} with $Nc=∑i=1nI(Ui≤u,Δi=0)$ and $Y(u)=∑i=1nI(Ui≥u)$

Instead of dividing by n in Equation (2.3), a second estimator can be obtained by dividing by a probabilistically adjusted sample size

$F^1k* (t)={∑i=1nΔiWkiK^ (Ui)}-1∑i=1nΔiWkiK^ (Ui)I(Ui≤t), k=1,2.$

From Equation (2.4), the survival distributions for A1Bk are estimated using

$S^1k (t)=1-F^1k* (t),$

and the variance is estimated by

$var^(S^1k (t))=1n {1n∑i=1nΔiWkiK^ (Ui){I(Ui≤t)-F^1k*}2+∫0LdNc(u)K^ (u)Y(u)E^{L1ki* (t,u)}2},$

where L is the restricted lifetime which is smaller than the maximum follow-up of the study,

$E {L1ki* (t,u)}2=1n∑i=1nΔiK^ (Ui)[Wki{I(Ui≤t)-F^1k* (t)}-G^1k* (t,u)]2I(Ui≥u),$$G^1k* (t,u)={nS^ (u)}-1∑i=1nΔiWkiK^ (Ui) {I(Ui≤t)-F^1k* (t)}I(Ui≥u).$

More details on the variance derivation can be found on the appendix of Lunceford et al. (2002).

2.2.2. Weighted risk set estimator

The weighted risk set estimator (WRSE) (Guo and Tsiatis, 2005) is a natural extension of the Nelson-Aalen estimator. The development of the WRSE relies heavily on the counting processes. Consider a one-stage study with survival endpoints, the cumulative hazard rate can be estimated by the Aalen-Nelson estimator

$Λ^ (t)=∫0tdN(u)Y(u),$

where $N(u)=∑i=1nI(Ui≤u,Δi=1)$ denotes the number of deaths up to and including time u, and $Y(u)=∑i=1nI(Ui≥u)$ is the number of patients at risk at time u.

We show the development of the WRSE for the treatment policy A1B1, the development of the estimator for A1B2 follows similarly. Consider the case when all the patients are assigned to A1B1 in which case the observed death or censoring time is U1i = min(T1i,Ci). Let N1i(u) = I(U1iu, Δi = 1) and Y1i(u) = I(U1iu) then the cumulative hazard estimator becomes

$Λ^11 (t)=∫0t∑i=1ndN1i (u)∑i=1nY1i (u).$

In a two-stage randomization design, some of the patients who could have received B1 receive B2 after the second stage randomization. N1i(u) and Y1i(u) cannot be observed directly and the WRSE propose to incorporate inverse weighting where the weight function depending on u is defined as Wi(u) = 1 − Ri(u) + Ri(u)Ziz, where Ri(u) is the response status at time u. Ri(u) = 0, if at time u a response has not been achieved for patient i but patient i is still consistent with A1B1 and gets a weight of 1. For a patient i with Ri(u) = 1 and Zi = 0, a weight of 0 is assigned since this patient is no longer consistent with the treatment strategy A1B1. For a responder assigned to B1, this patient is consistent with A1B1 and gets a weight of 1z at time u. This patient represents 1z individuals who could have been potentially assigned to B1. The weight function $Wi* (u)=1-Ri(u)+Ri(u)(1-Zi)/(1-πz)$ is used for A1B2 and a similar argument is made.

Using the above weight function, the cumulative hazard estimator for A1B1 is

$Λ^11 (t)=∫0t∑i=1nWi(u)dNi(u)∑i=1nWi(u)Yi(u),$

where Ni(u) = I(Uiu, Δi = 1) and Yi(u) = I(Uiu). The survival estimator is

$S^11 (t)=exp {-∫0t∑i=1nWi(u)dNi(u)∑i=1nWi(u)Yi(u)}.$

The variance is given by

$Var^ (S11 (t))=n-1 {S11 (t)}2 σ^2,$

where

$σ^2=n-1∑i=1n(∫0tWi(u) [dNi(u)-Yi(u) {∑i=1nWi(u)dNi(u)∑i=1nWi(u)Yi(u)}]n-1∑i=1nWi(u)Yi(u))2$
2.2.3. Weighted Kaplan-Meier estimator

Miyahara and Wahed (2010) proposed two forms of the weighted Kaplan-Meier estimators for two-stage randomization designs. Consider the case where all patients are treated with A1Bk, the survival function at time t can be estimated as

$S^1k (t)={1,t

for k = 1, 2, where $dm=∑i=1nΔiI(Ui=tm),Ym=∑i=1nI(Ui≥tm)$ and tm are the ordered death or failure times for m = 1, 2, . . . . However, we know that in a two-stage randomization design some patients will potentially receive treatment inconsistent with A1Bk, so an adjustment for such a loss is done using inverse probability weighting. The weighted Kaplan-Meier estimator (WKME) is

$S^1kw (t)={1,t

for k = 1, 2 and where $dmw=∑i=1nΔiI(Ui=tm)Wki,Ymw=∑i=1nI(Ui≥tm)Wki$. The death process and at risk process are weighted by the inverse probability of randomization. The weights used in this estimator are the same as in the LDT estimator above. A modified version of the Greenwood formula can be used to estimate the variance of the estimator

$var^(S^1k (t))={S^1k (t)}2∑m:tm≤t1-s^1kmM^1kms^1km,$

where

$M^1km=(∑i=1nWkiI(Ui≥tm))2∑i=1n{WkiI(Ui≥tm)}2,$

and $s^1km=1-dmw/Ymw$, m = 1, 2, . . . .

A version of this estimator with time-dependent weights was also developed. For the estimator with time dependent weights, the variance was calculated using bootstrap (Johnson, 2001).

3. Results

We performed a simulation study to compare the performance of the three non-parametric methods, namely the WRSE, LDT and the WKM estimators. The aim of this simulation study is to ascertain how these methods perform when extreme values of the parameters are used. We note that in the simulation studies on WKM estimators reported in Miyahara and Wahed (2010), only two response rates were used (0.4 and 0.7) and also, in their simulation scenarios, the censoring rates were 5.4% for the 40% response rate and 6.4% for the 70% response rate. The papers on the LDT estimator (Lunceford et al., 2002) and the WRSE (Guo and Tsiatis, 2005) used only one censoring rate in their simulations. In the paper about the WRSE (Guo and Tsiatis, 2005), two response rates were used, (0.5 and 0.8). In the LDT estimator simulations, 20% and 50% were used as response rates. In real world application, one can find datasets with higher censoring rates than those considered in the simulation studies of the above papers, and perhaps higher or relatively lower response rates than the ones used in these studies. How do these methods perform when extreme values of the simulation parameters are used? How are these methods affected by higher censoring rates?

In this simulation study, we used the following censoring rates: 0%, 10%, 30%, 50% and 60%. For the response rates we used 20%, 40%, 60%, and 80%. We considered three sample sizes, that is, n = 100, 200, 400. The simulation study is done similar to that of the WRSE. The response and consent indicator, Ri, is simulated from a Bernoulli distribution with P(Ri = 1) = θ, where θ = (0.2, 0.4, 0.6, 0.8). For non-responders we simulated T0i from an exponential distribution with mean 182.5 days. T0i is generated only if Ri = 0. The time to response, $TiR$, is generated following an exponential distribution with mean 300 days. In all the simulation scenarios, the indicator for the B-treatment is generated from a Bernoulli (0.5) distribution. If Z = 1, $T1i*$ is generated from an exponential distribution with mean 370 days and if Z = 0, $T2i*$ is generated from an exponential distribution with mean 547.5 days.

The observed survival time for patient i is defined as $Ti=(1-Ri) T0i+Ri{TiR+ZiT1i*+(1-Zi)T2i*)$. The censoring time was generated from a uniform (0, v) distribution, v is chosen such that 0%, 10%, 30%, 50% and 60% of the times are censored. The observed survival time is defined to be Ui = min(Ti,Ci). For simplicity, we only calculate survival distributions for A1B1.

In all the simulation scenarios, 1000 datasets were generated and the methods were applied. We estimated S11(t) at t = 100, 300, 450 days. We report the survival probabilities together with their standard errors (SE), coverage probabilities (CP), and bias for the three methods at the times mentioned above. In addition, we also report the relative efficiency (RE) of WRSE and WKM estimators which is calculated as RE = sample variance of WRSE/sample variance of WKM. Guo and Tsiatis (2005) established that the WRSE is more efficient than the LDT estimator, so we did not repeat the calculation of the relative efficiency here.

3.1. Simulation results

We report the results of the simulation study for n = 100 in Table A.1 in the appendix, n = 200 in Table 1 and n = 400 in Table 2. Similar performances were observed for the three methods for low censoring rates. When the censoring rate is increased beyond 30%, the LDT estimator performs poorly. In all the sample sizes, the bias of the LDT estimator increases drastically as the censoring rate is increased independently of the response rate, (πr). For instance, when n = 100, (πr) = 0.4, c = 50% and t = 450 the bias for the LDT estimator is 0.24. This is relatively bigger than 0.1 which is the bias for the other two methods. This is also the case for the sample sizes of 200 and 400 with a slight decrease in the bias for the sample size of 400. Increasing the censoring rate also affects the coverage probabilities for the LDT estimator, that is the coverage probabilities are far from the nominal level for censoring rates of 30% and beyond. At the lower end of the survival curve, the LDT estimator approaches zero in cases where the response rate is low and the censoring rate is high. This is more evident when the sample size is small, that is, n = 100. The calculation of of the LDT depends on the censoring distribution. Higher values of the censoring distribution leads to a bigger denominator, hence this estimator approaches zero at the lower end when high censoring rates are used. The bias of the WRSE and the WKM estimator also increase slightly when the censoring rate is high.

The LDT estimator has the best coverage probabilities in cases where the censoring rate is low. In cases where the censoring rate is high, the coverage probabilities for the LDT estimator are far from the nominal level of 95%. The coverage probabilities of WRSE and the WKPE are close to the nominal level for almost all the simulation scenarios even when the sample size is small and the censoring rate is high. For a sample size of n = 100, and response rate of 20%, the coverage probabilities are 47.4% and 50.6% for the WRSE and the WKM estimator for the censoring rate of 60%.

The response rate affects the performance of all the three estimators. The higher the response rate the better the performance of the three estimators in terms of bias, coverage probabilities and standard errors. For the LDT estimator, this is only true when the censoring rate is low, that is, below 30%. With increase in response rates, the coverage probabilities for the WRSE and the WKM estimator get closer to the desired nominal level. For a response rate of 80% and a censoring rate of 60%, the coverage probabilities for the WRSE and the WKPM estimator are 93.0% and 85.3% respectively.

Sample size affects the performance of the three estimators in the usual way. The bias diminishes as the sample size increases and the estimators become more precise. Increasing the sample size from n = 100 to n = 400, the coverage probabilities get closer to the desired nominal level for the WRSE and the WKM estimator whilst the coverage probabilities of the LDT estimator lags behind when the censoring rate is high. At the lower end of the survival curve, the coverage probabilities for the WKM estimator are not close to the nominal level. In all the simulation scenarios, the LDT estimator has the largest standard errors. This is a result that was established in (Guo and Tsiatis, 2005). The standard errors for the WRSE and the WKM estimator are similar. There is no gain in efficiency in using the WKM estimator. The WRSE and the WKM estimator are less affected by high censoring rates and low response rates. The survival estimates from the WRSE and the WKM estimator are similar and their standard errors are similar as well. In cases where the sample size is large and the response rate is high, the WKME and the WRSE perform better in terms of biasedness, coverage probabilities and standard errors irrespective of high censoring rates. However, this is not the case with the LDT estimator as it has some larger bias and poor coverage probabilities. This is shown in the bottom of

3.2. Empirical example

The cancer and leukemia Group B 19808 (CALGB 19808) study randomized 302 patients to receive induction chemotherapy regimens consisting of cytosine arabinoside (Ara-C;A), daunorubicin (D), and etoposide (E) without (ADE) or with (ADEP) PSC-833 (P) (Kolitz et al., 2010). Patients under the age of 60 years who were newly diagnosed with acute myeloid leukemia were enrolled in the study. To be eligible, the patients should not have been previously treated for leukemia and be under the age of 60. The study was designed to compare the two induction chemotherapy regimens, ADE and ADEP, with both treatments given at their highest clinically feasible doses.

The trial suffered setbacks in the second stage due to unexpected refusals by patients or their medical doctors to comply with protocol as preplanned. Since some of the patients refused to take the second stage treatments, we regard this as withdrawn consent and are regarded as non-responders. A treatment regime, say AjBk, means treating with Aj followed by Bk if the patient is eligible and consents to next stage therapy. Not only were the refusals the reasons patients went off treatment.

The analysis of the CALGB 19808 study was done separately for the two stages (Kolitz et al., 2010, 2014). This type of analysis does not answer the question as to which treatment policy leads to a better survival. Here we show how this dataset can be analyzed using the WRSE, LDT estimator using the DTR package (Tang and Melguizo, 2015) and also using the WKM estimator.

Figure 1 shows the survival curves obtained using the WRSE, LDT and the WKM estimators. The WRSE and the WKM estimator produce similar survival curves as shown in the first two panels. The survival curves from the LDT estimator soon approach zero in the lower tail. No major differences are observed between the survival curves from the WRSE and the WKM estimator. This is clearly shown on the graph on the comparison of the three methods where the survival curves for the treatment policy ADEP-OBS are shown. Similar results were obtained for the other treatment policies which are not shown here. The difference between the survival curves from the WRSE and the WKM estimator is minimal for this treatment policy but the LDT estimator gives survival probabilities that are fairly different from the other two estimators. In this trial some patients who responded refused the second stage treatments. In this analysis these patients were treated as non-responders hence this decreased the response rate which was originally at 75%. The empirical results show that the LDT estimator is affected by low response rates.

All the three methods give survival curves that are very close to each other within the first year, the LDT estimator continues to give survival curves that are very close to each other even beyond the first year. Beyond the first year, the WRSE and WKM estimator yield survival curves that are not so close to each other. Where possible, the LDT estimator should be avoided. The LDT estimator provides unreasonable survival estimates at the tail of the distribution while the WRSE and the WKM estimator provide stable estimates at the tail of the distribution.

4. Conclusion

In this paper, we review three non-parametric estimators for survival distributions and also perform simulations to compare these estimators. All these estimators use the concept of inverse probability weighting. Patients who could have been randomized to a treatment policy of interest but end up in another treatment policy are regarded as missing in the treatment of interest. To address such missingness which happens because of the nature the trial is done, inverse weights are used. The first estimator, which we referred to as the LDT, uses time independent weights. The WRSE is an extension of the Nelson-Aalen estimator and the the WKM estimator is an extension of the Kaplan-Meier estimator. No comparative study has been done for the three methods and in this paper we provide a simulation study to compare the three methods in cases of extreme response rates and censoring rates using three sample sizes. We also provide a real data example where the three methods are applied and results compared.

All three estimators are affected by the response rate, an important aspect in the design and analysis of data from SMART designs. The higher the response rate, the better the performances of the three methods especially the WRSE and the WKPE. The LDT estimator performs well in this regard when the censoring is low on top of having a high response rate. The survival estimates from the WRSE and the WKM estimator are similar and their standard errors are similar as well.

All three estimators are affected by response rates and censoring rates. The LDT estimator is drastically affected by low response rates and high censoring rates. The other two estimators are affected by low response rates and high censoring rates but the impact is minimal. Overall the WRSE and the WKPM estimator perform better in terms of coverage probabilities, biasedness and standard errors. This study concludes that the WRSE and the WKM estimators should be preferred over the LDT estimator. The results of the simulation study and the empirical example reveal that the LDT estimator should be avoided, and more so, in cases where the censoring is above 30%. The LDT estimator provides unreasonable survival estimates at the tail of the distribution while the WRSE and the WKM estimator provide stable estimates at the tail of the distribution. The WKM estimator and the WRSE should be preferred over the LDT estimator. The WKM method is easy to understand as it is based on the widely used Kaplan-Meier method.

Figures
Fig. 1. Survival curves for the CALGB 19808 study. Panel (a), (b) and (c) plot the survival curves for the WRSE, WKM estimator and the LDT estimator. Panel (d) plots the comparison of the treatment policy ADEP-OBS for the three methods.
TABLES

Table A.1

Simulation results for n = 100

WRSEWKMLDT

πrctS1(t)SEBiasCPSEBiasCPRESEBiasCP
0.20%1000.6550.0470.0093.90.0470.0093.21.010.0500.0094.9
3000.3090.0480.0092.40.0460.0093.01.010.0540.0095.0
4500.1900.0420.0092.40.0380.0088.61.000.0480.0093.3

10%1000.6550.0480.0093.20.0480.0092.91.010.0510.0093.9
3000.3090.0500.0094.20.0470.0093.01.040.0580.0094.2
4500.1900.0450.0195.10.0410.0088.71.010.0530.0094.4

30%1000.6550.0490.0095.40.0490.0094.61.000.0520.0388.9
3000.3090.0540.0093.20.0510.0094.11.000.0570.0670.9
4500.1900.0510.0191.80.0470.0089.91.010.0470.0756.9

50%1000.6550.0510.0095.70.0510.0094.81.000.0500.1046.9
3000.3090.0660.0093.10.0650.0091.01.020.0350.2116.1
4500.1900.0660.0575.50.0670.0374.20.890.0000.190.00

60%1000.6550.0530.0094.30.0530.0093.71.000.0490.1722.3
3000.3090.0760.0383.60.0780.0181.80.890.0000.300.00
4500.1900.0760.1547.40.0780.1350.60.890.0000.190.00

0.40%1000.7320.0450.0093.50.0440.0092.31.030.0480.0095.2
3000.4250.0530.0094.60.0490.0092.01.080.0600.0095.1
4500.2950.0510.0095.30.0450.0088.71.010.0590.0095.9

10%1000.7320.0450.0093.70.0440.0091.51.030.0490.0095.0
3000.4250.0550.0094.10.0500.0091.61.020.0630.0094.7
4500.2950.0530.0093.90.0470.0087.61.010.0620.0094.3

30%1000.7320.0460.0092.60.0450.0093.71.030.0500.0191.0
3000.4250.0570.0193.70.0530.0092.11.050.0670.0483.5
4500.2950.0570.0192.80.0510.0088.31.020.0670.0576.6

50%1000.7320.0470.0094.50.0460.0091.61.040.0500.0953.3
3000.4250.0650.0191.90.0600.0092.11.020.0580.1926.6
4500.2950.0720.0188.50.0670.0182.31.010.0230.2414.7

60%1000.7320.0480.0093.70.0470.0091.91.030.0490.1429.7
3000.4250.0720.0193.00.0690.0187.71.010.0360.319.90
4500.2950.0780.0674.30.0740.0471.41.000.0000.290.00

0.60%1000.8090.0410.0094.20.0390.0090.91.050.0440.0094.7
3000.5410.0560.0093.40.0500.0091.71.040.0620.0095.0
4500.4000.0570.0193.80.0490.0088.71.020.0640.0095.5

10%1000.8090.0410.0093.20.0390.0090.71.030.0440.0093.6
3000.5410.0570.0194.20.0500.0091.01.020.0640.0094.9
4500.4000.0590.0193.50.0500.0087.51.020.0670.0094.4

30%1000.8090.0410.0093.10.0390.0091.31.040.0450.0193.2
3000.5410.0590.0093.60.0520.0089.91.010.0690.0290.1
4500.4000.0620.0193.50.0530.0087.01.000.0740.0385.3

50%1000.8090.0420.0094.00.0400.0090.01.030.0460.0476.8
3000.5410.0630.0191.90.0560.0091.41.020.0700.1157.5
4500.4000.0700.0192.30.0600.0086.41.030.0700.1545.9

60%1000.8090.0420.0092.30.0410.0090.51.020.0460.0953.0
3000.5410.0670.0192.60.0590.0087.41.010.0660.2229.2
4500.4000.0790.0190.40.0700.0084.61.010.0390.2918.0

0.80%1000.8860.0340.0091.30.0320.0093.81.020.0360.0091.9
3000.6570.0570.0093.10.0470.0087.91.010.0610.0093.9
4500.5050.0620.0093.60.0500.0086.61.000.0670.0094.4

10%1000.8860.0350.0092.40.0320.0089.11.020.0360.0093.2
3000.6570.0580.0094.80.0480.0086.71.030.0630.0094.7
4500.5050.0630.0194.40.0510.0086.11.010.0700.0095.2

30%1000.8860.0350.0090.50.0320.0088.51.030.0370.0090.8
3000.6570.0600.0094.20.0490.0086.31.010.0670.0192.9
4500.5050.0660.0193.70.0530.0085.51.000.0760.0191.4

50%1000.8860.0350.0089.60.0320.0088.61.030.0380.0185.7
3000.6570.0620.0094.90.0510.0087.21.010.0700.0677.3
4500.5050.0710.0193.60.0570.0083.61.010.0800.0871.4

60%1000.8860.0350.0091.40.0320.0088.71.020.0390.0473.7
3000.6570.0640.0093.20.0530.0085.81.030.0710.1255.2
4500.5050.0750.0193.00.0610.0085.31.030.0760.1841.0

Table 1

Simulation results for n = 200

WRSEWKMLDT

πrctS11(t)SEBiasCPSEBiasCPRESEBiasCP
0.20%1000.6550.0340.0094.50.0340.0093.91.010.0360.0095.9
3000.3090.0340.0094.10.0330.0094.81.020.0380.0096.0
4500.1900.0300.0092.10.0280.0091.81.010.0350.0095.3

10%1000.6550.0340.0094.10.0340.0094.61.020.0360.0095.0
3000.3090.0360.0095.00.0340.0093.21.010.0410.0095.6
4500.1900.0320.0094.10.0290.0093.01.010.0380.0094.3

30%1000.6550.0350.0095.30.0350.0094.81.000.0370.0285.1
3000.3090.0390.0094.50.0370.0094.71.010.0420.0564.3
4500.1900.0370.0094.20.0340.0093.61.000.0370.0653.5

50%1000.6550.0360.0093.50.0360.0093.71.010.0360.1028.6
3000.3090.0480.0194.30.0460.0093.81.020.0300.218.50
4500.1900.0540.0575.50.0540.0474.91.000.0000.190.00

60%1000.6550.0380.0094.70.0370.0093.51.010.0360.169.50
3000.3090.0600.0284.80.0620.0184.60.950.0000.300.00
4500.1900.0600.1434.50.0620.1338.50.990.0000.190.00

0.40%1000.7320.0320.0095.80.0310.0093.11.020.0340.0097.3
3000.4250.0380.0095.30.0350.0094.41.010.0430.0096.6
4500.2950.0370.0094.00.0320.0091.81.000.0420.0095.6

10%1000.7320.0320.0093.90.0310.0093.91.020.0350.0095.5
3000.4250.0390.0094.10.0360.0094.11.000.0450.0095.4
4500.2950.0380.0093.50.0330.0091.51.010.0440.0095.7

30%1000.7320.0330.0095.40.0320.0092.91.010.0360.0192.0
3000.4250.0410.0094.40.0370.0093.21.010.0490.0383.4
4500.2950.0410.0094.50.0360.0092.71.010.0500.0476.6

50%1000.7320.0340.0095.00.0330.0092.91.020.0360.0835.7
3000.4250.0470.0093.60.0430.0193.11.010.0450.1914.2
4500.2950.0550.0092.80.0500.0192.21.010.0240.2310.2

60%1000.7320.0340.0093.50.0340.0093.91.030.0360.1413.9
3000.4250.0530.0094.50.0490.0093.71.020.0340.304.80
4500.2950.0630.0572.70.0600.0471.01.010.0000.290.00

0.60%1000.8090.0290.0093.40.0280.0092.11.010.0310.0093.9
3000.5410.0400.0093.80.0350.0092.61.030.0440.0095.7
4500.4000.0410.0093.40.0380.0091.91.000.0450.0094.1

10%1000.8090.0290.0094.70.0280.0094.61.020.0310.0096.4
3000.5410.0410.0094.80.0360.0093.91.020.0460.0096.7
4500.4000.0420.0094.80.0370.0090.11.010.0480.0095.1

30%1000.8090.0290.0093.60.0280.0092.11.010.0320.0093.6
3000.5410.0420.0094.10.0370.0093.41.010.0500.0291.0
4500.4000.0450.0093.60.0380.0092.41.020.0540.0288.9

50%1000.8090.0300.0094.00.0290.0094.61.000.0330.0568.1
3000.5410.0450.0095.60.0400.0093.71.010.0520.1145.5
4500.4000.0510.0094.60.0490.0091.81.010.0550.1436.8

60%1000.8090.0300.0093.30.0290.0092.71.010.0340.0932.3
3000.5410.0480.0093.80.0420.0092.51.030.0510.2213.8
4500.4000.0590.0092.90.0500.0190.71.010.0380.289.70

0.80%1000.8860.0250.0093.90.0230.0093.91.020.0260.0093.9
3000.6570.0410.0094.90.0340.0093.31.040.0430.0094.6
4500.5050.0440.0094.20.0350.0092.21.020.0480.0094.5

10%1000.8860.0250.0092.10.0220.0091.41.030.0260.0092.6
3000.6570.0410.0094.20.0340.0093.71.020.0440.0095.0
4500.5050.0450.0094.70.0380.0092.91.010.0490.0095.4

30%1000.8860.0250.0092.40.0230.0090.31.030.0270.0092.2
3000.6570.0430.0093.30.0350.0090.51.010.0480.0193.0
4500.5050.0470.0092.60.0380.0090.01.010.0540.0191.8

50%1000.8860.0250.0092.30.0230.0091.51.040.0270.0185.3
3000.6570.0440.0094.80.0360.0093.81.020.0510.0577.4
4500.5050.0510.0094.50.0490.0093.21.010.0590.0866.7

60%1000.8860.0250.0091.60.0230.0091.81.030.0280.0368.2
3000.6570.0460.0093.40.0380.0093.01.020.0520.1143.7
4500.5050.0550.0093.00.0480.0092.81.020.0590.1632.7

Table 2

Simulation results for n = 400

WRSEWKMLDT

πrctS1(t)SEBiasCPSEBiasCPRESEBiasCP
0.20%1000.6550.0240.0095.30.0240.0094.50.990.0250.0096.2
3000.3090.0240.0095.30.0230.0092.91.010.0270.0097.4
4500.1900.0220.0096.20.0200.0092.51.010.0240.0097.1

10%1000.6550.0240.0095.20.0240.0094.51.020.0260.0096.0
3000.3090.0250.0094.20.0240.0093.51.010.0290.0095.2
4500.1900.0230.0094.90.0210.0093.41.000.0270.0094.7

30%1000.6550.0250.0095.50.0240.0095.81.020.0260.0280.8
3000.3090.0280.0094.10.0260.0094.41.010.0310.0555.9
4500.1900.0270.0093.40.0240.0093.11.010.0290.0646.1

40%1000.6550.0260.0096.10.0260.0095.21.020.0260.1010.8
3000.3090.0350.0094.40.0330.0095.01.030.0250.203.60
4500.1900.0420.0473.80.0420.0372.81.030.0000.190.00

60%1000.6550.0270.0093.90.0260.0094.31.020.0260.173.30
3000.3090.0470.0285.70.0480.0186.01.000.0000.300.00
4500.1900.0470.1322.20.0480.1326.21.000.0000.190.00

0.40%1000.7320.0230.0094.60.0220.0094.31.040.0240.0095.9
3000.4250.0270.0094.30.0250.0093.31.020.0300.0096.2
4500.2950.0260.0095.10.0230.0092.71.010.0290.0096.5

10%1000.7320.0230.0093.00.0220.0094.81.020.0250.0094.5
3000.4250.0280.0094.90.0250.0093.21.020.0320.0096.4
4500.2950.0270.0095.70.0230.0092.11.020.0320.0095.9

30%1000.7320.0230.0093.90.0230.0093.11.010.0260.0189.4
3000.4250.0290.0094.70.0270.0093.01.020.0360.0380.8
4500.2950.0300.0095.50.0260.0092.91.020.0370.0473.8

40%1000.7320.0240.0095.70.0230.0094.81.000.0260.0821.2
3000.4250.0330.0095.70.0300.0093.51.020.0340.186.05
4500.2950.0400.0192.50.0350.0092.01.030.0240.224.80

60%1000.7320.0240.0095.40.0240.0093.91.010.0260.135.30
3000.4250.0380.0094.00.0350.0090.51.020.0290.292.70
4500.2950.0510.0473.80.0470.0472.11.020.0000.290.00

0.60%1000.8090.0210.0094.80.0200.0094.51.000.0220.0095.6
3000.5410.0290.0095.50.0250.0093.21.000.0310.0096.6
4500.4000.0290.0096.40.0240.0092.41.010.0320.0096.8

10%1000.8090.0210.0095.30.0200.0094.91.010.0220.0096.3
3000.5410.0290.0094.70.0250.0093.71.020.0320.0095.7
4500.4000.0300.0093.70.0250.0092.01.020.0340.0094.7

30%1000.8090.0210.0094.30.0200.0093.01.020.0230.0194.1
3000.5410.0300.0094.00.0260.0093.01.020.0360.0290.9
4500.4000.0320.0094.90.0270.0093.01.040.0390.0288.4

50%1000.8090.0210.0095.20.0200.0094.51.010.0240.0454.3
3000.5410.0320.0095.10.0280.0093.91.030.0380.1031.0
4500.4000.0360.0095.10.0300.0093.81.060.0410.1323.0

60%1000.8090.0220.0095.20.0210.0093.91.020.0240.0914.0
3000.5410.0350.0093.40.0300.0093.11.020.0380.215.00
4500.4000.0430.0094.70.0400.0092.61.020.0340.273.00

0.80%1000.8860.0180.0092.90.0160.0092.61.000.0180.0093.2
3000.6570.0290.0094.30.0240.0093.91.010.0310.0095.2
4500.5050.0320.0095.80.0280.0091.71.010.0340.0096.2

10%1000.8860.0180.0094.10.0160.0093.51.020.0180.0095.1
3000.6570.0290.0094.50.0240.0094.81.000.0310.0095.7
4500.5050.0320.0094.70.0280.0092.90.990.0350.0094.8

30%1000.8860.0180.0094.30.0160.0094.81.010.0190.0093.7
3000.6570.0300.0095.40.0250.0094.41.010.0340.0194.5
4500.5050.0340.0095.40.0290.0093.31.000.0390.0193.2

50%1000.8860.0180.0094.10.0160.0093.91.020.0200.0183.2
3000.6570.0320.0095.80.0260.0094.31.010.0370.0565.8
4500.5050.0370.0095.00.0340.0093.81.010.0430.0757.1

60%1000.8860.0180.0094.20.0160.0093.91.050.0200.0455.8
3000.6570.0330.0094.70.0270.0091.61.020.0380.1127.9
4500.5050.0390.0096.00.0310.0090.71.030.0450.1518.9

References
1. Guo X and Tsiatis A (2005). A weighted risk set estimator for survival distributions in two-stage randomization designs with censored survival data. Int J Biostat, 1, 1-15.
2. Johnson RW (2001). An introduction to the bootstrap. Teaching Statistics, 23, 49-54.
3. Kidwell KM and Hyde LW (2016). Adaptive interventions and SMART designs: Application to child behavior research in a community setting. American Journal of Evaluation, 37, 344-363.
4. Kidwell KM and Wahed AS (2013). Weighted log-rank statistic to compare shared-path adaptive treatment strategies. Biostatistics, 1-14.
5. Kolitz JE, et al. (2010). P-glycoprotein inhibition using valspodar (PSC-833) does not improve outcomes for patients younger than age 60 years with newly diagnosed acute myeloid leukemia: Cancer and Leukemia Group B study 19808. Blood, 116, 1413-1421.
6. Kolitz JE, et al. (2014). Recombinant interleukin-2 in patients aged younger than 60 years with acute myeloid leukemia in first complete remission: Results from CALG 19808. Cancer, 120, 1010-1017.
7. Lunceford JK, Davidian M, and Tsiatis AA (2002). Estimation of survival distributions of treatment policies in two-stage randomization designs in clinical trials. Biometrics, 48-57.
8. Miyahara S and Wahed AS (2010). Weighted Kaplan–Meier estimators for two-stage treatment regimes. Statistics in medicine, 29, 2581-2591.
9. Nahum-Shani I, et al. (2012). Experimental design and primary data analysis methods for comparing adaptive interventions. Psychological methods, 17, 457.
10. Robins J, Rotnitzky A, and Zhao L (1994). Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association, 89, 846-866.
11. Rubin DB (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of educational Psychology, 66, 688.
12. Tang X and Melguizo M (2015). DTR: An R package for estimation and comparison of survival outcomes of dynamic treatment regimes. Journal of Statistical Software, 65, 1-28.
13. Wahed AS and Tsiatis AA (2004). Optimal estimator for the survival distribution and related quantities for treatment policies in two-stage randomization designs in clinical trials. Biometrics, 60, 124-133.