A Review of Dose Finding Methods and Theory

Ying Kuen Cheunga

aDepartment of Biostatistics, Mailman School of Public Health, Columbia University, USA
Correspondence to: Department of Biostatistics, Mailman School of Public Health, Columbia University, 722 West 168th Street, New York, NY 10032, USA. E-mail: yc632@columbia.edu
Received August 26, 2015; Revised August 31, 2015; Accepted August 31, 2015.
Abstract

In this article, we review the statistical methods and theory for dose finding in early phase clinical trials, where the primary objective is to identify an acceptable dose for further clinical investigation. The dose finding literature was initially motivated by applications in phase I clinical trials, in which dose finding is often formulated as a percentile estimation problem. We will present some important phase I methods and give an update on new theoretical developments since a recent review by Cheung (2010), with an aim to cover a broader class of dose finding problems and to illustrate how the general dose finding theory may be applied to evaluate and improve a method. Specifically, we will illustrate theoretical techniques with some numerical results in the context of a phase I/II study that uses trinary toxicity/efficacy outcomes as the basis of dose finding.

Keywords : adaptive designs, coherence, consistency, h-sensitivity, optimality bounds, phase I trials, phase I/II trials, trinary outcomes
1. Introduction

Phase I and phase I/II clinical trials of a new drug are typically small dose finding studies, whose goal is to identify an acceptable dose, defined with respect to some pre-specified toxicity and efficacy criteria, for further clinical investigation. Because little is known about the new drug in the early phase investigation, these studies are conducted in an adaptive and small-group-sequential manner, characterized by an iterative process: (1) start the trial at a safe dose, (2) treat a small group of patients at the dose, and (3) make dose decisions for the next group based on the outcomes of the current group. This process continues until a pre-specified sample size is reached. This approach, motivated by ethical considerations, avoids randomizing patients to doses with excessive toxicity risks and aims to concentrate treatments around the “good” doses.

Storer and DeMets (1987) give one of the earliest discussions on the topic of dose finding in the context of standard phase I trials, where the primary objective is to find the maximum tolerated dose based on binary toxicity outcomes. The authors formulate phase I dose finding as a percentile estimation problem, and define the maximum tolerated dose θ as a dose associated with a pre-specified toxicity probability p. Precisely,

$\theta = \arg\min_k |\pi_1(k) - p|,$ (1.1)

where π1(k) is the probability of toxicity at dose level k. Note that for practical reasons, a typical dose finding study tests a drug at a discrete set of dose levels {1, . . . , K}; thus, the estimand θ lives in a discrete and finite parameter space {1, . . . , K}. This feature, together with the ethical considerations in human studies, differentiates the dose finding problem in clinical trials from the percentile estimation problem in bioassay (Finney, 1978). Since Storer and DeMets (1987), several authors have proposed a great variety of methods and statistical principles to estimate θ. Important examples include the up-and-down and random walk designs (Storer, 1989; Durham et al., 1997), model-based methods (O’Quigley et al., 1990; Zacks et al., 1998), stepwise procedures (Lin and Shih, 2001; Cheung, 2007) and the stochastic approximation (Anbar, 1984; Cheung and Elkind, 2010). The study of the theoretical framework for phase I dose finding, however, did not receive much attention until about a decade ago. A recent review of the theoretical criteria in phase I dose finding is given in Cheung (2010).

As the dose finding objective in clinical trials becomes increasingly sophisticated, novel adaptive designs have been proposed to address estimation beyond the percentile estimation objective (1.1). Bekele and Thall (2004), Yuan et al. (2007), and Lee et al. (2011) consider phase I dose finding with non-binary toxicity outcomes. For phase I/II trials in which dose finding is based on bivariate efficacy and toxicity outcomes, several model-based methods have been proposed (e.g., Braun, 2002; O’Quigley et al., 2001; Thall and Cook, 2004). The use of utility scores has been considered in combination therapy trials (Houede et al., 2010). Most of these methods are model-based in that they sequentially treat the next patients at a dose estimated to be “good” based on a dose-outcome model. Because the dose-outcome model is usually elaborate, these methods are computer intensive, and their properties are not theoretically tractable. As a result, the theoretical study of these complex methods has been difficult and rare when compared to the standard phase I dose finding problem.

The main goal of this article is to provide an overview of dose finding methods in clinical trials and an up-to-date review of their theoretical study. We will first focus on the standard phase I dose finding problem as defined in (1.1), with Section 2 presenting important examples of statistical methods, and Section 3 reviewing key theoretical criteria. Section 4 presents a general formulation for dose finding methods in clinical trials, along with general optimality bounds. In Section 5, we will illustrate the applications of the theoretical criteria in a specific dose finding situation using trinary outcomes. The article ends with some concluding remarks in Section 6.

2. Methods for the Phase I Dose Finding Problem

### 2.1. Notation

Let xi ∈ {1, . . . , K} denote the dose level that patient i receives, and let Yi = Yi(xi) denote the patient’s binary toxicity outcome where Yi(k) is the toxicity indicator when the patient receives dose level k. In a dose finding study, each patient receives only one dose (i.e., xi) and thus only Yi is observable. Further let π1(k) = Pr{Yi(k) = 1} so that {π1(1), π1(2), . . . , π1(K)} denote the dose-toxicity curve defined on the test doses. A dose finding method consists of two components. The first component is the design that pertains to the determination of the sequence {xi}, possibly with respect to some ethical constraints due to the fact that the experimental units are human subjects. The second component is the estimation of θ as defined in (1.1). This section reviews some important classes of phase I dose finding methods.

### 2.2. Random walk

Storer (1989) describes a number of up-and-down escalation designs. For example, according to his Design B, a trial starts at a low and safe dose (e.g., x1 = 1), and a single patient is treated and observed for a toxicity outcome. The next patient will receive the next lower dose level if a toxicity is observed, and the next higher dose otherwise. Mathematically, Storer’s Design B can be represented as

$x_{i+1} = Y_i \max(x_i - 1, 1) + (1 - Y_i)\min(x_i + 1, K),$ (2.1)

where the starting dose x1 is specified by the clinical investigators. By using a Markov chain representation, the sequence {xi} generated by (2.1) can be shown to sample around a dose that causes toxicity with probability 0.50. For situations where a lower toxicity tolerance is desired, Storer (1989) proposes alternative up-and-down schemes that escalate only after a small number of consecutive non-toxic outcomes, and suggests using a combination of these schemes in a study. At the end of the study, the maximum tolerated dose θ can be estimated by using logistic regression or isotonic regression.
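The recursion (2.1) is simple enough to simulate directly. Below is a minimal sketch with a hypothetical five-level dose-toxicity curve; by the Markov chain argument above, the random walk should spend most of its time near the dose whose toxicity probability is closest to 0.50.

```python
import random

def storer_design_b(pi1, n, x1=1, seed=0):
    """Simulate Storer's Design B, recursion (2.1), on K = len(pi1) levels.

    pi1 is a hypothetical dose-toxicity curve {pi_1(1), ..., pi_1(K)}.
    Returns the assigned dose sequence x_1, ..., x_n.
    """
    rng = random.Random(seed)
    K = len(pi1)
    x, doses = x1, []
    for _ in range(n):
        doses.append(x)
        y = 1 if rng.random() < pi1[x - 1] else 0        # toxicity indicator Y_i
        x = max(x - 1, 1) if y == 1 else min(x + 1, K)   # recursion (2.1)
    return doses

# Hypothetical curve whose 50th percentile is dose level 3
doses = storer_design_b([0.10, 0.30, 0.50, 0.70, 0.90], n=2000)
most_visited = max(set(doses), key=doses.count)
```

With this curve the stationary distribution of the walk peaks at level 3, so `most_visited` concentrates there in a long simulation.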

Durham et al. (1997) introduce a randomized generalization of the up-and-down designs for any target rate p ≤ 0.50. Like Storer’s Design B, the design de-escalates dose level for the next patient if the current patient has a toxicity. When no toxicity is observed, however, the method will escalate based on the outcome of a biased coin with probability p/(1 − p). This biased coin design can be described as a random walk:

$x_{i+1} = Y_i \max(x_i - 1, 1) + (1 - Y_i)\{x_i(1 - c_i) + \min(x_i + 1, K)c_i\},$ (2.2)

where ci is a Bernoulli variable with success probability p/(1 − p), independent of Yi. A starting dose x1 needs to be specified to initiate the recursion (2.2). The biased coin design makes dose assignments with respect to the target rate p. Exploiting the Markov property of the random walk, we can derive analytically that the sequence {xi} will sample around a dose with toxicity probability about p. Thus, it is natural to estimate θ with the mode of the xis, although logistic and isotonic regression may also be used for estimation purposes.
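The biased coin design can be sketched in the same way; the curve, target rate, and sample size below are illustrative choices, and the mode of the visited doses should settle near the dose with toxicity probability closest to p.

```python
import random

def biased_coin_design(pi1, n, p, x1=1, seed=1):
    """Simulate the biased coin design, recursion (2.2), for target p <= 0.5.

    pi1 is a hypothetical dose-toxicity curve on K = len(pi1) levels.
    """
    rng = random.Random(seed)
    K = len(pi1)
    x, doses = x1, []
    for _ in range(n):
        doses.append(x)
        y = 1 if rng.random() < pi1[x - 1] else 0
        if y == 1:                               # toxicity: de-escalate
            x = max(x - 1, 1)
        elif rng.random() < p / (1 - p):         # biased coin c_i comes up heads
            x = min(x + 1, K)                    # no toxicity: escalate
    return doses

# Hypothetical curve with pi_1(2) = 0.20 equal to the target rate
doses = biased_coin_design([0.05, 0.20, 0.40, 0.60, 0.80], n=10000, p=0.20)
mode_dose = max(set(doses), key=doses.count)
```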

Because the escalation rules of (2.1) and (2.2) for any given dose can be written down in advance without regard to the outcomes at other doses, they can be implemented in a study without interim statistical calculations. These methods, often called algorithm-based designs, are simple to implement and are transparent to non-statistical investigators in terms of the dosing decisions.

### 2.3. Model-based designs

The continual reassessment method (CRM; O’Quigley et al., 1990) makes dose assignments based on a parametric dose-toxicity model, postulating π1(k) = F(k, β) for some parameter β. The main idea of the CRM is to treat the next subject at the dose with toxicity probability estimated to be closest to the target p. Precisely, the next dose is determined by

$x_{i+1} = \hat{\theta}_i := \arg\min_k |F(k, \hat{\beta}_i) - p|,$ (2.3)

where β̂i is the posterior mean of β based on the observations {(xj, Yj) : j ≤ i}. That is,

$\hat{\beta}_i = \frac{\int_{-\infty}^{\infty} \beta L_i(\beta)\,dG(\beta)}{\int_{-\infty}^{\infty} L_i(\beta)\,dG(\beta)},$

where Li(β) is the binomial likelihood of β given {(xj, Yj) : j ≤ i} and G(β) is the prior distribution of β. Under a Bayesian framework, the CRM sets the starting dose x1 = θ̂0 in accordance with (2.3) with β̂0 being the prior mean of β. The choice of G is discussed in Cheung (2011) and Lee and Cheung (2011), who study the use of a least informative prior.
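As an illustration, the following sketch implements one CRM update under a common one-parameter "power" working model F(k, β) = p_k^exp(β) with a normal prior, using simple grid quadrature for the posterior mean in (2.3); the skeleton values, prior standard deviation, and data are hypothetical choices, not those of any particular trial.

```python
import numpy as np

def crm_next_dose(skeleton, data, target, prior_sd=1.34):
    """One CRM update: posterior-mean beta, then the dose closest to target.

    Working model: F(k, beta) = skeleton[k-1] ** exp(beta), beta ~ N(0, prior_sd^2).
    data is a list of (dose_level, y) pairs with y in {0, 1}.
    """
    skel = np.asarray(skeleton, dtype=float)
    beta = np.linspace(-5.0, 5.0, 2001)            # quadrature grid for beta
    log_post = -0.5 * (beta / prior_sd) ** 2       # log prior (unnormalized)
    for k, y in data:                              # add binomial log-likelihood
        f = skel[k - 1] ** np.exp(beta)
        log_post += np.log(f) if y == 1 else np.log(1.0 - f)
    post = np.exp(log_post - log_post.max())       # stabilize before normalizing
    beta_hat = np.sum(beta * post) / np.sum(post)  # posterior mean of beta
    probs = skel ** np.exp(beta_hat)               # plug-in toxicity estimates
    return int(np.argmin(np.abs(probs - target)) + 1)

skeleton = [0.05, 0.12, 0.25, 0.40, 0.55]               # hypothetical skeleton
prior_dose = crm_next_dose(skeleton, [], target=0.25)   # no data: prior choice
next_dose = crm_next_dose(skeleton, [(3, 0), (4, 0), (4, 1)], target=0.25)
```

With no data the posterior equals the prior, so the recommendation is simply the skeleton dose closest to the target.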

The CRM combines the design and estimation components of a dose finding study by setting xi+1 at θ̂i. At the end of a study with sample size n, the maximum tolerated dose θ is estimated by θ̂n, the dose that would have been given to the next patient if the study continued. This coupling of design and estimation motivates a subsequent literature of model-based methods that adopt similar ideas. In a nutshell, a model-based method makes dose decisions and estimation based on a dose-toxicity model that is being updated repeatedly throughout a study. The main difference between the various model-based methods lies in how the model F is specified and how the dose-toxicity curve π1 is estimated. Zacks et al. (1998), for instance, estimate θ using an asymmetric loss function that penalizes over-dosing more than under-dosing, while (2.3) places equal penalty on the two erroneous dosing decisions. Haines et al. (2003) consider Bayesian c-optimal designs under parametric models, whereas Gasparini and Eisele (2000) and Cheung (2002) discuss the use of a Bayesian nonparametric model, and Leung and Wang (2001) use a frequentist analogue that continually estimates π1 using nonparametric isotonic regression. Meanwhile, Shen and O’Quigley (1996) and Cheung and Chappell (2002) consider the robustness of parametric estimation of π1.

### 2.4. Hybrid two-stage designs

Most model-based methods are Bayesian and assume a prior distribution on the dose-toxicity curve. At the start of a trial when there are few or no data, these Bayesian methods rely on the prior belief about θ in prescribing doses. As an alternative, O’Quigley and Shen (1996) propose a version of the CRM that estimates β using maximum likelihood estimation, that is, replacing β̂i in (2.3) with arg maxβ Li(β). The likelihood CRM avoids the need to choose a prior. However, since the maximum likelihood estimate does not exist until there is heterogeneity in the toxicity outcomes, an initial dose escalation sequence {xi0} is required before the first toxicity occurs. This leads to the concept of a hybrid two-stage design: define xi = xi0 if Yj = 0 for all j < i, and let xi follow the model-based recommendation such as (2.3) if Yj = 1 for some j < i. Generally, a hybrid design consists of three parts: a model-based design, an initial set of escalation rules {xi0}, and a transition rule (e.g., when the first toxicity occurs). Section 3.1 gives additional discussion on the choice of {xi0}.

### 2.5. Stochastic approximation and virtual observations

Stochastic approximation is perhaps the earliest method proposed for the purpose of dose finding. Anbar (1984) considers using the Robbins-Monro (1951) method in phase I trials. Precisely, the stochastic approximation determines the next dose by

$x_{i+1} = x_i - \frac{1}{ib}(Y_i - p),$ (2.4)

for some constant b > 0 and a pre-specified starting dose x1. The theoretical properties of the recursion (2.4) have been extensively studied, and numerous variations have been proposed. However, a few practical reasons prevent its use in phase I trials. First, while most dose finding studies are conducted on discrete dose levels, the stochastic approximation assumes a continuum of dosages. To circumvent this issue, O’Quigley and Chevret (1991) consider

$x_{i+1} = C\left\{x_i - \frac{1}{ib}(Y_i - p)\right\},$ (2.5)

where C(x) is the rounded value of x for x ∈ [0.5, K + 0.5), and is set to 1 if x < 0.5 and to K if x ≥ K + 0.5. However, since $|(ib)^{-1}(Y_i - p)| < 0.5$ when i is large enough, the sequence {xi} according to (2.5) will eventually stay at one dose level, i.e., xi+1 = xi = k′, which may not be θ. Cheung (2010) describes the concept of virtual observations on which a stochastic approximation recursion is based: set $x_1^* = x_1$ and update

$x_{i+1}^* = x_i^* - \frac{1}{ib}(V_i - p),$ (2.6)

where $V_i = Y_i + b(x_i^* - x_i)$ is the virtual observation of subject i, and treat the next patient at $x_{i+1} = C(x_{i+1}^*)$. Because the cumulative increment $\sum_i (ib)^{-1}(V_i - p)$ in (2.6) is unbounded, following the standard theory of stochastic approximation under some regularity conditions, we can show that $\{x_i^*\}$ is asymptotically normal with mean φ where C(φ) = θ. In other words, the actual assigned dose sequence {xi} is consistent for θ.
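A minimal simulation of the virtual-observation recursion (2.6); the dose-toxicity curve and the constant b are hypothetical choices, with b taken small enough that the recursion can drift away from every non-target level.

```python
import math
import random

def virtual_sa(pi1, n, p, b, x1=1, seed=2):
    """Sketch of the virtual-observation recursion (2.6).

    pi1 is a hypothetical dose-toxicity curve; b > 0 is the SA constant.
    Returns the sequence of administered dose levels x_1, ..., x_n.
    """
    rng = random.Random(seed)
    K = len(pi1)

    def C(x):  # round to the nearest level, truncated to {1, ..., K}
        return min(max(int(math.floor(x + 0.5)), 1), K)

    x_star, doses = float(x1), []
    for i in range(1, n + 1):
        x = C(x_star)                        # dose actually administered
        doses.append(x)
        y = 1 if rng.random() < pi1[x - 1] else 0
        v = y + b * (x_star - x)             # virtual observation V_i
        x_star -= (v - p) / (i * b)          # recursion (2.6)
    return doses

# Hypothetical curve with theta = 3 for target p = 0.30
doses = virtual_sa([0.05, 0.10, 0.30, 0.55, 0.75], n=300, p=0.30, b=0.35)
```

For this curve and b, the only stable point of the recursion sits at level 3, so the assigned doses settle around the target.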

Second, the choice of b is crucial to the success of the method. While one may choose b adaptively based on interim data to guarantee consistency, the convergence rate is too slow to be relevant in small samples. Finally, Wu (1985) shows that the stochastic approximation is inefficient when using binary outcomes in small samples. An exception, however, is when the binary outcome Yi is defined by dichotomizing a continuous outcome that is observable. For this specific situation, Cheung and Elkind (2010) show that stochastic approximation using the continuous outcomes leads to substantial improvement in efficiency when compared to any phase I method using only the dichotomized Yis, and offer a practical way to choose b so as to control the bias in estimation at a pre-specified level.

3. Theoretical Criteria for Phase I Methods

### 3.1. Coherence

Coherence is a dose finding principle that stipulates that no dose escalation (de-escalation) should take place for the next patient if the current subject experiences a toxic (non-toxic) outcome (Cheung, 2005). Mathematically, a dose finding design is coherent in escalation if with probability one

$P_D(x_i - x_{i-1} > 0 \mid Y_{i-1} = 1) = 0,$ (3.1)

for all i; and is coherent in de-escalation if with probability one

$P_D(x_i - x_{i-1} < 0 \mid Y_{i-1} = 0) = 0,$ (3.2)

for all i, where $P_D(\cdot)$ denotes probability computed under the design D.

Coherence is an ethical consideration for the design component of a dose finding method, and as a result, it may not correspond to efficient estimation of θ with regard to the estimation objective (1.1). Coherence is a built-in feature of many algorithm-based designs such as (2.1) and (2.2), because they would not be clinically acceptable otherwise. It is also clear that coherence holds for the stochastic approximation (2.4). In contrast, for model-based designs or hybrid designs, coherence needs to be established on a method-by-method basis. Cheung (2005) establishes the coherence of a specific version of the CRM, and shows that a hybrid two-stage design is not necessarily coherent.

Specifically, a hybrid design is not coherent if the initial sequence {xi0} is over-conservative and escalates very slowly (Cheung, 2005). Based on this interplay between the coherence principle and the initial sequence, Jia et al. (2014) subsequently show the unique existence of a most conservative, coherent initial sequence {xi0} for a given CRM model, and prescribe a conservative choice of {xi0}.

### 3.2. Optimality bounds

While there are many phase I dose finding methods (cf. Section 2), it is common to compare their performance relative to each other by simulation. However, such comparison does not allow ascertainment of their efficiency in an absolute sense. O’Quigley et al. (2002) describe a nonparametric benchmark design to assess a method’s performance relative to an objective optimality bound. To calculate the benchmark, we postulate that patient i carries a toxicity tolerance Ui that is uniformly distributed, so that $Y_i(k) = I\{U_i \le \pi_1(k)\}$, where I(E) is an indicator function of the event E. Further suppose that the Uis are observable for all n patients in a trial, implying that we observe a complete toxicity profile {Yi(1), Yi(2), . . . , Yi(K)} for each patient and can estimate the dose-toxicity curve π1 based on these complete profiles. Precisely, set $\pi_1^*(k) = n^{-1}\sum_{i=1}^n Y_i(k)$ for all k, and estimate θ with $\theta^* = \arg\min_k |\pi_1^*(k) - p|$.

Note that in a dose finding study, only Yi = Yi(xi) is observed for patient i instead of the complete profile. Thus, intuitively, by using additional information that is not available in practice, the “estimator” θ* is expected to perform better than any methods that use only Yis for estimation. Therefore, while θ* cannot be implemented in practice, it may serve as a yardstick for other dose finding methods.
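The benchmark is straightforward to simulate: generate latent tolerances, form complete toxicity profiles, and select θ*. A sketch with a hypothetical curve follows; the sample size and scenario are illustrative.

```python
import random

def benchmark_select(pi1, n, p, rng):
    """One replicate of the nonparametric benchmark estimator theta*.

    Complete profiles Y_i(k) = I{U_i <= pi1(k)} are formed from latent
    uniform tolerances U_i, then pi_1* and theta* are computed.
    """
    K = len(pi1)
    pi_star = [0.0] * K
    for _ in range(n):
        u = rng.random()                       # toxicity tolerance U_i
        for k in range(K):
            pi_star[k] += (u <= pi1[k]) / n    # running mean of Y_i(k)
    return min(range(1, K + 1), key=lambda k: abs(pi_star[k - 1] - p))

# Hypothetical curve (theta = 3); estimate Pr(theta* = theta) at n = 24
rng = random.Random(3)
pi1 = [0.05, 0.15, 0.30, 0.50, 0.70]
hits = sum(benchmark_select(pi1, n=24, p=0.30, rng=rng) == 3 for _ in range(2000))
prop_correct = hits / 2000
```

The resulting proportion approximates the benchmark's selection accuracy, an upper reference point for any method observing only the single outcome Yi = Yi(xi).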

Importantly, the properties of θ* are theoretically tractable and can be easily computed. Cheung (2013) shows that the distribution of θ* can be approximated by

$\Pr(\theta^* \ge k) \approx \Phi\left[\frac{\sqrt{n}\,\{2p - \pi_1(k-1) - \pi_1(k) + 0.5n^{-1}\}}{\sigma_k}\right],$ (3.3)

for k ≥ 2, where $\sigma_k^2 = \pi_1(k-1)\{1 - \pi_1(k-1)\} + \pi_1(k)\{1 - \pi_1(k)\} + 2\pi_1(k-1)\{1 - \pi_1(k)\}$, and Φ is the standard normal distribution function. The expression (3.3) is then used to derive a lower bound of the sample size required for a certain accuracy constraint, and also a sample size formula for a version of the CRM.

### 3.3. Consistency and h-sensitivity

An estimator θ̃n is consistent for θ if and only if θ̃n = θ with probability one as n → ∞. For model-based designs that couple design and estimation, Azriel et al. (2011) prove that consistency cannot be achieved under all dose-toxicity curves. Rather, consistency under all curves can be achieved only by designs that visit every dose level infinitely often (e.g., up-and-down designs). This presents a dilemma between experimentation and estimation in dose finding studies, in that the “best intention” approach of treating each patient at the current best dose estimate is at odds with the estimation objective. In other words, consistency is an infeasible goal under the ethical constraints of phase I trials.

As a middle ground, Cheung (2010) discusses the concepts of h-sensitivity and the indifference interval in dose finding. Briefly, the interval p ± h is called an indifference interval of a dose finding design if there exist N > 0 and h ∈ (0, p) such that

$P_D\{\pi_1(\tilde{\theta}_n) \in (p - h, p + h) \text{ for all } n \ge N\} = 1.$

For brevity of discussion, we confine our attention to the class of dose-toxicity curves such that π1(k) ∈ p ± h for some k. A design with half-width h is called an h-sensitive design. In words, a design is h-sensitive if it will eventually select a dose with toxicity probability in the interval p ± h. An easy corollary is that any dose finding method that provides consistent estimation of θ is also h-sensitive for any arbitrarily small h. The converse is not true: the CRM, while not consistent in the sense of Azriel et al. (2011), can be calibrated to be h-sensitive for any given h under all dose-toxicity curves (Lee and Cheung, 2009).

4. General Dose Finding Methods

### 4.1. Estimation objective function

This section extends the notation in Section 2 to cover general dose finding settings. We consider a multinomial outcome Y that takes on L + 1 distinct values {ω0, ω1, . . . , ωL}. Without loss of generality, assume that ω0 < ω1 < · · · < ωL so that the distribution of the outcome Yi(k) of patient i at dose level k can be described by the tail function

$\pi_l(k) = \Pr\{Y_i(k) \ge \omega_l\}, \quad \text{for } l = 1, \ldots, L.$ (4.1)

We shall also use π = {πl(k) : l = 1, . . . , L; k = 1, . . . , K} to denote the entire collection of dose-outcome probabilities. Generally, the estimation component of a dose finding trial is to estimate a dose d(π) ∈ {1, . . . , K}, where d(·) is a known objective function specified in the planning stage of a trial.

Example 1

(Dose finding with binary toxicity outcomes). The outcome Y in a phase I trial is binary, i.e., L = 1, ω0 = 0 and ω1 = 1, so that π1(k) = Pr{Yi(k) = 1} denotes the probability of a toxic outcome at dose k. The estimation objective function is defined as d1(π1) = arg mink |π1(k) − p| as displayed in (1.1).

Example 2

(Dose finding with total toxicity burden). In most clinical situations, a patient may experience more than one type of toxicity at different severity grades. Instead of summarizing this toxicity profile into a binary toxicity outcome, Bekele and Thall (2004) propose to measure the total toxicity burden of a patient using a weighted sum of the grades and types of toxicities: each toxicity type and grade has a pre-determined weight, and the total burden Y of a patient is calculated by adding the weights of the toxicity grades/types the patient experiences. To illustrate the concept of total burden, consider a clinical study described in Lee et al. (2012) where neuropathy and low platelet count are the potential toxicities of a drug, with the severity weights given in Table 1. Based on these weights, a patient with a grade 2 neuropathy and a grade 1 low platelet count will have a total toxicity burden Y = 0.64 + 0.17 = 0.81. While Y is a numerical score, it only takes on a finite number of possible values depending on the toxicity grades/types considered. We can enumerate from Table 1 that the burden score Y takes on 18 possible values (i.e., L = 17), which are listed in the second column of Table 2. Lee et al. (2011) study the Bekele-Thall method that aims to estimate dBT(π) = arg mink |Eπ{Y(k)} − τ|, where τ is a pre-specified level. As τ is set to 0.72 in the paper, the true target dose dBT(π) = 3, with Eπ{Y(3)} = 0.81, in the dose-outcome scenario given in Table 2.

Example 3

(Dose finding with multiple toxicity constraints). For the same clinical setting as in Example 2, Lee et al. (2011) propose an alternative estimation objective that chooses the smallest dose among doses that satisfy individual toxicity constraints. To be precise, they define dMC(π) = minj=1,...,Jθj where θj = arg mink |Pr{Y(k) ≥ tj}− pj|, for pre-specified toxicity thresholds t1 < · · · < tJ and toxicity rates p1 > · · · > pJ, and J is the number of toxicity constraints. In the special case with J = 1, the objective function reduces to (1.1). To illustrate using Table 2, suppose J = 2 and that t1 = 1, t2 = 1.5, p1 = 0.25, and p2 = 0.10. Under the scenario described in the table, we obtain θ1 = 3, θ2 = 2, and hence dMC(π) = 2.
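A small sketch of the objective dMC; since Table 2 is not reproduced here, the tail probabilities below are hypothetical values chosen to reproduce the illustration in the text (θ1 = 3, θ2 = 2, and dMC = 2).

```python
def d_mc(tail_probs, targets):
    """Multiple-toxicity-constraint objective of Example 3.

    tail_probs[j][k-1] = Pr{Y(k) >= t_j}; targets[j] = p_j.
    theta_j is the dose whose j-th tail probability is closest to p_j,
    and d_MC = min_j theta_j.
    """
    thetas = []
    for probs, p in zip(tail_probs, targets):
        K = len(probs)
        thetas.append(min(range(1, K + 1), key=lambda k: abs(probs[k - 1] - p)))
    return thetas, min(thetas)

# Hypothetical tail probabilities at K = 4 doses for thresholds t_1 < t_2,
# with targets p_1 = 0.25 and p_2 = 0.10
tails = [[0.10, 0.18, 0.27, 0.45],   # Pr{Y(k) >= t_1}
         [0.02, 0.09, 0.20, 0.35]]   # Pr{Y(k) >= t_2}
thetas, dose = d_mc(tails, [0.25, 0.10])
```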

Example 4

(Dose finding with bivariate binary outcomes). In a phase I/II trial, dose finding is based on both efficacy and toxicity outcomes. Generally, the outcome Y takes on four possible values. Using a particular convention, we may define Y = 0 for a patient with no toxicity and no response, Y = 1 for response without toxicity, Y = 2 for toxicity without response, and Y = 3 for both response and toxicity. Let FT(k) and FR(k) denote respectively the toxicity probability and response probability at dose k. Using the notation (4.1), these probabilities can be expressed in terms of π as follows: FT(k) = Pr{Yi(k) ≥ 2} = π2(k) and FR(k) = Pr{Yi(k) = 1 or Yi(k) = 3} = π1(k) − π2(k) + π3(k). The desirability δk of dose k is then defined as a function of the response and toxicity rates, that is, δk = δ{FR(k), FT(k)}, where δ(r, s) is increasing in r and decreasing in s. The estimation objective function is defined as dET(π) = arg maxk δk.

### 4.2. General design strategy and optimality bound

As illustrated in the previous subsection, the estimation component of a dose finding study is highly study-specific. Examples 2 and 3, both dealing with situations with multiple toxicities, take very different objective functions and hence lead to different target doses; recall that dBT(π) = 3 and dMC(π) = 2 for Table 2. Therefore, it is crucial to work with the clinical investigators to decide the estimation objective at the planning stage of a trial.

In contrast, the design component has predominantly taken the form of a model-based or hybrid strategy: start a trial at a safe dose, treat and observe a few patients, obtain some model-based estimate of π using the most recent data, and treat the next group of patients based on this estimate. While this strategy is applicable to very general outcome types and estimation objective functions, the actual implementations are computer intensive and specific to the estimation objective and the assumed dose-outcome model. The performance of a dose finding method also depends on this assumed dose-outcome model and the underlying dose-outcome relationship, which can be quite complicated. Therefore, it is important to be able to derive a theoretical optimality bound for these methods as a way to ascertain their efficiencies and perform design diagnosis. Cheung (2014) extends the concept of the nonparametric benchmark to general dose finding settings. To calculate a benchmark for a trial with Y taking on L + 1 possible values, we postulate that each patient carries a tolerance profile (Ui1, Ui2, . . . , UiL) that is uniformly distributed on [0, 1]L, so that under a given π, the outcome of patient i is determined by

$Y_i(k) = \omega_l \quad \text{if } U_{i,l+1} > \frac{\pi_{l+1}(k)}{\pi_l(k)} \text{ and } U_{ij} \le \frac{\pi_j(k)}{\pi_{j-1}(k)} \text{ for all } j \le l,$ (4.2)

where we set π0(k) ≡ 1 and πL+1(k) ≡ 0 by convention. Consequently, we may estimate πl(k) by $\pi_l^*(k) = n^{-1}\sum_{i=1}^n I\{Y_i(k) \ge \omega_l\}$; hence d(π*) can be calculated as the benchmark for the objective d(π). Some theoretical properties of π* are given in Cheung (2014). For each given d(·) and π, the sampling distribution of the benchmark d(π*) can in principle be derived analytically using asymptotic approximation, analogously to (3.3). In practice, the sampling properties of a benchmark can be easily approximated by generating the outcomes via (4.2) using simulation.
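The construction (4.2) can be coded directly from the tail function. The trinary tail values below are hypothetical, and the Monte Carlo check verifies that the generated outcomes reproduce them.

```python
import random

def benchmark_outcome(tails, u):
    """Generate an outcome index l (so that Y = omega_l) via (4.2).

    tails = [pi_1(k), ..., pi_L(k)] for the dose in question;
    u = (U_1, ..., U_L) is the patient's uniform tolerance profile.
    """
    full = [1.0] + list(tails) + [0.0]        # pi_0 = 1 and pi_{L+1} = 0
    l = 0
    # advance while U_{l+1} <= pi_{l+1}/pi_l; stop at the first exceedance
    while l < len(tails) and u[l] <= full[l + 1] / full[l]:
        l += 1
    return l

# Monte Carlo check against a hypothetical trinary tail function at one dose:
# pi_1 = 0.50 and pi_2 = 0.20
rng = random.Random(4)
draws = [benchmark_outcome([0.50, 0.20], (rng.random(), rng.random()))
         for _ in range(20000)]
tail1 = sum(d >= 1 for d in draws) / 20000   # should approximate pi_1 = 0.50
tail2 = sum(d >= 2 for d in draws) / 20000   # should approximate pi_2 = 0.20
```

Because the uniforms are independent, Pr{Y ≥ ω_l} telescopes to π_l(k), which is exactly what the check confirms empirically.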

5. An Application to Dose Finding with Trinary Outcomes

In this section, we illustrate how the theoretical criteria can be applied to build and evaluate dose finding methods in the context of a phase I/II trial where each patient will experience one of three possible outcomes: no toxicity and no response (Y = 0), response without toxicity (Y = 1), and toxicity (Y = 2). It can be viewed as a special case of Example 4 above with π3(k) ≡ 0, that is, a toxicity will preclude a response. That is, we have FR(k) = π1(k) − π2(k) and FT (k) = π2(k). For the estimation objective, we use the desirability function specified in Thall and Cook (2004):

$\delta(r, s) = 1 - \sqrt{\frac{(1-r)^2 + s^2}{(1-p_E)^2 + p_T^2}},$ (5.1)

where (pE, pT) solves $(p_T + 0.045)p_E^2 - 0.347p_E + 0.147 = 0$ and $s(p_E - r) + (1 - r)(p_T - s) = 0$. The function (5.1) attains a maximum of 1 when r = 1 and s = 0, and can be negative. Thus, we consider a modified efficacy-toxicity estimation objective function defined as dET(π) = arg maxk δk if δk > 0 for some k, and dET(π) = 0 otherwise.
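Reading (5.1) as a normalized Euclidean distance from the ideal point (r, s) = (1, 0), the modified objective can be sketched as follows; the values of (pE, pT) and the trinary scenario are arbitrary illustrative choices, not those of the paper.

```python
import math

def desirability(r, s, p_e, p_t):
    """Trade-off desirability: one minus the Euclidean distance of (r, s)
    from the ideal point (1, 0), normalized so delta(p_e, p_t) = 0."""
    return 1.0 - math.sqrt(((1 - r) ** 2 + s ** 2) / ((1 - p_e) ** 2 + p_t ** 2))

def d_et(pi1, pi2, p_e, p_t):
    """Modified objective: argmax_k delta_k if some delta_k > 0, else 0.
    For trinary outcomes, F_R(k) = pi1(k) - pi2(k) and F_T(k) = pi2(k)."""
    deltas = [desirability(a - b, b, p_e, p_t) for a, b in zip(pi1, pi2)]
    best = max(range(len(deltas)), key=lambda k: deltas[k])
    return ((best + 1) if deltas[best] > 0 else 0), deltas

# Hypothetical trade-off point and dose-outcome scenario at K = 4 doses
dose, deltas = d_et(pi1=[0.30, 0.60, 0.80, 0.85],
                    pi2=[0.05, 0.10, 0.25, 0.45],
                    p_e=0.50, p_t=0.25)
```

In this scenario only the two middle doses have positive desirability, and dose 2 narrowly wins the argmax.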

Cheung (2015) introduces the concept of a tandem dose finding design, in which patients are enrolled in pairs and treated at adjacent doses, and dose escalation between pairs is conducted in tandem. Without loss of generality, the design points are subject to the following constraints:

$x_{2i} = x_{2i-1} + 1, \quad \text{for } i = 1, \ldots, n/2,$ (5.2)

for patient 2i − 1 and patient 2i, in a trial with sample size n, i.e., n/2 pairs (when two patients are enrolled, we can pre-randomize which patient is the “first” patient in the pair). The tandem design (5.2) is intended as an escalation restriction that enables comparison of neighboring doses. For a phase I/II trial with trinary outcomes, there are 9 possible outcome configurations for each pair of subjects. While there are many ways to specify the dose escalation rules corresponding to each of the outcome configurations, Table 3 prescribes a specific set of rules where the coherence principles are used to impose restrictions on certain rules. For example, suppose that the outcomes are Y2i−1 = 0 and Y2i = 2 for the (2i − 1)th and (2i)th patients. Coherence in de-escalation (3.2) imposes that

$x_{2i+1} \ge x_{2i-1}$ (5.3)

following the outcome Y2i−1 = 0, and coherence in escalation (3.1) implies

$x_{2i+2} \le x_{2i} = x_{2i-1} + 1$ (5.4)

because Y2i = 2. It can be shown that under the constraints (5.2), (5.3), and (5.4), the only possible dose assignment for the next pair is (x2i+1, x2i+2) = (x2i−1, x2i). This example illustrates how coherence can be applied to simplify clinical decisions that are ethically defensible. This type of clarity is particularly important in trials with complex dose-outcome structures.

The tandem design described above is an algorithm-based design, while most dose finding methods for phase I/II trials are model-based. For example, Thall and Cook (2004) consider two dose-outcome models to model FT and FR jointly, the proportional odds (PO) model and the continuation ratio (CR) model, and apply a model-based design that treats the next group of patients at a dose with the largest model-based estimate of desirability, subject to some acceptability constraint; see Thall and Cook (2004) for further model and algorithm details.

It is typical to use simulation to compare different dose finding methods. Table 4 gives the simulation results of the tandem design and the Thall-Cook methods with the PO and the CR models, along with the benchmark dET(π*), for a trial with n = 72. The tandem design uses the escalation rules specified in Table 3 along with (5.2), and estimates dET(π) using logistic regression at the end of the trial. The results of the tandem design and the benchmark are based on 10,000 simulation replicates, and the results of the PO and the CR models are obtained from Thall and Cook (2004). The methods are compared based on an accuracy index calculated as a standardized, expected desirability of the selected dose:

$\frac{\sum_{k=1}^K \delta_k \Pr(\hat{d} = k) - \min_k \delta_k}{\max_k \delta_k - \min_k \delta_k},$

where $\hat{d}$ denotes the dose selected by a method. The index is standardized so that it equals 0 (1) when the worst (best) dose is selected with probability one (Cheung et al., 2015).
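The accuracy index is a one-line computation once the δk's and the selection probabilities are in hand; the numbers below are hypothetical and not taken from Table 4.

```python
def accuracy_index(deltas, sel_probs):
    """Standardized expected desirability of the selected dose: equals 0
    when the worst dose is always selected and 1 when the best dose is."""
    expected = sum(d * p for d, p in zip(deltas, sel_probs))
    lo, hi = min(deltas), max(deltas)
    return (expected - lo) / (hi - lo)

# Hypothetical desirabilities and selection distributions for two methods
deltas = [-0.30, 0.10, 0.40, 0.05]
perfect = accuracy_index(deltas, [0.0, 0.0, 1.0, 0.0])   # always the best dose
spread = accuracy_index(deltas, [0.05, 0.20, 0.60, 0.15])
```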

If we focus on the relative performance, we cannot tell the PO and the CR models apart: the former performs better than the latter under Scenario 3, and the converse holds under Scenario 4. This is to be expected, as a method or a model will seldom perform uniformly better than another. However, when we look at the optimality bounds, the benchmark dET(π*) has better accuracy (0.99) in Scenario 4 than in Scenario 3 (0.97), suggesting that Scenario 4 is an “easier” scenario for estimation purposes. In this light, it is interesting to note that the PO model performs worse in the easy scenario (accuracy = 0.90) than in the less easy one (accuracy = 0.92), which may be indicative of an intrinsic idiosyncrasy of the PO model in this context. This observation is analogous to unbiasedness in hypothesis testing, where we expect larger power under a greater effect size. In other words, the benchmark accuracy can be viewed as a numerical summary of the “effect size” of a dose-outcome scenario, thus providing a yardstick for what we should expect from a reasonable dose finding method.

6. Discussion

In this article, we review the theoretical concepts in dose finding with some details on coherence and optimality bounds. While these concepts were first developed for the standard phase I problem, we discuss their extension to general dose finding settings. Since we aim to cover theoretical developments after 2010, we have not covered all theoretical criteria in dose finding such as rigidity and unbiasedness, for which we refer readers to Cheung (2010).

In this article, we also give some important examples of "non-standard" dose finding problems, including dose finding with trinary outcomes and with multiple toxicity constraints. We have not, however, covered many important topics, such as the use of time-to-event outcomes (Cheung and Chappell, 2000; Polley, 2011), testing drug combinations (Houede et al., 2010), and dose finding with patient heterogeneity. Nonetheless, the principles of estimation objective and design strategy are applicable to these other situations.

Statistical methodology for dose finding trials is an application-oriented discipline. As a result, little attention has been given to studying the general theory of dose finding methods. This lack of attention, however, falls short of the rigor necessitated by the ethics and scientific merits of a clinical study, and misses the opportunity to develop powerful theory-guided design diagnostic and calibration tools. Examples abound for the standard phase I dose finding problem: Cheung (2013) uses the optimality bounds to derive a sample size formula for the CRM; Jia et al. (2014) provide tools based on the coherence principles to choose the initial sequence {xi0} in a hybrid design; and Lee and Cheung (2009) develop an algorithm to fine-tune a CRM model to achieve h-sensitivity for any given h. In this article, we demonstrate the use of the optimality benchmark in a quick diagnosis of an idiosyncrasy of the PO model. These are all tangible design tools that would not be possible without careful study of dose finding theory. As dose finding trials become increasingly sophisticated, complex, and study-specific, it is ever more important to have a general theoretical framework that enhances the simplicity and transparency of the dosing decisions made by novel methods. The goal of this paper is to illustrate how these theoretical criteria can be useful beyond the standard phase I problem, and to inspire further research along this line.

TABLES

### Table 1

Weights of toxicity types and grades in Lee et al. (2012)

| Toxicity type | Grade 0 | Grade 1 | Grade 2 | Grade 3 | Grade 4 |
|---|---|---|---|---|---|
| Neuropathy | 0.00 | 0.19 | 0.64 | 1.03 | 2.53 |
| Low platelet count | 0.00 | 0.17 | 0.17 | 0.40 | 0.85 |
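Under the toxicity burden score of Lee et al. (2012), a patient's score is obtained by summing the weights of the observed grade of each toxicity type. A minimal sketch using the Table 1 weights (the patient profile below is hypothetical):

```python
# Toxicity weights from Table 1 (Lee et al., 2012), indexed by grade 0-4.
WEIGHTS = {
    "neuropathy":         [0.00, 0.19, 0.64, 1.03, 2.53],
    "low platelet count": [0.00, 0.17, 0.17, 0.40, 0.85],
}

def burden_score(grades):
    """Sum the weight of the observed grade for each toxicity type."""
    return sum(WEIGHTS[tox][g] for tox, g in grades.items())

# Hypothetical patient: grade-2 neuropathy and grade-3 low platelet count.
print(round(burden_score({"neuropathy": 2, "low platelet count": 3}), 2))  # 1.04
```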

### Table 2

A dose-outcome distribution of toxicity burden score

| l | ωl | πl(1) | πl(2) | πl(3) | πl(4) | πl(5) |
|---|---|---|---|---|---|---|
| 0 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 1 | 0.17 | 0.53 | 0.63 | 0.70 | 0.89 | 0.94 |
| 2 | 0.19 | 0.40 | 0.48 | 0.53 | 0.78 | 0.86 |
| 3 | 0.36 | 0.28 | 0.39 | 0.45 | 0.72 | 0.82 |
| 4 | 0.40 | 0.25 | 0.35 | 0.41 | 0.66 | 0.76 |
| 5 | 0.59 | 0.21 | 0.29 | 0.35 | 0.59 | 0.70 |
| 6 | 0.64 | 0.20 | 0.28 | 0.33 | 0.55 | 0.66 |
| 7 | 0.81 | 0.09 | 0.22 | 0.32 | 0.54 | 0.65 |
| 8 | 0.85 | 0.06 | 0.19 | 0.31 | 0.52 | 0.63 |
| 9 | 1.03 | 0.05 | 0.16 | 0.25 | 0.45 | 0.55 |
| 10 | 1.04 | 0.03 | 0.14 | 0.25 | 0.44 | 0.54 |
| 11 | 1.20 | 0.02 | 0.12 | 0.23 | 0.38 | 0.47 |
| 12 | 1.43 | 0.01 | 0.11 | 0.23 | 0.37 | 0.46 |
| 13 | 1.49 | 0.01 | 0.11 | 0.23 | 0.36 | 0.45 |
| 14 | 1.88 | 0.01 | 0.10 | 0.23 | 0.35 | 0.43 |
| 15 | 2.53 | 0.01 | 0.10 | 0.23 | 0.34 | 0.42 |
| 16 | 2.70 | 0.00 | 0.04 | 0.12 | 0.24 | 0.34 |
| 17 | 2.93 | 0.00 | 0.02 | 0.05 | 0.14 | 0.21 |
| 18 | 3.38 | 0.00 | 0.01 | 0.02 | 0.07 | 0.13 |
| Eπ{Y(k)} | | 0.25 | 0.51 | 0.81 | 1.28 | 1.60 |
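If the columns πl(k) of Table 2 are read as exceedance probabilities Pr{Y(k) ≥ ωl}, the expected-score row Eπ{Y(k)} can be recovered by the layer-cake formula E{Y} = Σl (ωl − ωl−1) Pr(Y ≥ ωl). A sketch under that reading, checking dose k = 1:

```python
# Score levels omega_l and exceedance probabilities for dose 1, from Table 2.
omega = [0.00, 0.17, 0.19, 0.36, 0.40, 0.59, 0.64, 0.81, 0.85, 1.03,
         1.04, 1.20, 1.43, 1.49, 1.88, 2.53, 2.70, 2.93, 3.38]
pi1   = [1.00, 0.53, 0.40, 0.28, 0.25, 0.21, 0.20, 0.09, 0.06, 0.05,
         0.03, 0.02, 0.01, 0.01, 0.01, 0.01, 0.00, 0.00, 0.00]

def expected_score(omega, pi):
    """Layer-cake formula: E{Y} = sum_l (omega_l - omega_{l-1}) Pr(Y >= omega_l)."""
    return sum((omega[l] - omega[l - 1]) * pi[l] for l in range(1, len(omega)))

print(round(expected_score(omega, pi1), 2))  # 0.25, matching the table
```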

### Table 3

Escalation in tandem design for phase I/II trial where x2i = x2i−1 + 1 for all i

| (Y2i−1, Y2i) | Rule | Remarks |
|---|---|---|
| (0, 0) | x2i+1 = x2i−1 + 1 | Escalate; stay if x2i−1 = K − 1 |
| (0, 1) | x2i+1 = x2i−1 + 1 | Escalate; stay if x2i−1 = K − 1 |
| (0, 2) | x2i+1 = x2i−1 | Coherence restriction |
| (1, 0) | x2i+1 = x2i−1 | |
| (1, 1) | x2i+1 = x2i−1 | |
| (1, 2) | x2i+1 = x2i−1 − 1 | De-escalate; stay if x2i−1 = 1 |
| (2, 0) | x2i+1 = x2i−1 | Coherence restriction |
| (2, 1) | x2i+1 = x2i−1 | |
| (2, 2) | x2i+1 = x2i−1 − 1 | De-escalate; stay if x2i−1 = 1 |
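The escalation rules of Table 3 amount to a simple lookup on the pair of outcomes, with the next dose clipped so that escalation stops at K − 1 and de-escalation stops at dose 1. A minimal sketch (variable names are illustrative):

```python
# Dose step for the next patient pair, keyed by the outcomes (Y_{2i-1}, Y_{2i})
# of the current pair, per Table 3. The tandem design assigns the pair to
# adjacent doses, x_{2i} = x_{2i-1} + 1, so the lower dose never exceeds K - 1.
STEP = {
    (0, 0): +1, (0, 1): +1, (0, 2): 0,   # (0, 2): coherence restriction
    (1, 0): 0,  (1, 1): 0,  (1, 2): -1,
    (2, 0): 0,  (2, 1): 0,  (2, 2): -1,  # (2, 0): coherence restriction
}

def next_lower_dose(x, y_pair, K):
    """Next pair's lower dose given current lower dose x and outcomes y_pair."""
    return min(max(x + STEP[y_pair], 1), K - 1)

# With K = 5 doses: no toxicity at either dose of the pair -> escalate.
print(next_lower_dose(2, (0, 0), K=5))  # 3
```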

### Table 4

Comparison of the tandem design, the Thall-Cook method (PO and CR), and the optimality bounds. Entries under doses 1–5 and "None" are selection percentages; relative efficiency is calculated as the ratio of the accuracy index of a method to that of the benchmark

**Scenario 3 in Thall and Cook (2004)**

| Method | Dose 1 | Dose 2 | Dose 3 | Dose 4 | Dose 5 | None | Accuracy | Relative efficiency |
|---|---|---|---|---|---|---|---|---|
| (FR, FT) | (.20, .02) | (.40, .03) | (.60, .04) | (.68, .06) | (.74, .20) | | | |
| δ | −0.48 | −0.13 | 0.22 | 0.32 | −0.26 | | | |
| Tandem | 0.1 | 0.2 | 12.6 | 76.1 | 9.1 | 2.0 | .91 | .93 |
| PO | 0.0 | 0.4 | 19.8 | 71.6 | 6.5 | 1.7 | .92 | .95 |
| CR | 0.0 | 1.6 | 32.2 | 49.4 | 15.7 | 1.0 | .83 | .86 |
| dET (π*) | 0.0 | 0.0 | 13.0 | 84.8 | 1.2 | 0.0 | .97 | - |

**Scenario 4 in Thall and Cook (2004)**

| Method | Dose 1 | Dose 2 | Dose 3 | Dose 4 | Dose 5 | None | Accuracy | Relative efficiency |
|---|---|---|---|---|---|---|---|---|
| (FR, FT) | (.52, .01) | (.62, .015) | (.71, .02) | (.79, .025) | (.86, .03) | | | |
| δ | 0.12 | 0.29 | 0.45 | 0.58 | 0.69 | | | |
| Tandem | 1.3 | 0.4 | 2.5 | 28.8 | 67.0 | 0.0 | .93 | .94 |
| PO | 0.1 | 1.7 | 10.1 | 34.3 | 53.7 | 0.0 | .90 | .91 |
| CR | 0.0 | 0.1 | 1.1 | 4.6 | 94.0 | 0.1 | .99 | .99 |
| dET (π*) | 0.0 | 0.0 | 0.4 | 4.5 | 95.1 | 0.0 | .99 | - |

References
1. Anbar, D (1984). Stochastic approximation methods and their use in bioassay and phase I clinical trials. Communications in Statistics - Theory and Methods. 13, 2451-2467.
2. Azriel, D, Mandel, M, and Rinott, Y (2011). The treatment versus experimentation dilemma in dose finding studies. Journal of Statistical Planning and Inference. 141, 2759-2768.
3. Bekele, BN, and Thall, PF (2004). Dose-finding based on multiple toxicities in a soft tissue sarcoma trial. Journal of the American Statistical Association. 99, 26-35.
4. Braun, TM (2002). The bivariate continual reassessment method: Extending the CRM to phase I trials of two competing outcomes. Controlled Clinical Trials. 23, 240-256.
5. Cheung, YK (2002). On the use of nonparametric curves in phase I trials with low toxicity tolerance. Biometrics. 58, 237-240.
6. Cheung, YK (2005). Coherence principles in dose-finding studies. Biometrika. 92, 863-873.
7. Cheung, YK (2007). Sequential implementation of stepwise procedures for identifying the maximum tolerated dose. Journal of the American Statistical Association. 102, 1448-1461.
8. Cheung, YK (2010). Stochastic approximation and modern model-based designs for dose-finding clinical trials. Statistical Science. 25, 191-201.
9. Cheung, YK (2011). Dose Finding by the Continual Reassessment Method. Boca Raton, FL: CRC Press/Taylor & Francis Group.
10. Cheung, YK (2013). Sample size formulae for the Bayesian continual reassessment method. Clinical Trials. 10, 852-861.
11. Cheung, YK (2014). Simple benchmark for complex dose finding studies. Biometrics. 70, 389-397.
12. Cheung, YK (2015). Dose Finding in Tandem to Identify the Maximum Response.
13. Cheung, YK, Chakraborty, B, and Davidson, KW (2015). Sequential multiple assignment randomized trial (SMART) with adaptive randomization for quality improvement in depression treatment program. Biometrics. 71, 450-459.
14. Cheung, YK, and Chappell, R (2000). Sequential designs for phase I clinical trials with late-onset toxicities. Biometrics. 56, 1177-1182.
15. Cheung, YK, and Chappell, R (2002). A simple technique to evaluate model sensitivity in the continual reassessment method. Biometrics. 58, 671-674.
16. Cheung, YK, and Elkind, MS (2010). Stochastic approximation with virtual observations for dose-finding on discrete levels. Biometrika. 97, 109-121.
17. Durham, SD, Flournoy, N, and Rosenberger, WF (1997). A random walk rule for phase I clinical trials. Biometrics. 53, 745-760.
18. Finney, DJ (1978). Statistical Methods in Biological Assay. London: Griffin.
19. Gasparini, M, and Eisele, J (2000). A curve-free method for phase I clinical trials. Biometrics. 56, 609-615.
20. Haines, LM, Perevozskaya, I, and Rosenberger, WF (2003). Bayesian optimal designs for phase I clinical trials. Biometrics. 59, 591-600.
21. Houede, N, Thall, PF, Nguyen, H, Paoletti, X, and Kramar, A (2010). Utility-based optimization of combination therapy using ordinal toxicity and efficacy in phase I/II trials. Biometrics. 66, 532-540.
22. Jia, X, Lee, SM, and Cheung, YK (2014). Characterization of the likelihood continual reassessment method. Biometrika. 101, 599-612.
23. Lee, SM, Cheng, B, and Cheung, YK (2011). Continual reassessment method with multiple toxicity constraints. Biostatistics. 12, 386-398.
24. Lee, SM, and Cheung, YK (2009). Model calibration in the continual reassessment method. Clinical Trials. 6, 227-238.
25. Lee, SM, and Cheung, YK (2011). Calibration of prior variance in the Bayesian continual reassessment method. Statistics in Medicine. 30, 2081-2089.
26. Lee, SM, Hershman, DL, Martin, P, Leonard, JP, and Cheung, YK (2012). Toxicity burden score: A novel approach to summarize multiple toxic effects. Annals of Oncology. 23, 537-541.
27. Leung, DHY, and Wang, YG (2001). Isotonic designs for phase I trials. Controlled Clinical Trials. 22, 126-138.
28. Lin, Y, and Shih, WJ (2001). Statistical properties of the traditional algorithm-based designs for phase I cancer clinical trials. Biostatistics. 2, 203-215.
29. O’Quigley, J, and Chevret, S (1991). Methods for dose finding studies in cancer clinical trials: A review and results of a Monte Carlo study. Statistics in Medicine. 10, 1647-1664.
30. O’Quigley, J, Hughes, MD, and Fenton, T (2001). Dose-finding designs for HIV studies. Biometrics. 57, 1018-1029.
31. O’Quigley, J, Paoletti, X, and Maccario, J (2002). Non-parametric optimal design in dose finding studies. Biostatistics. 3, 51-56.
32. O’Quigley, J, Pepe, M, and Fisher, L (1990). Continual reassessment method: A practical design for phase 1 clinical trials in cancer. Biometrics. 46, 33-48.
33. O’Quigley, J, and Shen, LZ (1996). Continual reassessment method: A likelihood approach. Biometrics. 52, 673-684.
34. Polley, MYC (2011). Practical modifications to the time-to-event continual reassessment method for phase I cancer trials with fast patient accrual and late-onset toxicities. Statistics in Medicine. 30, 2130-2143.
35. Robbins, H, and Monro, S (1951). A stochastic approximation method. Annals of Mathematical Statistics. 22, 400-407.
36. Shen, LZ, and O’Quigley, J (1996). Consistency of the continual reassessment method under model misspecification. Biometrika. 83, 395-405.
37. Storer, BE (1989). Design and analysis of phase I clinical trials. Biometrics. 45, 925-937.
38. Storer, BE, and DeMets, D (1987). Current phase I/II designs: Are they adequate?. Journal of Clinical Research and Drug Development. 1, 121-130.
39. Thall, PF, and Cook, JD (2004). Dose-finding based on efficacy-toxicity trade-offs. Biometrics. 60, 684-693.
40. Wu, CJ (1985). Efficient sequential designs with binary data. Journal of the American Statistical Association. 80, 974-984.
41. Yuan, Z, Chappell, R, and Bailey, H (2007). The continual reassessment method for multiple toxicity grades: A Bayesian quasi-likelihood approach. Biometrics. 63, 173-179.
42. Zacks, S, Rogatko, A, and Babb, J (1998). Optimal Bayesian-feasible dose escalation for cancer phase I trials. Statistics & Probability Letters. 38, 215-220.