In this article, we review the statistical methods and theory for dose finding in early phase clinical trials, where the primary objective is to identify an acceptable dose for further clinical investigation. The dose finding literature was initially motivated by applications in phase I clinical trials, in which dose finding is often formulated as a percentile estimation problem. We will present some important phase I methods and give an update on new theoretical developments since a recent review by Cheung (2010), with an aim to cover a broader class of dose finding problems and to illustrate how the general dose finding theory may be applied to evaluate and improve a method. Specifically, we will illustrate theoretical techniques with some numerical results in the context of a phase I/II study that uses trinary toxicity/efficacy outcomes as the basis of dose finding.
Phase I and phase I/II clinical trials of a new drug are typically small dose finding studies whose goal is to identify an acceptable dose, defined with respect to some pre-specified toxicity and efficacy criteria, for further clinical investigation. Because little is known about the new drug at this early phase of investigation, these studies are conducted in an adaptive and sequential manner.
Storer and DeMets (1987) give one of the earliest discussions of dose finding in the context of standard phase I trials, where the primary objective is to find the maximum tolerated dose (MTD) based on binary toxicity outcomes; the MTD is typically defined as the dose at which the probability of toxicity is closest to a pre-specified target rate.
As the dose finding objectives in clinical trials become increasingly sophisticated, novel adaptive designs have been proposed to address estimation objectives beyond percentile estimation (
The main goal of this article is to provide an overview of dose finding methods in clinical trials and an up-to-date review of the associated theoretical developments. We will first focus on the standard phase I dose finding problem as defined in (
Let
Storer (1989) describes a number of up-and-down escalation designs. For example, according to his Design B, a trial starts at a low and safe dose (e.g.,
where the starting dose
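To make the escalation mechanics concrete, the following is a minimal sketch of a single-patient up-and-down rule in the spirit of these designs, assuming the dose is escalated one level after a non-toxic outcome and de-escalated one level after a toxic outcome; the dose grid, true toxicity probabilities, starting level, and sample size below are illustrative assumptions rather than values from any particular design.

```python
import random

def up_and_down_trial(p_tox, start_level=0, n_patients=20, seed=1):
    """Simulate a single-patient up-and-down escalation rule.

    p_tox gives an assumed toxicity probability at each dose level.
    The rule escalates one level after a non-toxic outcome and
    de-escalates one level after a toxic outcome, staying within range.
    """
    rng = random.Random(seed)
    level = start_level
    path = []
    for _ in range(n_patients):
        toxic = rng.random() < p_tox[level]
        path.append((level, int(toxic)))
        if toxic:
            level = max(level - 1, 0)               # de-escalate after toxicity
        else:
            level = min(level + 1, len(p_tox) - 1)  # escalate after no toxicity
    return path

# Example with five dose levels and made-up toxicity probabilities.
print(up_and_down_trial([0.05, 0.10, 0.25, 0.40, 0.55]))
```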
Durham
where
Because the escalation rules of (
The continual reassessment method (CRM; O'Quigley et al., 1990) is a model-based design that postulates a one-parameter working model $F(d_i, \beta)$ for the probability of toxicity at dose level $d_i$, such as the empiric (power) model
$$F(d_i, \beta) = p_i^{\exp(\beta)}, \quad i = 1, \ldots, K,$$
where $0 < p_1 < \cdots < p_K < 1$ are pre-specified initial guesses of the toxicity probabilities (the skeleton) and $\beta$ is the unknown model parameter, which is re-estimated from the accumulated toxicity data as the trial proceeds, typically by its posterior mean.
The CRM combines the design and estimation components of a dose finding study by setting each new patient's dose at the current model-based estimate of the MTD, that is, the dose whose estimated toxicity probability is closest to the target rate.
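To make the model-based updating concrete, here is a minimal sketch of a CRM-type update, assuming the one-parameter empiric (power) working model commonly used with the CRM, a normal prior on the model parameter, and posterior computation by numerical integration on a grid; the skeleton, target rate, and prior standard deviation in the example are illustrative choices rather than recommended values.

```python
import math

def crm_next_dose(levels, tox, skeleton, target=0.25, prior_sd=1.34):
    """One CRM-type update with the empiric model F(d_i, beta) = p_i^exp(beta).

    levels: 0-based dose levels given to the patients treated so far.
    tox:    their 0/1 toxicity outcomes.
    skeleton: prior guesses p_1 < ... < p_K of the toxicity probabilities.
    Returns the level whose plug-in toxicity estimate is closest to target.
    """
    # Posterior of beta on a grid under a N(0, prior_sd^2) prior.
    grid = [-4.0 + 8.0 * k / 400 for k in range(401)]
    weights = []
    for b in grid:
        loglik = 0.0
        for lvl, y in zip(levels, tox):
            p = skeleton[lvl] ** math.exp(b)
            loglik += y * math.log(p) + (1 - y) * math.log(1.0 - p)
        weights.append(math.exp(loglik - 0.5 * (b / prior_sd) ** 2))
    total = sum(weights)
    beta_hat = sum(w * b for w, b in zip(weights, grid)) / total

    # Plug-in estimates of the toxicity probabilities at every level.
    p_hat = [s ** math.exp(beta_hat) for s in skeleton]
    return min(range(len(skeleton)), key=lambda i: abs(p_hat[i] - target))

# Example: three patients treated at levels 0, 1, 1 with one toxicity observed.
print(crm_next_dose([0, 1, 1], [0, 0, 1], skeleton=[0.05, 0.12, 0.25, 0.40, 0.55]))
```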
Most model-based methods are Bayesian and assume a prior distribution on the dose-toxicity curve. At the start of a trial, when there are few or no data, these Bayesian methods rely heavily on the prior belief about the dose-toxicity relationship.
Stochastic approximation is perhaps the earliest method proposed for the purpose of dose finding. Anbar (1984) considers using the Robbins–Monro procedure (Robbins and Monro, 1951) in phase I trials. Precisely, stochastic approximation determines the dose for the next patient by the recursion
$$x_{n+1} = x_n - \frac{c}{n}\,(Y_n - \theta)$$
for some constant $c > 0$, where $x_n$ denotes the dose given to the $n$th patient, $Y_n$ the patient's binary toxicity outcome, and $\theta$ the target toxicity probability. Despite its long history, stochastic approximation has seldom been used in actual dose finding trials. First, the recursion operates on a continuous dose scale, whereas most trials must choose from a small number of discrete, pre-specified dose levels.
Second, the choice of the constant $c$ is critical: the asymptotically optimal choice depends on the slope of the dose-toxicity curve near the target dose, which is rarely known at the design stage.
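As an illustration of the recursion, here is a minimal sketch of a Robbins–Monro type update on a continuous dose scale; the constant $c$, target rate, starting dose, number of patients, and the logistic dose-toxicity curve are all illustrative assumptions.

```python
import math
import random

def robbins_monro_doses(c=8.0, theta=0.25, x1=1.0, n=30, seed=2):
    """Run the recursion x_{n+1} = x_n - (c/n) * (Y_n - theta).

    The logistic dose-toxicity curve, c, theta, x1 and n are illustrative
    assumptions; a real trial would also need to map the continuous dose
    onto one of the available dose levels.
    """
    rng = random.Random(seed)
    tox_prob = lambda x: 1.0 / (1.0 + math.exp(-(x - 3.0)))  # assumed curve
    x, doses = x1, [x1]
    for i in range(1, n + 1):
        y = int(rng.random() < tox_prob(x))   # binary toxicity outcome
        x = x - (c / i) * (y - theta)         # step toward the target percentile
        doses.append(x)
    return doses

print(robbins_monro_doses()[-5:])
```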
Coherence is a dose finding principle that stipulates that no dose escalation (de-escalation) should take place for the next patient if the current patient experiences a toxic (non-toxic) outcome (Cheung, 2005). Mathematically, a dose finding design is coherent in escalation if, with probability one,
$$x_{n+1} \le x_n \quad \text{whenever } Y_n = 1,$$
for all $n \ge 1$, and coherent in de-escalation if, with probability one,
$$x_{n+1} \ge x_n \quad \text{whenever } Y_n = 0,$$
for all $n \ge 1$, where $x_n$ and $Y_n$ denote the dose and the binary toxicity outcome of the $n$th patient.
Coherence is an ethical consideration for the design component of a dose finding method, and as a result, it may not correspond to efficient estimation of the target dose.
Specifically, a hybrid design is not coherent if the initial sequence {
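The coherence conditions can be checked mechanically on any realized or simulated trial path. The helper below is a small illustration, assuming only that the doses and binary toxicity outcomes are recorded patient by patient; it is not tied to any particular design.

```python
def coherence_violations(dose_path, tox_path):
    """List the indices at which a realized trial path violates coherence.

    dose_path[n] is the dose (or dose level) of patient n and tox_path[n]
    the corresponding binary toxicity outcome.  Escalating after a toxic
    outcome violates coherence in escalation; de-escalating after a
    non-toxic outcome violates coherence in de-escalation.
    """
    violations = []
    for n in range(len(dose_path) - 1):
        if tox_path[n] == 1 and dose_path[n + 1] > dose_path[n]:
            violations.append(("escalation", n))
        if tox_path[n] == 0 and dose_path[n + 1] < dose_path[n]:
            violations.append(("de-escalation", n))
    return violations

# Example: the dose is raised for patient 4 although patient 3 had a toxicity.
print(coherence_violations([1, 2, 2, 3], [0, 0, 1, 0]))  # [("escalation", 2)]
```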
While there are many phase I dose finding methods (cf. Section 2), it is common to compare their performance by simulation. Note that in a dose finding study, only a patient's outcome at the dose actually administered is observed.
Importantly, the properties of
for
An estimator
As a middle ground, Cheung (2010) discusses the concept of
For brevity of discussion, we confine our attention to the class of dose-toxicity curves such that
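One common way to operationalize such a bound is a complete-information benchmark: each simulated patient carries a latent toxicity tolerance, so that the patient's outcome at every dose level is known, and the benchmark simply selects the dose whose complete-information toxicity rate is closest to the target. The sketch below follows this idea; the dose-toxicity scenario, target rate, and sample size are illustrative assumptions, and the exact benchmark used for the numerical results in this article may differ in detail.

```python
import random

def benchmark_selection(p_tox, target=0.25, n=30, n_sim=2000, seed=3):
    """Complete-information benchmark for MTD selection.

    Each simulated patient receives a latent tolerance U ~ Uniform(0, 1);
    the patient would have a toxicity at level i whenever U < p_tox[i].
    With these complete toxicity profiles, the empirical toxicity rate at
    every level is available, and the benchmark selects the level whose
    rate is closest to the target.  Returns the selection frequencies.
    """
    rng = random.Random(seed)
    k = len(p_tox)
    counts = [0] * k
    for _ in range(n_sim):
        u = [rng.random() for _ in range(n)]
        rates = [sum(ui < p_tox[i] for ui in u) / n for i in range(k)]
        counts[min(range(k), key=lambda i: abs(rates[i] - target))] += 1
    return [c / n_sim for c in counts]

# Illustrative scenario in which the target rate sits between levels 2 and 3.
print(benchmark_selection([0.05, 0.12, 0.22, 0.35, 0.50]))
```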
This section extends the notation in Section 2 to cover the general dose finding settings. We consider multinomial outcome
We shall also use
Example 1 (Dose finding with binary toxicity outcomes). The outcome
Example 2 (Dose finding with total toxicity burden). In most clinical situations, a patient may experience more than one type of toxicity at different severity grades. Instead of summarizing this toxicity profile into a binary toxicity outcome, Bekele and Thall (2004) propose to measure the total toxicity burden of a patient using a weighted sum of grades and types of toxicities: each toxicity type and grade has a pre-determined weight, and the total burden is the sum of the weights of all toxicities the patient experiences.
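As a small illustration of this weighted-sum construction, the function below computes a total toxicity burden from a table of pre-determined weights indexed by toxicity type and grade; the weights are those in the weight table reproduced at the end of this article, while the observed grades in the example are made up.

```python
# Pre-determined weights by toxicity type and grade (grades 0-4), as in the
# weight table reproduced at the end of the article.
WEIGHTS = {
    "neuropathy":         [0.00, 0.19, 0.64, 1.03, 2.53],
    "low platelet count": [0.00, 0.17, 0.17, 0.40, 0.85],
}

def total_toxicity_burden(observed_grades, weights=WEIGHTS):
    """Sum the pre-determined weights over all observed toxicity types/grades.

    observed_grades maps a toxicity type to the worst grade (0-4) the
    patient experienced; unlisted types contribute nothing.
    """
    return sum(weights[t][g] for t, g in observed_grades.items())

# A patient with grade 2 neuropathy and grade 3 low platelet count
# (the grades here are made up for illustration).
print(total_toxicity_burden({"neuropathy": 2, "low platelet count": 3}))  # 0.64 + 0.40 = 1.04
```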
Example 3 (Dose finding with multiple toxicity constraints). For the same clinical setting as in Example 2, Lee
Example 4 (Dose finding with bivariate binary outcomes). In a phase I/II trial, dose finding is based on both efficacy and toxicity outcomes. Generally, the outcome
As illustrated in the previous subsection, the estimation component of a dose finding study is highly study-specific. Examples 2 and 3, both dealing with multiple toxicities, adopt very different objective functions and hence lead to different target doses; recall that
In contrast, the design component has predominantly taken the form of a model-based or hybrid strategy: start a trial at a safe dose, treat and observe a few patients, obtain some model-based estimate of the target dose, treat the next patient or cohort in accordance with the updated estimate, and iterate until the planned sample size is reached.
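Schematically, this strategy is a simple sequential loop. The sketch below is a generic skeleton rather than any particular published method: it assumes the user supplies an `estimate_target_level` function returning the current model-based estimate of the target dose from the accumulated data, and it restricts each move to one dose level at a time.

```python
import random

def run_model_based_trial(estimate_target_level, true_curve, n_patients,
                          start_level=0, seed=4):
    """Generic skeleton of a model-based/hybrid dose finding trial.

    estimate_target_level(doses, outcomes) should return the currently
    estimated target level from the accumulated data; this skeleton only
    handles the bookkeeping: a safe starting dose, one-level moves, and
    stopping after n_patients.
    """
    rng = random.Random(seed)
    level = start_level
    doses, outcomes = [], []
    for _ in range(n_patients):
        doses.append(level)
        outcomes.append(int(rng.random() < true_curve[level]))  # binary toxicity
        est = estimate_target_level(doses, outcomes)
        # Restrict each move to at most one level (a common practical rule).
        level = min(max(est, level - 1), level + 1)
        level = min(max(level, 0), len(true_curve) - 1)
    return doses, outcomes

# A deliberately naive estimator used only to exercise the skeleton: one level
# above the highest level tried so far whose observed toxicity rate is at most 0.25.
def naive_estimate(doses, outcomes, target=0.25):
    best = 0
    for i in range(max(doses) + 1):
        ys = [y for d, y in zip(doses, outcomes) if d == i]
        if ys and sum(ys) / len(ys) <= target:
            best = i + 1
    return best

print(run_model_based_trial(naive_estimate, [0.05, 0.10, 0.25, 0.40, 0.55], 20))
```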
In this section, we illustrate how the theoretical criteria can be applied to build and evaluate dose finding methods in the context of a phase I/II trial where each patient will experience one of three possible outcomes: no toxicity and no response (coded 0), a response without toxicity (coded 1), or toxicity (coded 2),
where (
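In simulations, such trinary outcomes can be generated from per-dose outcome probabilities. The helper below is a minimal sketch, assuming each dose level is described by a probability vector over the three outcomes coded 0, 1, and 2 as above; the scenario used in the example is made up for illustration.

```python
import random

# Outcome coding: 0 = no toxicity and no response, 1 = response without
# toxicity, 2 = toxicity.  Each row gives (P(0), P(1), P(2)) at one dose
# level; the numbers are made up for illustration.
SCENARIO = [
    (0.70, 0.25, 0.05),
    (0.55, 0.38, 0.07),
    (0.40, 0.50, 0.10),
    (0.28, 0.58, 0.14),
    (0.15, 0.62, 0.23),
]

def draw_outcome(level, scenario=SCENARIO, rng=random):
    """Draw one trinary outcome (0, 1, or 2) at the given dose level."""
    u = rng.random()
    p0, p1, _ = scenario[level]
    if u < p0:
        return 0
    return 1 if u < p0 + p1 else 2

print([draw_outcome(3) for _ in range(10)])
```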
Cheung (2015) introduces the concept of a tandem dose finding design, in which patients are enrolled in pairs and treated at adjacent doses, and dose escalation between pairs is conducted in tandem. Without loss of generality, the design points will be subject to the following constraints:
for patient 2
following the outcome
because
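For illustration, the pairwise escalation decision can be written as a small lookup. The sketch below encodes only the moves that are explicit in the escalation table reproduced at the end of this article (escalate for outcome pairs (0, 0) and (0, 1); de-escalate for (1, 2) and (2, 2)); the pairs marked there as coherence restrictions, the unspecified pairs, and the "stay if" conditions are all treated as "stay" here, which is an illustrative simplification rather than the published rule.

```python
# Outcome coding: 0 = no toxicity and no response, 1 = response without
# toxicity, 2 = toxicity.  Only the moves that are explicit in the
# escalation table are encoded; every other pair defaults to "stay",
# which is an illustrative simplification, not the full published rule.
EXPLICIT_MOVES = {
    (0, 0): +1,   # escalate
    (0, 1): +1,   # escalate
    (1, 2): -1,   # de-escalate
    (2, 2): -1,   # de-escalate
}

def next_pair_level(current_level, outcome_pair, n_levels=5):
    """Return the lower dose level for the next pair of patients."""
    move = EXPLICIT_MOVES.get(tuple(outcome_pair), 0)
    # The pair occupies two adjacent levels, so the lower level is capped
    # at n_levels - 2.
    return min(max(current_level + move, 0), n_levels - 2)

print(next_pair_level(2, (0, 1)))  # escalate: 2 -> 3
```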
It is typical to use simulation to compare different dose finding methods. Table 4 gives simulation results for the tandem design and for the Thall-Cook method with the proportional odds (PO) and continuation ratio (CR) models, along with the benchmark.
where
If we focus on relative performance, we cannot rank the PO and CR models: the former performs better than the latter under Scenario 3, and the converse holds under Scenario 4. Such a split is to be expected, because it is unlikely that any single method or model will perform uniformly better than its competitors across all scenarios.
In this article, we review the theoretical concepts in dose finding with some details on coherence and optimality bounds. While these concepts were first developed for the standard phase I problem, we discuss their extension to general dose finding settings. Since we aim to cover theoretical developments after 2010, we have not covered all theoretical criteria in dose finding such as rigidity and unbiasedness, for which we refer readers to Cheung (2010).
In this article, we also give some important examples of “non-standard” dose finding problems including dose finding with trinary outcomes and multiple toxicity constraints. We, however, have not covered many important topics such as the use of time-to-event outcomes (Cheung and Chappell, 2000; Polley, 2011), testing drug combinations (Houede
Statistical methodology for dose finding trials is an application-oriented discipline, and as a result, little attention has been given to the general theory of dose finding methods. This lack of attention, however, falls short of the rigor necessitated by the ethics and scientific merits of a clinical study, and may miss opportunities for developing more powerful methods.
Weights of toxicity types and grades in Lee
| Toxicity type | Grade 0 | Grade 1 | Grade 2 | Grade 3 | Grade 4 |
|---|---|---|---|---|---|
| Neuropathy | 0.00 | 0.19 | 0.64 | 1.03 | 2.53 |
| Low platelet count | 0.00 | 0.17 | 0.17 | 0.40 | 0.85 |
A dose-outcome distribution of toxicity burden score over five dose levels. Entries in the dose columns are the probabilities that the total toxicity burden is at least the given score; the last row gives the expected toxicity burden at each dose.

| k | Score | Dose 1 | Dose 2 | Dose 3 | Dose 4 | Dose 5 |
|---|---|---|---|---|---|---|
| 0 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 1 | 0.17 | 0.53 | 0.63 | 0.70 | 0.89 | 0.94 |
| 2 | 0.19 | 0.40 | 0.48 | 0.53 | 0.78 | 0.86 |
| 3 | 0.36 | 0.28 | 0.39 | 0.45 | 0.72 | 0.82 |
| 4 | 0.40 | 0.25 | 0.35 | 0.41 | 0.66 | 0.76 |
| 5 | 0.59 | 0.21 | 0.29 | 0.35 | 0.59 | 0.70 |
| 6 | 0.64 | 0.20 | 0.28 | 0.33 | 0.55 | 0.66 |
| 7 | 0.81 | 0.09 | 0.22 | 0.32 | 0.54 | 0.65 |
| 8 | 0.85 | 0.06 | 0.19 | 0.31 | 0.52 | 0.63 |
| 9 | 1.03 | 0.05 | 0.16 | 0.25 | 0.45 | 0.55 |
| 10 | 1.04 | 0.03 | 0.14 | 0.25 | 0.44 | 0.54 |
| 11 | 1.20 | 0.02 | 0.12 | 0.23 | 0.38 | 0.47 |
| 12 | 1.43 | 0.01 | 0.11 | 0.23 | 0.37 | 0.46 |
| 13 | 1.49 | 0.01 | 0.11 | 0.23 | 0.36 | 0.45 |
| 14 | 1.88 | 0.01 | 0.10 | 0.23 | 0.35 | 0.43 |
| 15 | 2.53 | 0.01 | 0.10 | 0.23 | 0.34 | 0.42 |
| 16 | 2.70 | 0.00 | 0.04 | 0.12 | 0.24 | 0.34 |
| 17 | 2.93 | 0.00 | 0.02 | 0.05 | 0.14 | 0.21 |
| 18 | 3.38 | 0.00 | 0.01 | 0.02 | 0.07 | 0.13 |
| | Expected burden | 0.25 | 0.51 | 0.81 | 1.28 | 1.6 |
Escalation in tandem design for phase I/II trial where
| Outcome pair | Rule | Remarks |
|---|---|---|
| (0, 0) | Escalate; stay if | |
| (0, 1) | Escalate; stay if | |
| (0, 2) | Coherence restriction | |
| (1, 0) | | |
| (1, 1) | | |
| (1, 2) | De-escalate; stay if | |
| (2, 0) | Coherence restriction | |
| (2, 1) | | |
| (2, 2) | De-escalate; stay if | |
Comparison of the tandem design, the Thall-Cook method (PO and CR), and the optimality bounds. Relative efficiency is calculated as the ratio of the accuracy index of a method to that of the benchmark. The dose columns give the percent of simulated trials selecting each dose (or no dose); for each scenario, the first row gives the assumed (probability of efficacy, probability of toxicity) at each dose.

| Method | Dose 1 | Dose 2 | Dose 3 | Dose 4 | Dose 5 | None | Accuracy index | Relative efficiency |
|---|---|---|---|---|---|---|---|---|
| Scenario 3 in Thall and Cook (2004): (prob. efficacy, prob. toxicity) | (.20, .02) | (.40, .03) | (.60, .04) | (.68, .06) | (.74, .20) | | | |
| | −0.48 | −0.13 | 0.22 | 0.32 | −0.26 | | | |
| Tandem | 0.1 | 0.2 | 12.6 | 76.1 | 9.1 | 2.0 | .91 | .93 |
| PO | 0.0 | 0.4 | 19.8 | 71.6 | 6.5 | 1.7 | .92 | .95 |
| CR | 0.0 | 1.6 | 32.2 | 49.4 | 15.7 | 1.0 | .83 | .86 |
| Benchmark | 0.0 | 0.0 | 13.0 | 84.8 | 1.2 | 0.0 | .97 | - |
| Scenario 4 in Thall and Cook (2004): (prob. efficacy, prob. toxicity) | (.52, .01) | (.62, .015) | (.71, .02) | (.79, .025) | (.86, .03) | | | |
| | 0.12 | 0.29 | 0.45 | 0.58 | 0.69 | | | |
| Tandem | 1.3 | 0.4 | 2.5 | 28.8 | 67.0 | 0.0 | .93 | .94 |
| PO | 0.1 | 1.7 | 10.1 | 34.3 | 53.7 | 0.0 | .90 | .91 |
| CR | 0.0 | 0.1 | 1.1 | 4.6 | 94.0 | 0.1 | .99 | .99 |
| Benchmark | 0.0 | 0.0 | 0.4 | 4.5 | 95.1 | 0.0 | .99 | - |