
Sample size calculation for comparing time-averaged responses in K-group repeated binary outcomes

Jijia Wang^a, Song Zhang^b, and Chul Ahn^{1,b}

^a Department of Statistical Science, Southern Methodist University, USA; ^b Department of Clinical Sciences, UT Southwestern Medical Center, USA
^1 Correspondence to: Department of Clinical Sciences, UT Southwestern Medical Center, 5323 Harry Hines Blvd, E5.506, Dallas, TX 75390, USA. E-mail: Chul.Ahn@UTSouthwestern.edu
Received March 29, 2018; Revised April 16, 2018; Accepted April 16, 2018.
Abstract

In clinical trials with repeated measurements, the time-averaged difference (TAD) may provide a more powerful evaluation of treatment efficacy than the rate of change over time when the treatment effect has a rapid onset and repeated measurements continue over an extended period after the maximum effect is achieved (Overall and Doyle, Controlled Clinical Trials, 15, 100–123, 1994). Sample size formulas for evaluating TAD between two treatment groups have been investigated by many researchers. For the evaluation of TAD in multi-arm trials, Zhang and Ahn (Computational Statistics & Data Analysis, 58, 283–291, 2013) and Lou et al. (Communications in Statistics-Theory and Methods, 46, 11204–11213, 2017b) developed sample size formulas for continuous outcomes and count outcomes, respectively. In this paper, we derive a sample size formula for evaluating the TAD of repeated binary outcomes in multi-arm trials using the generalized estimating equation approach. The proposed formula accounts for various correlation structures and missing patterns (including a mixture of independent missing and monotone missing patterns) that practitioners frequently encounter in clinical trials. We conduct simulation studies to assess the performance of the proposed formula under a wide range of design parameters. The results show that the empirical powers and empirical Type I errors are close to the nominal levels. We illustrate the proposed method using a clinical trial example.

Keywords: time-averaged difference, multi-arm trials, sample size formula
1. Introduction

In clinical trials with repeated measurements, comparing treatments on the basis of the time-averaged difference (TAD), defined as the difference between treatment groups in the average of the longitudinally measured responses, is often a meaningful measure of the treatment effect. Overall and Doyle (1994) suggested that TAD can provide a more powerful evaluation of treatment efficacy than the rate of change over time when the treatment effect has a rapid onset and repeated measurements are obtained over an extended period after the maximum effect has been achieved. Many sample size formulas have been developed for inference on TAD between two treatment groups (Overall and Doyle, 1994; Diggle et al., 2013; Zhang and Ahn, 2012; Lou et al., 2017a). However, sample size calculation for comparing time-averaged responses among multiple treatment groups has received less attention in the literature. Randomized trials with multiple treatment arms are widely used in practice (Parmar et al., 2014): by testing several treatments simultaneously, they increase the chance of finding an effective treatment and reduce the cost and time of clinical trials. For multi-arm trials with continuous outcomes, Zhang and Ahn (2013) presented a sample size formula for comparing time-averaged responses based on the generalized estimating equation (GEE) method (Liang and Zeger, 1986). Their formula accounts for arbitrary missing patterns, various correlation structures, and unbalanced randomization. Lou et al. (2017b) further extended this approach to multi-arm trials with repeatedly measured count outcomes.

In this paper, we investigate sample size calculation for comparing time-averaged responses among K ≥ 3 groups when a binary outcome is repeatedly measured over the study period. The paper is organized as follows. In Section 2, we briefly review the GEE method for inference on TAD in multi-arm trials with a repeatedly measured binary outcome. In Section 3, we derive a closed-form sample size formula for assessing TAD among treatment groups using the GEE method, and we show that the formula is flexible enough to accommodate various missing patterns and correlation structures. In Section 4, we conduct simulation studies to assess the performance of the proposed formula under various practical settings. In Section 5, we illustrate the proposed method with a clinical trial example. Section 6 concludes with a discussion.

2. Generalized estimating equation estimator

Suppose that in a clinical trial a total of $n$ subjects are enrolled and randomly assigned to one of $K$ treatment groups. Each subject is scheduled to have $J$ measurements over the study period. Let $Y_{kij}$ be the binary response measured at time $t_j$ ($j = 1, \ldots, J$) on subject $i$ ($i = 1, \ldots, n_k$) of the $k$th treatment group ($k = 1, \ldots, K$), where $n_k$ denotes the number of subjects assigned to the $k$th group. We use $r_k = n_k/n$ to denote the proportion of subjects assigned to the $k$th group. To make inference about the TAD among the $K$ treatment groups, we model $Y_{kij}$ with the following logistic model:

$Y_{kij} \sim \mathrm{Bernoulli}(p_k), \qquad \mathrm{logit}(p_k) = \log\left(\frac{p_k}{1-p_k}\right) = b_k, \qquad k = 1, \ldots, K.$

Hence $E(Y_{kij}) = p_k = e^{b_k}/(1 + e^{b_k})$; that is, $b = (b_1, \ldots, b_K)'$ represents the time-averaged responses of the $K$ treatment groups on the log-odds scale. Because $Y_{kij}$ is binary, $\mathrm{Var}(Y_{kij}) = p_k(1 - p_k)$. We model the within-subject correlation among measurements from the same subject by $\mathrm{corr}(Y_{kij}, Y_{kij'}) = \rho_{jj'}$, with $\rho_{jj} = 1$, and we assume that the $Y_{kij}$'s are independent across subjects.

The GEE estimator of $b$, denoted by $\hat b = (\hat b_1, \ldots, \hat b_K)'$, has the closed form

$\hat b_k = \log\left(\frac{\sum_{i=1}^{n_k}\sum_{j=1}^{J} y_{kij}/(n_k J)}{1 - \sum_{i=1}^{n_k}\sum_{j=1}^{J} y_{kij}/(n_k J)}\right),$

which is derived under an independence working correlation structure. Liang and Zeger (1986) showed that, as $n \to \infty$, $\sqrt{n}(\hat b - b)$ is approximately normally distributed with mean vector $0$ and variance matrix $V$. We can consistently estimate $V$ by $V_n = W A_n^{-1}(\hat b)\, \Sigma_n\, A_n^{-1}(\hat b)\, W$, where $W$ is a diagonal matrix with diagonal elements $(1/\sqrt{r_1}, \ldots, 1/\sqrt{r_K})$,

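$A_n(b) = \begin{pmatrix} \frac{1}{n_1}\sum_{i=1}^{n_1}\sum_{j=1}^{J}\frac{e^{b_1}}{(1+e^{b_1})^2} & 0 & \cdots & 0 \\ 0 & \frac{1}{n_2}\sum_{i=1}^{n_2}\sum_{j=1}^{J}\frac{e^{b_2}}{(1+e^{b_2})^2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{1}{n_K}\sum_{i=1}^{n_K}\sum_{j=1}^{J}\frac{e^{b_K}}{(1+e^{b_K})^2} \end{pmatrix},$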

and

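$\Sigma_n = \begin{pmatrix} \frac{1}{n_1}\sum_{i=1}^{n_1}\Big(\sum_{j=1}^{J}\hat\varepsilon_{1ij}\Big)^2 & 0 & \cdots & 0 \\ 0 & \frac{1}{n_2}\sum_{i=1}^{n_2}\Big(\sum_{j=1}^{J}\hat\varepsilon_{2ij}\Big)^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{1}{n_K}\sum_{i=1}^{n_K}\Big(\sum_{j=1}^{J}\hat\varepsilon_{Kij}\Big)^2 \end{pmatrix}.$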

Here $\hat\varepsilon_{kij} = y_{kij} - e^{\hat b_k}/(1 + e^{\hat b_k})$ denotes the residual.

To compare the time-averaged responses among the $K$ treatment groups, the null hypothesis of interest is $H_0: b_1 = \cdots = b_K$. The test statistic is

$Z = \frac{C'\hat b}{\sqrt{\mathrm{Var}(C'\hat b)}}, \qquad (2.1)$

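where $C = (c_1, \ldots, c_K)'$ is a contrast vector of treatment effects with $\sum_{k=1}^{K} c_k = 0$, and $\mathrm{Var}(C'\hat b)$ is estimated by $C' V_n C / n$. For example, one reasonable specification is $C = (-1, 1/(K-1), \ldots, 1/(K-1))'$, which compares the first group with the average of the remaining $K - 1$ groups. The null hypothesis is rejected if $|Z| > z_{1-\alpha/2}$, where $z_{1-\alpha/2}$ is the $100(1-\alpha/2)$th percentile of the standard normal distribution.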
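The estimator, its sandwich variance, and the test statistic of this section can be sketched in a few lines of Python. This is a minimal sketch, not the authors' code: the function names `gee_time_averaged`, `z_statistic`, and `reject_h0` are hypothetical, each group is assumed to be a list of subjects and each subject a list of 0/1 responses, and every pooled proportion is assumed to lie strictly between 0 and 1. Because the groups are independent and the variance matrix is diagonal, $\mathrm{Var}(C'\hat b)$ is computed as $\sum_k c_k^2\,\widehat{\mathrm{Var}}(\hat b_k)$, with $\widehat{\mathrm{Var}}(\hat b_k) = \Sigma_{n,k}/(n_k A_{n,k}^2)$.

```python
import math
from statistics import NormalDist


def gee_time_averaged(groups):
    """Closed-form GEE estimates b_hat_k and sandwich variances Var(b_hat_k).

    groups[k] is a list of subjects; each subject is a list of 0/1 responses.
    Assumes each pooled proportion lies strictly between 0 and 1.
    """
    b_hat, var_b = [], []
    for subjects in groups:
        n_k = len(subjects)
        n_obs = sum(len(y) for y in subjects)                  # sum_i sum_j 1
        p_hat = sum(sum(y) for y in subjects) / n_obs          # pooled proportion
        b_hat.append(math.log(p_hat / (1 - p_hat)))            # logit(p_hat)
        a_k = n_obs / n_k * p_hat * (1 - p_hat)                # k-th diagonal of A_n
        s_k = sum(sum(v - p_hat for v in y) ** 2 for y in subjects) / n_k  # of Sigma_n
        var_b.append(s_k / (n_k * a_k ** 2))                   # sandwich variance
    return b_hat, var_b


def z_statistic(groups, contrast):
    """Z = C'b_hat / sqrt(Var(C'b_hat)), as in Equation (2.1)."""
    b_hat, var_b = gee_time_averaged(groups)
    estimate = sum(c * b for c, b in zip(contrast, b_hat))
    variance = sum(c * c * v for c, v in zip(contrast, var_b))  # groups independent
    return estimate / math.sqrt(variance)


def reject_h0(z, alpha=0.05):
    """Two-sided test: reject when |Z| exceeds z_{1 - alpha/2}."""
    return abs(z) > NormalDist().inv_cdf(1 - alpha / 2)


# Toy data: K = 3 groups, two subjects each, J = 2 visits.
groups = [[[1, 1], [0, 0]], [[1, 0], [0, 1]], [[1, 1], [1, 0]]]
z = z_statistic(groups, [-1, 0.5, 0.5])
print(round(z, 4), reject_h0(z))
```

If each subject's list holds only its observed responses, the same code yields the $\Delta$-weighted $A_n$ and $\Sigma_n$ used for missing data in Section 3.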

3. Sample size calculation

As $n \to \infty$, let $A$ and $\Sigma$ denote the limits of $A_n$ and $\Sigma_n$, respectively. Then $V_n$ converges to $V = W A^{-1}(b)\, \Sigma\, A^{-1}(b)\, W$. Given the true treatment effects $\theta = (\theta_1, \ldots, \theta_K)'$, the sample size required to reject the null hypothesis with power $1 - \gamma$ at significance level $\alpha$ is

$n = \frac{(z_{1-\alpha/2} + z_{1-\gamma})^2\, C' V C}{(C'\theta)^2}. \qquad (3.1)$
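Equation (3.1) holds for any contrast and any limiting variance matrix. A minimal Python sketch follows, assuming the result is rounded up to the next integer; the name `sample_size_31` and the toy inputs are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_31(V, C, theta, alpha=0.05, power=0.8):
    """Total sample size from Equation (3.1), rounded up to an integer.

    V     : K x K limiting variance matrix of sqrt(n)(b_hat - b), as nested lists
    C     : contrast vector whose entries sum to zero
    theta : assumed true treatment effects on the log-odds scale
    """
    z = NormalDist().inv_cdf
    K = len(C)
    cvc = sum(C[i] * V[i][j] * C[j] for i in range(K) for j in range(K))  # C'VC
    effect = sum(c * t for c, t in zip(C, theta))                         # C'theta
    n = (z(1 - alpha / 2) + z(power)) ** 2 * cvc / effect ** 2
    return math.ceil(n)


# Toy input: V = 4 I, C = (-1, 1/2, 1/2)', theta = (0, 1, 1)'
# gives C'VC = 6 and C'theta = 1, so n = 6 (z_0.975 + z_0.8)^2, about 47.09.
print(sample_size_31([[4, 0, 0], [0, 4, 0], [0, 0, 4]], [-1, 0.5, 0.5], [0, 1, 1]))  # 48
```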

Missing data are frequently encountered in clinical trials with repeated measurements. We now show that (3.1) extends in closed form to account for missing data. Let $\Delta_{kij}$ be the observation indicator, which takes the value 1 if $Y_{kij}$ is observed and 0 if it is missing. Then the generalized expressions of $A_n$ and $\Sigma_n$ that accommodate missing data are

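$A_n(b) = \begin{pmatrix} \frac{1}{n_1}\sum_{i=1}^{n_1}\sum_{j=1}^{J}\Delta_{1ij}\frac{e^{b_1}}{(1+e^{b_1})^2} & 0 & \cdots & 0 \\ 0 & \frac{1}{n_2}\sum_{i=1}^{n_2}\sum_{j=1}^{J}\Delta_{2ij}\frac{e^{b_2}}{(1+e^{b_2})^2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{1}{n_K}\sum_{i=1}^{n_K}\sum_{j=1}^{J}\Delta_{Kij}\frac{e^{b_K}}{(1+e^{b_K})^2} \end{pmatrix},$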

and

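$\Sigma_n = \begin{pmatrix} \frac{1}{n_1}\sum_{i=1}^{n_1}\Big(\sum_{j=1}^{J}\Delta_{1ij}\hat\varepsilon_{1ij}\Big)^2 & 0 & \cdots & 0 \\ 0 & \frac{1}{n_2}\sum_{i=1}^{n_2}\Big(\sum_{j=1}^{J}\Delta_{2ij}\hat\varepsilon_{2ij}\Big)^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{1}{n_K}\sum_{i=1}^{n_K}\Big(\sum_{j=1}^{J}\Delta_{Kij}\hat\varepsilon_{Kij}\Big)^2 \end{pmatrix}.$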

We assume that the missing probabilities depend only on time. We define $\delta_j = E(\Delta_{kij})$ as the probability that a subject has a measurement at time $t_j$, and $\delta_{jj'} = E(\Delta_{kij}\Delta_{kij'})$ as the probability that a subject has measurements at both $t_j$ and $t_{j'}$. Note that $\delta_{jj} = \delta_j$. The probabilities $\delta_j$ and $\delta_{jj'}$ can describe various types of missing patterns. Then, as $n \to \infty$, we have

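$A(b) = \sum_{j=1}^{J}\delta_j \begin{pmatrix} \frac{e^{b_1}}{(1+e^{b_1})^2} & 0 & \cdots & 0 \\ 0 & \frac{e^{b_2}}{(1+e^{b_2})^2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{e^{b_K}}{(1+e^{b_K})^2} \end{pmatrix}$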

and

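$\Sigma = \sum_{j=1}^{J}\sum_{j'=1}^{J}\delta_{jj'}\rho_{jj'} \begin{pmatrix} \frac{e^{b_1}}{(1+e^{b_1})^2} & 0 & \cdots & 0 \\ 0 & \frac{e^{b_2}}{(1+e^{b_2})^2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{e^{b_K}}{(1+e^{b_K})^2} \end{pmatrix}.$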

The general variance V can be expressed as

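$V = W A^{-1} \Sigma A^{-1} W = \frac{\sum_{j=1}^{J}\sum_{j'=1}^{J}\delta_{jj'}\rho_{jj'}}{\left(\sum_{j=1}^{J}\delta_j\right)^2} \begin{pmatrix} \frac{(1+e^{b_1})^2}{r_1 e^{b_1}} & 0 & \cdots & 0 \\ 0 & \frac{(1+e^{b_2})^2}{r_2 e^{b_2}} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{(1+e^{b_K})^2}{r_K e^{b_K}} \end{pmatrix}.$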

Plugging $V$ and $C = (-1, 1/(K-1), \ldots, 1/(K-1))'$ into Equation (3.1), we obtain the generalized sample size formula that accommodates various correlation structures and missing data patterns:

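$n = \frac{(z_{1-\alpha/2} + z_{1-\gamma})^2 \sum_{j=1}^{J}\sum_{j'=1}^{J}\delta_{jj'}\rho_{jj'}}{\left(\theta_1 - \frac{1}{K-1}\sum_{k=2}^{K}\theta_k\right)^2 \left(\sum_{j=1}^{J}\delta_j\right)^2} \left[\frac{(1+e^{b_1})^2}{r_1 e^{b_1}} + \left(\frac{1}{K-1}\right)^2 \sum_{k=2}^{K}\frac{(1+e^{b_k})^2}{r_k e^{b_k}}\right]. \qquad (3.2)$
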
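Formula (3.2) is straightforward to evaluate directly. The sketch below is ours, and the names `sample_size_32`, `cs`, and `ar1` are hypothetical. It evaluates the bracketed term at $b_k = \theta_k$ and rounds up to the next integer; under these assumptions it reproduces the complete-data entries of Tables 1 and 2.

```python
import math
from statistics import NormalDist


def sample_size_32(delta_joint, rho, b, r, alpha=0.05, power=0.8):
    """Total sample size from Equation (3.2), contrast C = (-1, 1/(K-1), ..., 1/(K-1))'.

    delta_joint : J x J joint observation probabilities (diagonal holds delta_j)
    rho         : J x J within-subject correlation matrix
    b           : assumed true log-odds of the K groups (here b_k = theta_k)
    r           : allocation proportions r_k = n_k / n
    """
    z = NormalDist().inv_cdf
    J, K = len(rho), len(b)
    corr_term = sum(delta_joint[i][j] * rho[i][j] for i in range(J) for j in range(J))
    obs_term = sum(delta_joint[j][j] for j in range(J)) ** 2      # (sum_j delta_j)^2
    effect = b[0] - sum(b[1:]) / (K - 1)                          # theta_1 - mean(others)
    v = [(1 + math.exp(bk)) ** 2 / (rk * math.exp(bk)) for bk, rk in zip(b, r)]
    bracket = v[0] + sum(v[1:]) / (K - 1) ** 2
    n = (z(1 - alpha / 2) + z(power)) ** 2 * corr_term * bracket / (effect ** 2 * obs_term)
    return math.ceil(n)


def cs(J, rho):
    """Compound symmetry: rho_jj' = rho for j != j'."""
    return [[1.0 if i == j else rho for j in range(J)] for i in range(J)]


def ar1(J, rho):
    """AR(1) with t_j = j - 1, so rho_jj' = rho ** |j - j'|."""
    return [[rho ** abs(i - j) for j in range(J)] for i in range(J)]


complete = [[1.0] * 6 for _ in range(6)]   # delta_1: no missing data
theta = [0.0, 0.5, 0.5, 0.5]               # first alternative hypothesis
r = [0.25] * 4                             # balanced design
print(sample_size_32(complete, cs(6, 0.3), theta, r),    # Table 1 reports 284
      sample_size_32(complete, ar1(6, 0.3), theta, r))   # Table 1 reports 188
```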
4. Simulation studies

To assess the performance of the proposed sample size method, we conduct simulations under various parameter settings. Suppose that subjects are randomly assigned to one of four treatment groups ($K = 4$). Each subject is scheduled to have $J = 6$ measurements at times $t_j = j - 1$ ($j = 1, \ldots, 6$). We investigate three missing patterns: independent missing (IM), monotone missing (MM), and mixed missing (MIX). Under IM, measurements are missed independently over time, so that $\delta_{jj'} = \delta_j\delta_{j'}$ for $j \neq j'$. Under MM, a subject who misses the measurement at time $t_j$ also misses all subsequent measurements, so that $\delta_{jj'} = \delta_{j'}$ for $j < j'$. IM represents scenarios in which subjects miss clinic visits for random reasons, and MM represents scenarios in which subjects randomly drop out during the study period. In real clinical trials, both dropout and missed visits are likely to occur, which implies a mixture of IM and MM, denoted by MIX. Let $(\delta_1^{(IM)}, \ldots, \delta_J^{(IM)})$ and $(\delta_1^{(MM)}, \ldots, \delta_J^{(MM)})$ be the marginal probabilities of obtaining a measurement under the IM and MM patterns, respectively, and let $\delta_{jj'}^{(IM)}$ and $\delta_{jj'}^{(MM)}$ be the corresponding joint probabilities. Suppose that proportions $w$ and $1 - w$ of subjects follow the IM and MM patterns, respectively. Then, for the MIX pattern, we have

$\delta_j^{(MIX)} = w\,\delta_j^{(IM)} + (1-w)\,\delta_j^{(MM)}, \qquad \delta_{jj'}^{(MIX)} = w\,\delta_{jj'}^{(IM)} + (1-w)\,\delta_{jj'}^{(MM)}.$
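The joint observation probabilities for the three patterns can be built as $J \times J$ matrices. This is a hedged sketch with hypothetical function names; for MM it assumes $\delta$ is non-increasing (monotone dropout), so that a subject observed at the later of two times was also observed at the earlier one and $\delta_{jj'}$ equals the marginal probability of the later time.

```python
def joint_im(delta):
    """IM: visits are missed independently, so delta_jj' = delta_j * delta_j' for j != j'."""
    J = len(delta)
    return [[delta[i] if i == j else delta[i] * delta[j] for j in range(J)]
            for i in range(J)]


def joint_mm(delta):
    """MM: delta_jj' = delta_max(j, j'). Assumes delta is non-increasing."""
    J = len(delta)
    return [[delta[max(i, j)] for j in range(J)] for i in range(J)]


def joint_mix(delta_im, delta_mm, w):
    """MIX: a fraction w of subjects follows IM and 1 - w follows MM."""
    a, b = joint_im(delta_im), joint_mm(delta_mm)
    J = len(a)
    return [[w * a[i][j] + (1 - w) * b[i][j] for j in range(J)] for i in range(J)]


delta2 = [1.00, 0.95, 0.90, 0.85, 0.80, 0.75]
mix = joint_mix(delta2, delta2, w=0.5)
print(mix[1][2])   # 0.5 * (0.95 * 0.90) + 0.5 * 0.90
```

The diagonal of each matrix reproduces the marginal probabilities, consistent with $\delta_{jj} = \delta_j$.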

In the simulation studies, we set $w = 0.5$ for the MIX pattern. For the marginal observation probabilities, we assume $\delta^{(IM)} = \delta^{(MM)} = \delta = (\delta_1, \ldots, \delta_J)$ and explore four sets of values:

$\begin{aligned} \delta_1 &= (1.00, 1.00, 1.00, 1.00, 1.00, 1.00),\\ \delta_2 &= (1.00, 0.95, 0.90, 0.85, 0.80, 0.75),\\ \delta_3 &= (1.00, 0.99, 0.96, 0.91, 0.84, 0.75),\\ \delta_4 &= (1.00, 0.91, 0.84, 0.79, 0.76, 0.75). \end{aligned}$

These settings correspond to four scenarios, all with no missing data at $t_1$. $\delta_1$ represents complete data throughout the study. $\delta_2$ represents missing probabilities with a linear trend, increasing by 5% at each subsequent time point. $\delta_3$ represents missing probabilities that accelerate toward the end of the study, and $\delta_4$ represents the opposite trend, in which missingness increases quickly early in the study and then levels off. Note that $\delta_2$, $\delta_3$, and $\delta_4$ have the same dropout rate (25%) at the end of the study.

The simulation study also explores two correlation structures, compound symmetry (CS; $\rho_{jj'} = \rho$ for $j \neq j'$) and first-order autoregressive (AR(1); $\rho_{jj'} = \rho^{|t_j - t_{j'}|}$), with $\rho = 0.3$ and $0.5$. We set the Type I error rate and power at $\alpha = 0.05$ and $1 - \gamma = 0.8$, respectively. For simplicity, we assume a balanced design with $r_1 = \cdots = r_K = 1/K$; however, the proposed sample size formula is applicable to any unbalanced design. We consider two alternative hypotheses. Under the first, group 1 is a control group and the other three groups are treatment groups with the same treatment effect: $H_1: \theta_1 = 0, \theta_2 = \theta_3 = \theta_4 = 0.5$. Under the second, the groups are ordered by treatment effect: $H_1: \theta_1 = 0, \theta_k = \theta_1 + (k-1)\Delta_0$ for $k = 2, 3, 4$, with $\Delta_0 = 0.25$. For each combination of design parameters (missing pattern, marginal observation probabilities $\delta$, correlation structure, correlation $\rho$, and alternative hypothesis $H_1$), the simulation proceeds as follows:

• Step 1: Calculate the required sample size $n$ based on Equation (3.2).

• Step 2: Generate one data set under the null hypothesis and one under the alternative hypothesis. Each data set contains $n$ subjects, and each subject contributes a multivariate Bernoulli vector $Y_{ki} = (Y_{ki1}, \ldots, Y_{kiJ})'$ with correlations $\rho_{jj'}$ under the assumed correlation structure. The correlated binary vectors are generated by the method of Emrich and Piedmonte (1991).

• Step 3: Create incomplete data sets by generating missing indicators according to the specified missing pattern and marginal observation probabilities.

• Step 4: Calculate $\hat b$, $A_n$, $\Sigma_n$, and $W$, and then the test statistic $Z$ based on Equation (2.1).

• Step 5: Repeat Steps 2–4 $L = 10{,}000$ times. The empirical Type I error and empirical power are estimated by $\sum_{l=1}^{L} I(|Z_l| > z_{1-\alpha/2})/L$ under the null and alternative hypotheses, respectively.
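The simulation steps above can be sketched for the complete-data case ($\delta_1$) under AR(1) correlation. This is a reduced, hedged sketch rather than the authors' simulation code, and two substitutions should be flagged. First, correlated binary vectors come from a stationary two-state Markov chain, for which $\mathrm{corr}(Y_{kij}, Y_{kij'}) = \rho^{|j-j'|}$ exactly when $0 \le \rho < 1$, instead of the Emrich and Piedmonte (1991) method. Second, the null data use $b_1 = \cdots = b_4 = 0$, a value we chose because the text does not state one. The function names are hypothetical, and 500 replications replace 10,000 to keep the run short, so the estimates are noisier.

```python
import math
import random
from statistics import NormalDist


def ar1_binary(p, rho, J, rng):
    """J correlated 0/1 responses from a stationary two-state Markov chain.

    P(Y_1 = 1) = p and P(Y_j = 1 | Y_{j-1}) = p + rho * (Y_{j-1} - p), which keeps
    the marginal mean at p and gives corr(Y_j, Y_j') = rho ** |j - j'| (0 <= rho < 1).
    Swapped in for the Emrich-Piedmonte (1991) generator used in the paper.
    """
    y = [int(rng.random() < p)]
    for _ in range(J - 1):
        y.append(int(rng.random() < p + rho * (y[-1] - p)))
    return y


def z_statistic(groups, contrast):
    """Z from Equation (2.1): closed-form GEE estimates with sandwich variances."""
    estimate, variance = 0.0, 0.0
    for c, subjects in zip(contrast, groups):
        n_k = len(subjects)
        n_obs = sum(len(y) for y in subjects)
        p_hat = sum(sum(y) for y in subjects) / n_obs
        a_k = n_obs / n_k * p_hat * (1 - p_hat)
        s_k = sum(sum(v - p_hat for v in y) ** 2 for y in subjects) / n_k
        estimate += c * math.log(p_hat / (1 - p_hat))
        variance += c * c * s_k / (n_k * a_k ** 2)
    return estimate / math.sqrt(variance)


def rejection_rate(b, n_per_group, rho, J, reps, alpha=0.05, seed=2018):
    """Fraction of `reps` simulated complete-data trials in which |Z| > z_{1-alpha/2}."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    K = len(b)
    contrast = [-1.0] + [1.0 / (K - 1)] * (K - 1)
    probs = [1.0 / (1.0 + math.exp(-bk)) for bk in b]
    rejections = 0
    for _ in range(reps):
        groups = [[ar1_binary(p, rho, J, rng) for _ in range(n_per_group)]
                  for p in probs]
        if abs(z_statistic(groups, contrast)) > crit:
            rejections += 1
    return rejections / reps


# Table 1 setting: delta_1 (complete data), AR(1), rho = 0.3, n = 188 (47 per group).
power = rejection_rate([0.0, 0.5, 0.5, 0.5], 47, 0.3, 6, reps=500)
size = rejection_rate([0.0, 0.0, 0.0, 0.0], 47, 0.3, 6, reps=500)  # assumed null: all b_k = 0
print(f"empirical power {power:.3f}, empirical Type I error {size:.3f}")
```

With 47 subjects per group, the empirical power should land near 0.8 and the empirical Type I error near 0.05, up to Monte Carlo error; the exact values depend on the seed.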

Tables 1 and 2 summarize the required sample sizes and the corresponding empirical powers and empirical Type I errors under different combinations of simulation parameters. Table 1 is obtained under the alternative hypothesis $H_1: \theta_1 = 0, \theta_2 = \theta_3 = \theta_4 = 0.5$, and Table 2 under $H_1: \theta_1 = 0, \theta_2 = 0.25, \theta_3 = 0.5, \theta_4 = 0.75$. We make several observations. (1) The empirical powers and Type I errors are close to their nominal levels of 0.8 and 0.05, respectively, so the proposed sample size formula performs well over a wide range of design parameter settings. (2) With the other design parameters held fixed, the sample sizes under AR(1) are always smaller than those under CS, because AR(1) correlations decay with the time lag, which makes $\sum_j\sum_{j'}\delta_{jj'}\rho_{jj'}$ smaller. (3) The sample size increases with the correlation $\rho$, as is evident from formula (3.2). (4) Given the same marginal observation probabilities, the required sample sizes under the different missing patterns are ordered as MM > MIX > IM. Under MM, missing data tend to be concentrated in successive observations from a few subjects, whereas under IM they are randomly distributed across subjects over the study period. The MM pattern therefore leads to greater information loss (hence a larger sample size requirement) than IM, with MIX lying in between. (5) Although $\delta_2$, $\delta_3$, and $\delta_4$ have the same dropout rate at the end of the study, the required sample sizes are ordered as $\delta_4 > \delta_2 > \delta_3$, because the overall proportion of missing values is greatest under $\delta_4$ and smallest under $\delta_3$.
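Observations (2) to (5) can be checked without simulation: for fixed effects and allocation, the sample size in (3.2) is proportional to the design factor $\sum_j\sum_{j'}\delta_{jj'}\rho_{jj'}/(\sum_j\delta_j)^2$. The hypothetical helpers below compute this factor for every setting of Tables 1 and 2 with missing data. As a sketch, they assume $t_j = j - 1$, $w = 0.5$ for MIX, and a non-increasing $\delta$ for MM.

```python
from itertools import product


def joint(delta, pattern, w=0.5):
    """delta_jj' under IM, MM, or MIX (a fraction w of subjects follows IM)."""
    def im(i, j):
        return delta[i] if i == j else delta[i] * delta[j]

    def mm(i, j):
        return delta[max(i, j)]   # monotone dropout; delta assumed non-increasing

    rules = {"IM": im, "MM": mm, "MIX": lambda i, j: w * im(i, j) + (1 - w) * mm(i, j)}
    J = len(delta)
    return [[rules[pattern](i, j) for j in range(J)] for i in range(J)]


def design_factor(delta, pattern, rho):
    """sum_j sum_j' delta_jj' rho_jj' / (sum_j delta_j)^2, the pattern-dependent part of (3.2)."""
    d, J = joint(delta, pattern), len(delta)
    return sum(d[i][j] * rho[i][j] for i in range(J) for j in range(J)) / sum(delta) ** 2


J = 6
corr = {}
for rho in (0.3, 0.5):
    corr[("CS", rho)] = [[1.0 if i == j else rho for j in range(J)] for i in range(J)]
    corr[("AR(1)", rho)] = [[rho ** abs(i - j) for j in range(J)] for i in range(J)]
deltas = {
    "delta2": [1.00, 0.95, 0.90, 0.85, 0.80, 0.75],
    "delta3": [1.00, 0.99, 0.96, 0.91, 0.84, 0.75],
    "delta4": [1.00, 0.91, 0.84, 0.79, 0.76, 0.75],
}
factor = {
    (name, pat, key): design_factor(d, pat, R)
    for (name, d), pat, (key, R) in product(deltas.items(), ("IM", "MIX", "MM"), corr.items())
}
for key in sorted(factor):
    print(key, round(factor[key], 4))
```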

5. Example

The PASS sample size software manual (PASS14, 2015) illustrates sample size estimation for testing proportions in a repeated measurement design with two treatment groups. Here, we show sample size estimation for testing proportions among three treatment groups. An investigator wants to design a study that compares the efficacy of a prophylactic treatment for the common cold using two active drugs and a placebo. The null hypothesis is that there is no difference among the three treatment groups in the proportion of patients who get sick. Patients will be randomly assigned to one of the three treatment groups with equal probability and followed monthly from September to April, with disease status (present or absent) assessed beginning in October, hence $J = 7$. The study will investigate whether there is an overall difference among the three groups in the proportion of patients who get sick. Based on previous studies, the disease rate of the common cold is estimated to be 60%, and it is expected to remain at 60% in the placebo group. Suppose that a clinically meaningful difference in efficacy is a 30% relative reduction in the disease rate in the two active medication groups, which corresponds to a disease rate of 42%. The hypotheses of interest are then $H_0: b_1 = b_2 = b_3$ versus $H_1: b_1 = \mathrm{logit}(0.60) = 0.4055$, $b_2 = b_3 = \mathrm{logit}(0.42) = -0.3228$. We wish to calculate the sample size needed to detect this difference with Type I error $\alpha = 0.05$ and power $1 - \gamma = 0.8$ under a balanced design. We set the measurement times at $t_j = j - 1$ ($j = 1, \ldots, 7$). The observation probabilities are assumed to follow a linear trend with 30% dropout at the end of the study, $\delta = (1, 0.95, 0.90, 0.85, 0.80, 0.75, 0.70)$. Under the AR(1) correlation structure with $\rho = 0.5$, the total sample sizes required under the IM, MM, and MIX (a balanced mixture of IM and MM) patterns are 104, 110, and 107, respectively. Under the CS correlation structure with $\rho = 0.5$, the corresponding sample sizes are 165, 175, and 170.
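The example can be reproduced with a direct implementation of (3.2). In this hedged sketch the function name is hypothetical, MIX uses $w = 0.5$, and MM assumes the non-increasing $\delta$ given above; under these assumptions it returns 104, 110, and 107 for AR(1) and 165, 175, and 170 for CS with $\rho = 0.5$, matching the figures reported in the example.

```python
import math
from statistics import NormalDist


def logit(p):
    return math.log(p / (1 - p))


def sample_size_example(delta, pattern, rho, b, r, alpha=0.05, power=0.8, w=0.5):
    """Equation (3.2) for a J x J correlation matrix `rho` and a missing pattern.

    pattern is "IM", "MM", or "MIX" (a fraction w of subjects follows IM).
    MM assumes delta is non-increasing (monotone dropout).
    """
    J, K = len(delta), len(b)

    def joint(i, j):
        if i == j:
            return delta[i]
        im, mm = delta[i] * delta[j], delta[max(i, j)]
        return {"IM": im, "MM": mm, "MIX": w * im + (1 - w) * mm}[pattern]

    corr_term = sum(joint(i, j) * rho[i][j] for i in range(J) for j in range(J))
    probs = [1 / (1 + math.exp(-bk)) for bk in b]
    # (1 + e^b)^2 / (r e^b) equals 1 / (r p (1 - p)) with p = e^b / (1 + e^b)
    v = [1 / (rk * p * (1 - p)) for rk, p in zip(r, probs)]
    bracket = v[0] + sum(v[1:]) / (K - 1) ** 2
    effect = b[0] - sum(b[1:]) / (K - 1)
    z = NormalDist().inv_cdf
    n = (z(1 - alpha / 2) + z(power)) ** 2 * corr_term * bracket / (effect ** 2 * sum(delta) ** 2)
    return math.ceil(n)


b = [logit(0.60), logit(0.42), logit(0.42)]     # placebo, two active drugs
r = [1 / 3] * 3                                  # balanced design
delta = [1.00, 0.95, 0.90, 0.85, 0.80, 0.75, 0.70]
J = len(delta)                                   # t_j = j - 1, j = 1, ..., 7
ar1 = [[0.5 ** abs(i - j) for j in range(J)] for i in range(J)]
cs = [[1.0 if i == j else 0.5 for j in range(J)] for i in range(J)]
for name, rho in (("AR(1)", ar1), ("CS", cs)):
    print(name, [sample_size_example(delta, pat, rho, b, r) for pat in ("IM", "MM", "MIX")])
```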

6. Discussion

In this study, we derived a sample size formula for comparing the time-averaged responses of repeated binary outcomes among $K$ groups. The proposed formula can accommodate arbitrary correlation structures, missing patterns, marginal observation probabilities, and unbalanced designs. We based the formula on the GEE method because (1) it is widely used to analyze data from clinical trials with longitudinal or repeated observations; (2) it is robust to misspecification of the correlation structure (Liang and Zeger, 1986; Jung and Ahn, 2003); and (3) it readily accommodates missing data (Zeger et al., 1988). Our simulation studies show that the empirical powers and empirical Type I errors are very close to the nominal levels under a wide range of design parameters. When comparing the time-averaged responses among $K$ groups, a larger correlation is always associated with a larger sample size, as is evident from Equation (3.2) and discussed in Lou et al. (2017a).

TABLES

### Table 1

Required sample size (empirical power, empirical Type I error) under $H_1: \theta_1 = 0, \theta_2 = \theta_3 = \theta_4 = 0.5$

| Missing pattern | δ | CS, ρ = 0.3 | CS, ρ = 0.5 | AR(1), ρ = 0.3 | AR(1), ρ = 0.5 |
|---|---|---|---|---|---|
| None (complete data) | δ1 | 284 (0.8074, 0.0508) | 397 (0.7996, 0.0546) | 188 (0.8083, 0.0550) | 266 (0.8032, 0.0544) |
| IM | δ2 | 300 (0.8060, 0.0546) | 413 (0.8042, 0.0507) | 205 (0.8047, 0.0547) | 283 (0.8035, 0.0519) |
| IM | δ3 | 295 (0.8043, 0.0522) | 408 (0.8065, 0.0511) | 201 (0.8098, 0.0573) | 280 (0.7983, 0.0524) |
| IM | δ4 | 305 (0.8096, 0.0526) | 418 (0.8032, 0.0481) | 209 (0.8056, 0.0555) | 286 (0.8055, 0.0532) |
| MM | δ2 | 312 (0.8079, 0.0555) | 433 (0.8068, 0.0499) | 212 (0.8081, 0.0524) | 297 (0.8037, 0.0526) |
| MM | δ3 | 301 (0.8082, 0.0521) | 417 (0.8041, 0.0517) | 205 (0.8038, 0.0518) | 287 (0.8060, 0.0520) |
| MM | δ4 | 323 (0.8049, 0.0488) | 449 (0.8076, 0.0542) | 219 (0.8095, 0.0562) | 307 (0.8021, 0.0486) |
| MIX | δ2 | 306 (0.8099, 0.0547) | 423 (0.8035, 0.0551) | 208 (0.8053, 0.0515) | 290 (0.8043, 0.0503) |
| MIX | δ3 | 298 (0.8070, 0.0481) | 413 (0.7985, 0.0505) | 203 (0.8037, 0.0561) | 283 (0.8060, 0.0535) |
| MIX | δ4 | 314 (0.8079, 0.0563) | 433 (0.8014, 0.0534) | 214 (0.8076, 0.0563) | 297 (0.8094, 0.0506) |

CS = compound symmetry; AR = auto-regressive; IM = independent missing; MM = monotone missing; MIX = mixed missing.

### Table 2

Required sample size (empirical power, empirical Type I error) under $H_1: \theta_1 = 0, \theta_2 = 0.25, \theta_3 = 0.5, \theta_4 = 0.75$

| Missing pattern | δ | CS, ρ = 0.3 | CS, ρ = 0.5 | AR(1), ρ = 0.3 | AR(1), ρ = 0.5 |
|---|---|---|---|---|---|
| None (complete data) | δ1 | 285 (0.8005, 0.0516) | 399 (0.8054, 0.0501) | 189 (0.8045, 0.0536) | 267 (0.8050, 0.0507) |
| IM | δ2 | 301 (0.8000, 0.0529) | 414 (0.8067, 0.0504) | 205 (0.8026, 0.0567) | 284 (0.8087, 0.0504) |
| IM | δ3 | 296 (0.8032, 0.0500) | 410 (0.8088, 0.0522) | 201 (0.8024, 0.0536) | 281 (0.8090, 0.0549) |
| IM | δ4 | 306 (0.8020, 0.0537) | 419 (0.8010, 0.0497) | 209 (0.8078, 0.0561) | 287 (0.8046, 0.0518) |
| MM | δ2 | 312 (0.8027, 0.0523) | 434 (0.8055, 0.0530) | 212 (0.8059, 0.0575) | 297 (0.8002, 0.0504) |
| MM | δ3 | 301 (0.8094, 0.0489) | 419 (0.8091, 0.0551) | 205 (0.8087, 0.0526) | 288 (0.8020, 0.0518) |
| MM | δ4 | 324 (0.8077, 0.0527) | 450 (0.8008, 0.0500) | 220 (0.8075, 0.0531) | 308 (0.8033, 0.0545) |
| MIX | δ2 | 307 (0.8035, 0.0512) | 424 (0.8001, 0.0505) | 209 (0.8039, 0.0529) | 291 (0.8084, 0.0505) |
| MIX | δ3 | 299 (0.8079, 0.0495) | 414 (0.8089, 0.0503) | 203 (0.8058, 0.0551) | 284 (0.8010, 0.0559) |
| MIX | δ4 | 315 (0.8083, 0.0522) | 435 (0.8074, 0.0536) | 215 (0.8085, 0.0525) | 297 (0.8077, 0.0506) |

CS = compound symmetry; AR = auto-regressive; IM = independent missing; MM = monotone missing; MIX = mixed missing.

References
1. Diggle, PJ, Heagerty, P, Liang, KY, and Zeger, SL (2013). Analysis of Longitudinal Data.
2. Emrich, L, and Piedmonte, M (1991). A method for generating high-dimensional multivariate binary variates. The American Statistician. 45, 302-304.
3. Jung, SH, and Ahn, C (2003). Sample size estimation for GEE method for comparing slopes in repeated measurements data. Statistics in Medicine. 22, 1305-1315.
4. Liang, KY, and Zeger, SL (1986). Longitudinal data analysis using generalized linear models. Biometrika. 73, 13-22.
5. Lou, Y, Cao, J, Zhang, S, and Ahn, C (2017a). Sample size calculations for time-averaged difference of longitudinal binary outcomes. Communications in Statistics-Theory and Methods. 46, 344-353.
6. Lou, Y, Cao, J, and Ahn, C (2017b). Sample size estimation for comparing rates of change in K-group repeated count outcomes. Communications in Statistics-Theory and Methods. 46, 11204-11213.
7. Overall, J, and Doyle, S (1994). Estimating sample sizes for repeated measurement design. Controlled Clinical Trials. 15, 100-123.
8. Parmar, M, Carpenter, J, and Sydes, MR (2014). More multiarm randomised trials of superiority are needed. The Lancet. 384, 283-284.
9. PASS14 (2015). Power Analysis and Sample Size Software. NCSS, LLC.
10. Zeger, SL, Liang, KY, and Albert, PS (1988). Models for longitudinal data: a generalized estimating equation approach. Biometrics. 44, 1049-1060.
11. Zhang, S, and Ahn, C (2012). Sample size calculations for the time-averaged differences in the presence of missing data. Contemporary Clinical Trials. 33, 550-556.
12. Zhang, S, and Ahn, C (2013). Sample size calculation for comparing time-averaged responses in K-group repeated-measurement studies. Computational Statistics & Data Analysis. 58, 283-291.