A comparison of testing methods in non-inferiority clinical trials
Communications for Statistical Applications and Methods 2024;31:613-625
Published online November 30, 2024
© 2024 Korean Statistical Society.

Jieun Park (a), Jae Won Lee (1, a)

(a) Department of Statistics, Korea University, Korea
(1) Correspondence to: Department of Statistics, Korea University, 145 Anam-Ro, Sungbuk-Gu, Seoul 02841, Korea. E-mail: jael@korea.ac.kr
Received April 18, 2024; Revised September 20, 2024; Accepted October 2, 2024.
 Abstract
A general non-inferiority (NI) clinical trial is typically conducted using parametric testing methods with large samples. However, patient recruitment challenges often hinder rare disease trials, leading to enrollment failures. In this study, we introduce current parametric and nonparametric NI trial testing methods and propose modifications to enhance the performance of the nonparametric approach. Through a comprehensive simulation study with various sample sizes, data distributions, and sample ratios, we compare empirical levels and statistical powers as criteria for evaluating performance. Our findings indicate that the modified nonparametric methods outperformed the existing methods, particularly under conditions of small sample sizes and non-normal distributions, offering valuable insights for improving the reliability and sensitivity of NI trials in the context of rare diseases.
Keywords: non-inferiority clinical trial, nonparametric method, relative effect, rank-based, rare disease, pseudo rank
1. Introduction

Most non-inferiority (NI) clinical trials are conducted with large-scale enrollment. At this scale, a parametric testing method is generally used to confirm non-inferiority. Hida and Tango (2011) introduced a parametric t-test for assessing an NI trial. In this approach, the mean of each treatment group is used as the measure in the hypotheses, and the test statistic follows a Student's t-distribution whose degrees of freedom depend on whether the variances are homogeneous. However, in the case of rare diseases, it is difficult to recruit patients who meet the eligibility criteria. A clinical trial may then be terminated because of insufficient enrollment, and research on the rare disease cannot continue. Therefore, there has been some research on NI trials with nonparametric methods, which are less restrictive than parametric methods. Nonparametric methods do not rely on normality of the data or on homogeneity of variance.

Several methods have been proposed for situations where nonparametric methods are appropriate, that is, where enrollment is small or normality of the data is not assured. In this paper, we consider the three-arm non-inferiority clinical trial, which includes an experimental drug, a reference drug, and a placebo. The importance of the three-arm design has been highlighted in various studies, since the placebo arm allows assay sensitivity to be demonstrated. In the ICH E10 guideline (2000), assay sensitivity is defined as the property of a clinical trial that gives it the ability to distinguish an effective treatment from a less effective or ineffective treatment. Thus, in many non-inferiority clinical trials, demonstrating assay sensitivity takes precedence over establishing non-inferiority. Park and Kim (2014) proposed a ratio-shaped formulation of the non-inferiority hypotheses and used a Hodges-Lehmann estimator to test it. An alternative nonparametric approach uses the 'relative effect' instead of the mean effect as the measure in the hypotheses. Munzel (2009) introduced a ratio-shaped hypothesis based on the relative effect.

In this paper, we introduce a modification of the method by Park and Kim (2014) that is applicable to a three-arm NI trial, along with a modification of the Munzel (2009) method that uses the unweighted relative effect, a measure calculated with pseudo ranks that offers advantages over the traditional weighted version based on usual ranks. While various NI trial testing methods have been proposed, comprehensive comparisons of their performance remain scarce. Our primary focus lies in introducing and applying these methods. To assess their performance, we compare them with existing testing methods in a simulation study. By evaluating empirical levels and statistical powers, we aim to provide insight into the practical utility of these methods in non-inferiority clinical trials.

This paper is structured as follows. Section 2 introduces existing NI trial testing methods, encompassing both parametric and nonparametric approaches. In Section 3, we present the modification of the method by Park and Kim for three-arm clinical trials and propose the use of the unweighted relative effect in Munzel's method. Section 4 presents the simulation results, where we evaluate the performance of the testing methods. Finally, Section 5 summarizes our results and discusses the implications of our findings.

2. Existing testing methods

2.1. Parametric method

Hida and Tango (2011) suggested a parametric testing method for an NI trial that includes a single experimental treatment, a reference treatment, and a placebo. Let the primary endpoint in the three arms be $X_{ij}$, $i = E, R, P$, $j = 1, \ldots, n_i$. The observations $X_{Ej}$, $X_{Rj}$ and $X_{Pj}$ are mutually independent and normally distributed with unknown but common variance $\sigma^2$, that is, $X_{Ej} \sim N(\mu_E, \sigma^2)$, $j = 1, \ldots, n_E$, $X_{Rj} \sim N(\mu_R, \sigma^2)$, $j = 1, \ldots, n_R$, and $X_{Pj} \sim N(\mu_P, \sigma^2)$, $j = 1, \ldots, n_P$. The total sample size is $N = n_E + n_R + n_P$; the group sample sizes need not be identical. The null hypotheses for non-inferiority and assay sensitivity are $H_{0E}: \mu_E - \mu_R \le -M_2$ and $A_0: \mu_R - \mu_P \le M_1$, respectively. The corresponding alternative hypotheses are $H_{1E}: \mu_E - \mu_R > -M_2$ and $A_1: \mu_R - \mu_P > M_1$. Here $M_1$ is the entire effect size of the reference drug and $M_2$ is the largest clinically acceptable difference, i.e., the non-inferiority margin. It is required that $M_1 \ge M_2 = r \times M_1$, where $0 < r \le 1$. Advice on choosing proper values of $M_1$ and $M_2$ is found in the FDA guidance for non-inferiority trials (2016).

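To test the null hypotheses $H_{0E}$ and $A_0$, Student's t-test is used. The test statistic for the NI hypothesis is

$$T_E = \frac{\bar{X}_E - \bar{X}_R + M_2}{\hat{\sigma}_{ER}\sqrt{(1/n_E) + (1/n_R)}},$$

and the test statistic for assay sensitivity is

$$T_A = \frac{\bar{X}_R - \bar{X}_P - M_1}{\hat{\sigma}_{RP}\sqrt{(1/n_R) + (1/n_P)}},$$

where $\bar{X}_i = (1/n_i)\sum_{j=1}^{n_i} X_{ij}$, $i = E, R, P$. The pooled variance estimators are $\hat{\sigma}_{ER}^2 = ((n_E - 1)s_E^2 + (n_R - 1)s_R^2)/(n_E + n_R - 2)$ and $\hat{\sigma}_{RP}^2 = ((n_R - 1)s_R^2 + (n_P - 1)s_P^2)/(n_R + n_P - 2)$, where $s_E^2$, $s_R^2$ and $s_P^2$ denote the sample variances of the experimental, reference and placebo treatments, respectively. The test statistic $T_E$ follows a t-distribution with $n_E + n_R - 2$ degrees of freedom and $T_A$ follows a t-distribution with $n_R + n_P - 2$ degrees of freedom. The corresponding $100 \times (1 - \alpha)\%$ confidence intervals are

$$\left[(\bar{X}_E - \bar{X}_R) - t_{\alpha/2}(n_E + n_R - 2)\,\hat{\sigma}_{ER}\sqrt{\frac{1}{n_E} + \frac{1}{n_R}},\ (\bar{X}_E - \bar{X}_R) + t_{\alpha/2}(n_E + n_R - 2)\,\hat{\sigma}_{ER}\sqrt{\frac{1}{n_E} + \frac{1}{n_R}}\right]$$

and

$$\left[(\bar{X}_R - \bar{X}_P) - t_{\alpha/2}(n_R + n_P - 2)\,\hat{\sigma}_{RP}\sqrt{\frac{1}{n_R} + \frac{1}{n_P}},\ (\bar{X}_R - \bar{X}_P) + t_{\alpha/2}(n_R + n_P - 2)\,\hat{\sigma}_{RP}\sqrt{\frac{1}{n_R} + \frac{1}{n_P}}\right],$$

respectively.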

In the case of heterogeneous variances, a confidence interval of the same form is applied and the only difference is the degrees of freedom. Detailed formulas are found in Huang et al. (2015).
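As a minimal sketch of the common-variance statistics in this subsection, the following Python function computes the two test statistics and their degrees of freedom from raw samples. The function names (`pooled_sd`, `hida_tango_statistics`) are hypothetical and chosen for illustration; this is not the authors' code, which was written in R. Critical values are left to the reader's preferred t-quantile routine.

```python
import math
import statistics


def pooled_sd(x, y):
    """Pooled standard deviation of two samples under a common-variance assumption."""
    nx, ny = len(x), len(y)
    ss = (nx - 1) * statistics.variance(x) + (ny - 1) * statistics.variance(y)
    return math.sqrt(ss / (nx + ny - 2))


def hida_tango_statistics(x_e, x_r, x_p, m1, m2):
    """Return (T_E, df_E, T_A, df_A) for the NI and assay-sensitivity t-tests.

    x_e, x_r, x_p: samples from the experimental, reference and placebo arms.
    m1: assay-sensitivity margin M1; m2: non-inferiority margin M2.
    """
    n_e, n_r, n_p = len(x_e), len(x_r), len(x_p)
    mean_e, mean_r, mean_p = (statistics.fmean(v) for v in (x_e, x_r, x_p))
    t_e = (mean_e - mean_r + m2) / (pooled_sd(x_e, x_r) * math.sqrt(1 / n_e + 1 / n_r))
    t_a = (mean_r - mean_p - m1) / (pooled_sd(x_r, x_p) * math.sqrt(1 / n_r + 1 / n_p))
    return t_e, n_e + n_r - 2, t_a, n_r + n_p - 2
```

Each statistic would then be compared with the upper quantile of a t-distribution with the returned degrees of freedom.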

2.2. Nonparametric methods

As mentioned in the introduction, nonparametric NI testing methods fall into two categories: those that test NI hypotheses stated in terms of the mean effect, and those that test NI hypotheses stated in terms of the relative effect. We first introduce the method by Park and Kim (2014). The original Park-Kim method is not included in our three-arm comparison, because it applies only to a two-arm clinical trial with a single experimental drug and a reference drug, without a placebo.

2.2.1. Park-Kim method

Park and Kim (2014) introduced a nonparametric NI test based on the Wilcoxon rank-sum test and the Hodges-Lehmann estimator for the reference drug. Let $X_{Ei}$, $i = 1, \ldots, n_E$, and $X_{Rj}$, $j = 1, \ldots, n_R$, be the primary endpoints from the experimental group and the reference group, respectively. The null hypothesis of the non-inferiority trial is $H_{0E}: (\mu_E - \mu_R)/\mu_R \le \lambda$ and the corresponding alternative hypothesis is $H_{1E}: (\mu_E - \mu_R)/\mu_R > \lambda$, where $\lambda = M_2 - 1$ and $M_2$ is the non-inferiority margin.

For all $X_{Ei}$ and $X_{Rj}$ ($i = 1, \ldots, n_E$, $j = 1, \ldots, n_R$), define $Q_{ij} = X_{Rj} - X_{Ei}$ and denote its order statistics by $Q_{(1)}, Q_{(2)}, \ldots, Q_{(n_E n_R)}$. The associated Hodges-Lehmann point estimate is the median of the $Q_{ij}$. The lower and upper limits of the $100 \cdot (1 - \alpha)\%$ confidence interval based on the Wilcoxon rank-sum test are $LL_w = Q_{(C_\alpha)}$ and $UL_w = Q_{(n_E n_R + 1 - C_\alpha)}$, respectively, where $C_\alpha = n_E(2n_R + n_E + 1)/2 + 1 - w_\alpha$ and $w_\alpha$ is the upper $100 \times \alpha$th quantile of the Wilcoxon rank-sum statistic. Let $W_{R_j R_{j'}} = (X_{R_j} + X_{R_{j'}})/2$ for $R_j \le R_{j'}$ ($R_j, R_{j'} = 1, \ldots, n_R$) denote the pairwise averages of the reference observations. Consequently, the lower and upper limits of the nonparametric $100 \times (1 - \alpha)\%$ confidence interval of $(\mu_E - \mu_R)/\mu_R$ are

$$LL_N = \frac{Q_{(C_\alpha)}}{\operatorname{med}(W_{R_j R_{j'}})}, \qquad UL_N = \frac{Q_{(n_E n_R + 1 - C_\alpha)}}{\operatorname{med}(W_{R_j R_{j'}})}, \qquad R_j \le R_{j'},\ (R_j, R_{j'} = 1, \ldots, n_R),$$

respectively. In Section 3, we reformulate this method for use in the three-arm design.
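The pieces of the two-arm interval can be sketched in Python as follows. This is an illustration under stated assumptions rather than the published implementation: the differences are taken as reference minus experimental, as in the definition above, the pairwise averages use index pairs with the first index not exceeding the second, and the index `c_alpha` is assumed to be supplied by the caller (it depends on a Wilcoxon quantile that is not computed here). All function names are hypothetical.

```python
import statistics


def pairwise_differences(x_r, x_e):
    """Sorted differences X_Rj - X_Ei over all pairs (the order statistics Q_(1), ...)."""
    return sorted(r - e for r in x_r for e in x_e)


def walsh_median(x_r):
    """Median of the pairwise averages (X_Rj + X_Rj') / 2 with j <= j'."""
    n = len(x_r)
    return statistics.median(
        (x_r[a] + x_r[b]) / 2 for a in range(n) for b in range(a, n)
    )


def park_kim_limits(x_e, x_r, c_alpha):
    """Lower and upper limits Q_(C) / med(W) and Q_(nE*nR + 1 - C) / med(W)."""
    q = pairwise_differences(x_r, x_e)
    denom = walsh_median(x_r)
    lower = q[c_alpha - 1] / denom        # Q_(C_alpha), converted to a 0-based index
    upper = q[len(q) - c_alpha] / denom   # Q_(nE*nR + 1 - C_alpha)
    return lower, upper
```

The ratio is only meaningful when the median of the pairwise averages is away from zero, so a caller would want to check the denominator before using the limits.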

2.2.2. Munzel method

Before we explain the Munzel method, the relative effect needs to be defined. Let

$$X_{ij} \sim F_i, \qquad i = 1, \ldots, k, \quad j = 1, \ldots, n_i,$$

where $i$ denotes the treatment group and $j$ denotes the individual within the $i$th treatment. Here $F_i(x) = P(X_{ij} < x) + (1/2)P(X_{ij} = x) = (1/2)[F_i^-(x) + F_i^+(x)]$ denotes the average of the left- and right-continuous versions of the distribution function, where $F_i^-(x) = P(X_{ij} < x)$ is the left-continuous version and $F_i^+(x) = P(X_{ij} \le x)$ is the right-continuous version of the cumulative distribution function (cdf). This statistical model does not include any parameter, yet the distribution functions can be used to describe the influence of the treatment on the observations. The relative effect of group $i$ is defined as

$$p_i = \int H \, dF_i = P(Z < X_{ij}) + \frac{1}{2}P(Z = X_{ij}), \qquad i = 1, \ldots, k, \quad j = 1, \ldots, n_i,$$

where $H = (1/N)\sum_{i=1}^{k} n_i F_i$ denotes the (weighted) mean distribution function and $Z \sim H$ is independent of $X_{ij}$. Additional explanation of the relative effect is found in Brunner et al. (2018, 2021).
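A rank-based plug-in estimate of the relative effect can be sketched in Python as below. The sketch assumes the standard estimator, in which the estimated effect of group i is the mean overall mid-rank of that group minus one half, divided by the total sample size N. The function names are hypothetical.

```python
def midranks(values):
    """1-based mid-ranks of a flat list (tied values share their average rank)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # average of the 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def relative_effects(groups):
    """Estimated (weighted) relative effect of each group from overall mid-ranks."""
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)
    ranks = midranks(pooled)
    effects, start = [], 0
    for g in groups:
        mean_rank = sum(ranks[start:start + len(g)]) / len(g)
        effects.append((mean_rank - 0.5) / n_total)
        start += len(g)
    return effects
```

A value above 0.5 indicates that observations from that group tend to be larger than observations drawn from the pooled (sample-size weighted) distribution.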

Munzel (2009) suggested the non-inferiority and assay sensitivity null hypotheses $H_{0E}: (p_E - p_R)/(p_R - p_P) \le -\delta$ and $A_0: p_R - p_P \le Q_1$, with corresponding alternative hypotheses $H_{1E}: (p_E - p_R)/(p_R - p_P) > -\delta$ and $A_1: p_R - p_P > Q_1$. Here $p_E$, $p_R$ and $p_P$ are the relative effects of the experimental, reference and placebo groups, respectively, $Q_1$ is the entire relative effect size of R (the counterpart of $M_1$ in the parametric setting), and $Q_2 = \delta(p_R - p_P)$ is the margin of the test. Thus, $\delta$ plays the same role that 'γ' plays in the testing problem stated in terms of means.

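Applying Fieller's theorem, the two-sided $(1 - \alpha) \times 100\%$ confidence interval for the ratio $(p_E - p_R)/(p_R - p_P)$ is

$$\frac{1}{1 - g}\left[\frac{\hat{p}_E - \hat{p}_R}{\hat{p}_R - \hat{p}_P} - g \cdot \frac{\operatorname{cov}(\hat{p}_E - \hat{p}_R, \hat{p}_R - \hat{p}_P)}{\operatorname{var}(\hat{p}_R - \hat{p}_P)} \pm \frac{z_{1-\alpha/2}}{\hat{p}_R - \hat{p}_P}\sqrt{C}\right],$$

where

$$g = z_{1-\alpha/2}^2 \cdot \frac{\operatorname{var}(\hat{p}_R - \hat{p}_P)}{(\hat{p}_R - \hat{p}_P)^2},$$

and

$$C = \operatorname{var}(\hat{p}_E - \hat{p}_R) - 2 \cdot \frac{\hat{p}_E - \hat{p}_R}{\hat{p}_R - \hat{p}_P} \cdot \operatorname{cov}(\hat{p}_E - \hat{p}_R, \hat{p}_R - \hat{p}_P) + \left(\frac{\hat{p}_E - \hat{p}_R}{\hat{p}_R - \hat{p}_P}\right)^2 \cdot \operatorname{var}(\hat{p}_R - \hat{p}_P) - g \cdot \left(\operatorname{var}(\hat{p}_E - \hat{p}_R) - \frac{\operatorname{cov}(\hat{p}_E - \hat{p}_R, \hat{p}_R - \hat{p}_P)^2}{\operatorname{var}(\hat{p}_R - \hat{p}_P)}\right).$$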
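The Fieller interval can be sketched generically in Python for any ratio of two estimates. The sketch below follows the standard form of Fieller's theorem, in which the first term under the square root is the variance of the numerator. The function name and argument names are hypothetical, and the variance and covariance estimates are assumed to be supplied by the caller.

```python
import math


def fieller_interval(num, den, var_num, var_den, cov, z):
    """Two-sided Fieller confidence interval for the ratio num / den.

    num, den: estimates of the numerator and denominator (here p_E - p_R and p_R - p_P).
    var_num, var_den, cov: their estimated variances and covariance.
    z: standard normal quantile z_{1 - alpha/2}.
    """
    theta = num / den
    g = z ** 2 * var_den / den ** 2
    if g >= 1:
        # The denominator is not clearly separated from zero; no bounded interval.
        raise ValueError("g >= 1: Fieller interval is unbounded")
    c = (var_num - 2 * theta * cov + theta ** 2 * var_den
         - g * (var_num - cov ** 2 / var_den))
    half_width = z / abs(den) * math.sqrt(c)
    centre = theta - g * cov / var_den
    return (centre - half_width) / (1 - g), (centre + half_width) / (1 - g)
```

The endpoints are the two roots of the quadratic equation $(\text{num} - \theta\,\text{den})^2 = z^2(\text{var\_num} - 2\theta\,\text{cov} + \theta^2\,\text{var\_den})$, which gives a convenient numerical check of any implementation.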

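Also, let $R_{ik}$ denote the overall rank of $X_{ik}$ among all $N$ observations, $R_{ik}^{(i)}$ the internal rank of $X_{ik}$ among the $n_i$ observations in the $i$th treatment group, and $R_{ik}^{(-j)}$ the partial rank of $X_{ik}$ among the $N - n_j$ observations outside the $j$th treatment group. Then

$$\operatorname{var}(\sqrt{N}\hat{p}_i) = \frac{1}{N}\left[\frac{1}{n_i(n_i - 1)} R_i^t \cdot R_i + \frac{1}{n_i^2}\sum_{r=1,\, r \ne i}^{3} \frac{n_r}{n_r - 1} R_{ri}^t \cdot R_{ri}\right]$$

and

$$\operatorname{cov}(\sqrt{N}\hat{p}_i, \sqrt{N}\hat{p}_j) = \frac{1}{N}\left[\frac{1}{n_i n_j}\sum_{r=1,\, r \ne i, j}^{3} \frac{n_r}{n_r - 1} R_{ri}^t \cdot R_{rj} - \frac{1}{n_j(n_i - 1)} R_i^t \cdot R_{ij} - \frac{1}{n_i(n_j - 1)} R_j^t \cdot R_{ji}\right],$$

where $R_i = \{R_{ik} - R_{ik}^{(i)} - \bar{R}_{i\cdot} + \bar{R}_{i\cdot}^{(i)}\}_{k=1,\ldots,n_i}$ and $R_{ij} = \{R_{ik} - R_{ik}^{(-j)} - \bar{R}_{i\cdot} + \bar{R}_{i\cdot}^{(-j)}\}_{k=1,\ldots,n_i}$ for $j \ne i$ and $i = E, R, P$, and $r = 1, 2, 3$ index the experimental, reference and placebo groups, respectively.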

3. Proposed methods

3.1. Modified Park-Kim method

In this section, we modify the Park-Kim method by reformulating it for the three-arm design. We add the placebo arm to the hypothesis, so that the NI hypotheses become $H_0: (\mu_E - \mu_R)/(\mu_R - \mu_P) \le -\gamma$ vs. $H_1: (\mu_E - \mu_R)/(\mu_R - \mu_P) > -\gamma$, where $\gamma$ is the pre-specified margin. We retain the numerator and change the denominator estimator to the two-sample Hodges-Lehmann estimator.

Let $\hat{\Delta} = \operatorname{median}\{X_{Rj} - W_{Ps},\ j = 1, \ldots, n_R,\ s = 1, \ldots, n_P\}$, where $X_{Rj}$ is the $j$th response of the reference treatment and $W_{Ps}$ is the $s$th response of the placebo treatment. Denote the order statistics of the differences $X_{Rj} - W_{Ps}$ by $H_{(1)}, H_{(2)}, \ldots, H_{(n_R n_P)}$. When $n_R n_P$ is odd, $n_R n_P = 2b + 1$ with $b = (n_R n_P - 1)/2$, the Hodges-Lehmann estimator is $\hat{\Delta} = H_{(b+1)}$; when $n_R n_P$ is even, $n_R n_P = 2b$ with $b = n_R n_P/2$, the Hodges-Lehmann estimator is $\hat{\Delta} = (H_{(b)} + H_{(b+1)})/2$. Therefore, the lower and upper limits of the modified nonparametric $100 \times (1 - \alpha)\%$ confidence interval of $(\mu_E - \mu_R)/(\mu_R - \mu_P)$ are

$$LL_N = \frac{Q_{(C_\alpha)}}{\hat{\Delta}}, \qquad UL_N = \frac{Q_{(n_E n_R + 1 - C_\alpha)}}{\hat{\Delta}},$$

respectively.
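The denominator estimator described above is simply the median of all reference-minus-placebo differences, which a short Python sketch makes concrete. The function name is hypothetical; `statistics.median` returns the middle order statistic for an odd number of differences and the average of the two middle ones for an even number, matching the two cases in the text.

```python
import statistics


def hodges_lehmann_shift(x_r, w_p):
    """Two-sample Hodges-Lehmann estimate: median of all differences X_Rj - W_Ps."""
    return statistics.median(r - p for r in x_r for p in w_p)
```

For example, with reference responses 5, 7, 9 and placebo responses 1, 2, the six differences are 3, 4, 5, 6, 7, 8, and the estimate is the average of the two middle values, 5.5.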

3.2. Modified Munzel method using unweighted relative effect

In Section 2.2.2, we used the 'usual rank', which is calculated with respect to all observations and therefore weights each group by its sample size. The pseudo rank is slightly different: it is an unweighted rank in which every group receives the same weight, so that it depends on the number of groups rather than on the number of observations in each group. We refer to relative effects calculated using pseudo ranks as 'fixed relative effects' because they are not influenced by the group sample sizes. The concept of the unweighted relative effect was first suggested by Brunner and Puri (1996), and its asymptotic properties were established by Domhof (2001). Brunner et al. (2021) cautioned that the weighted relative effect can vary with the sample ratio, making it an unstable measure for use in non-inferiority trials. Therefore, we opted to use pseudo ranks, which remain stable and are not affected by the sample ratio.

Let $U = (1/k)\sum_{i=1}^{k} F_i$ denote the unweighted mean distribution. As mentioned above, it is not affected by the group sample sizes. As with the usual rank, the empirical version of $U$ is $\hat{U} = (1/k)\sum_{i=1}^{k} \hat{F}_i$. Let $R_{ij}^{\phi}$ denote the pseudo rank of $X_{ij}$ with respect to all $k$ treatment groups; it is defined as

$$R_{ij}^{\phi} = \frac{1}{2} + N\hat{U}(X_{ij}) = \frac{1}{2} + \frac{N}{k}\sum_{r=1}^{k}\frac{1}{n_r}\sum_{l=1}^{n_r} c(X_{ij} - X_{rl}),$$

where $\hat{U}(x) = (1/k)\sum_{r=1}^{k} \hat{F}_r(x)$ denotes the unweighted mean of the empirical distribution functions and $c(u) = 0, 1/2, 1$ for $u < 0$, $u = 0$, $u > 0$ is the normalized count function. The unweighted relative effect $p_i^{\phi} = \int U \, dF_i$ can be estimated consistently by the simple plug-in estimator

$$\hat{p}_i^{\phi} = \int \hat{U} \, d\hat{F}_i = \frac{1}{N}\left(\bar{R}_{i\cdot}^{\phi} - \frac{1}{2}\right),$$

where $\bar{R}_{i\cdot}^{\phi} = (1/n_i)\sum_{j=1}^{n_i} R_{ij}^{\phi}$ is the mean pseudo rank of group $i$. The value $p_i^{\phi}$ quantifies the effect of the distribution $F_i$ with respect to the unweighted mean distribution $U$. The only difference between the relative effect and the fixed relative effect lies in the replacement of $\hat{H}(X_{ij})$ with $\hat{U}(X_{ij})$. Therefore, applying the unweighted relative effect to the Munzel method amounts to substituting the unweighted relative effect for the (weighted) relative effect. The null hypotheses are $H_{0E}: (p_E^{\phi} - p_R^{\phi})/(p_R^{\phi} - p_P^{\phi}) \le -\delta$ and $A_0: p_R^{\phi} - p_P^{\phi} \le Q_1$. The corresponding alternative hypotheses are $H_{1E}: (p_E^{\phi} - p_R^{\phi})/(p_R^{\phi} - p_P^{\phi}) > -\delta$ and $A_1: p_R^{\phi} - p_P^{\phi} > Q_1$, respectively. The two-sided $(1 - \alpha) \times 100\%$ confidence interval for the ratio $(p_E^{\phi} - p_R^{\phi})/(p_R^{\phi} - p_P^{\phi})$ has exactly the same form as before, with $p_i$ replaced by $p_i^{\phi}$, $i = E, R, P$.
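Pseudo ranks and the resulting unweighted relative effects can be sketched in Python directly from the definition, under the assumption that the count function takes the values 0, 1/2 and 1 for negative, zero and positive arguments. The function names are hypothetical and the double loop is written for clarity, not speed.

```python
def count(u):
    """Normalized count function: 0 if u < 0, 1/2 if u == 0, 1 if u > 0."""
    return 0.0 if u < 0 else (0.5 if u == 0 else 1.0)


def pseudo_ranks(groups):
    """Pseudo rank of every observation: 1/2 + N * U_hat(x), with equal group weights."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)

    def u_hat(x):
        # Unweighted mean of the k empirical distribution functions at x.
        return sum(sum(count(x - y) for y in g) / len(g) for g in groups) / k

    return [[0.5 + n_total * u_hat(x) for x in g] for g in groups]


def unweighted_relative_effects(groups):
    """Plug-in estimate (mean pseudo rank - 1/2) / N for each group."""
    n_total = sum(len(g) for g in groups)
    return [(sum(r) / len(r) - 0.5) / n_total for r in pseudo_ranks(groups)]
```

With equal group sizes the pseudo ranks coincide with the usual mid-ranks. With unequal sizes they differ, but the estimated unweighted effects depend only on how the groups are ordered relative to one another: the samples `[[1, 2], [3, 4], [5, 6]]` and `[[1, 2, 3, 4], [5], [6]]` give the same three values.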

4. Simulation study

4.1. Simulation scheme

To assess the performance of the testing methods introduced in the previous sections, we conducted an extensive simulation study. The performance criteria are the empirical level and the statistical power. We aim to assess how the performance of the testing methods varies across situations. We generate the data from three distributions (normal, gamma and exponential). For the normal distribution, we consider both equal and unequal variances. For the gamma distribution, we used two sets of shape parameters (α) while keeping the scale parameter fixed (β = 2); the means and variances listed in Table 1 equal αβ and αβ². Similarly, for the exponential distribution, we employed two sets of rate parameters (λ). We also consider four sample ratios reflecting real-world situations. Sample size is also an important factor, and thus a ratio of 1 corresponds to 20 and 50 subjects in the small-sample and large-sample settings, respectively. The sample ratios and sample sizes are displayed in Table 2, and the distributions of the data used in the simulation are displayed in Table 1.

For normal distribution case, we consider a total of four scenarios considering both homogeneous variance and heterogeneous variance. For gamma and exponential distribution cases, two scenarios are used in simulation study.
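The means and variances listed for the non-normal scenarios in Table 1 can be checked from the distribution parameters. The short Python sketch below assumes that β = 2 acts as a scale parameter of the gamma distribution (mean αβ, variance αβ²) and that λ is the rate of the exponential distribution (mean 1/λ, variance 1/λ²); these readings reproduce the tabulated values. The function names are hypothetical.

```python
def gamma_moments(shape, scale):
    """Mean and variance of a gamma distribution with the given shape and scale."""
    return shape * scale, shape * scale ** 2


def exponential_moments(rate):
    """Mean and variance of an exponential distribution with the given rate."""
    return 1 / rate, 1 / rate ** 2


# Gamma scenario with shapes (4, 5, 1) and beta = 2
print([gamma_moments(a, 2) for a in (4, 5, 1)])        # [(8, 16), (10, 20), (2, 4)]
# Exponential scenario with rates (0.5, 0.25, 5)
print([exponential_moments(l) for l in (0.5, 0.25, 5)])
```

The first line reproduces the gamma means (8, 10, 2) and variances (16, 20, 4), and the second yields means (2, 4, 0.2) and variances (4, 16, 0.04), as in Table 1.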

The first step in evaluating the testing methods is checking the empirical level, which we calculated and compared with the nominal level before proceeding to the comparison of statistical power. We ran 10,000 iterations for each testing method, and thus the empirical level is considered valid when it falls within the interval [0.0457, 0.0543]. If the empirical level deviates from this interval, the comparison of statistical power becomes unreliable. After confirming that the empirical level is satisfied, statistical power is calculated. As in Hida and Tango (2011), the power of the testing procedure is defined as

$$\text{Power} = \Pr\{T_E > t_{\alpha/2}(n_E + n_R - 2) \ \text{and} \ T_A > t_{\alpha/2}(n_R + n_P - 2) \mid H_{1E}, A_1\}.$$
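The acceptance interval for the empirical level quoted above is consistent with a 95% normal-approximation band around the nominal level for 10,000 Monte Carlo iterations. The article does not state how the interval was derived, so the following Python sketch is an assumption that happens to reproduce it; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def monte_carlo_band(level, n_iter, confidence=0.95):
    """Normal-approximation band for an empirical rejection rate around `level`."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * math.sqrt(level * (1 - level) / n_iter)
    return level - half_width, level + half_width


low, high = monte_carlo_band(0.05, 10_000)
print(f"[{low:.4f}, {high:.4f}]")  # [0.0457, 0.0543]
```

The half-width is 1.96 × sqrt(0.05 × 0.95 / 10000) ≈ 0.0043, so an empirical level outside this band differs from 0.05 by more than simulation noise would typically explain.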

Setting the NI and assay sensitivity margins is also crucial in an NI trial. The margins of the parametric testing methods, denoted by $M_1$ and $M_2$, are the same as in previous studies. The corresponding margins of the nonparametric testing methods, labeled $Q_1$ and $Q_2$, are chosen based on the well-known property that, for normally distributed data,

$$p_i = \int H \, dF_i = \frac{1}{N}\sum_{h=1}^{k} n_h \cdot \Phi\left(\frac{\mu_i - \mu_h}{\sqrt{\sigma_i^2 + \sigma_h^2}}\right).$$

Because $\Phi$ is almost linear around the point where it takes the value 0.5, the relative effect is approximately a linear function of the mean differences, which allows the nonparametric margins to be set so that the nonparametric effect is similar to the parametric effect. For the gamma and exponential distributions, the same property is applied through a normal approximation. The margins for each testing method and distribution are shown in Table 3.
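The link between means and relative effects under normality can be evaluated numerically. The Python sketch below assumes independent normal groups, for which the probability that an observation from group h falls below one from group i is Φ((μ_i − μ_h)/sqrt(σ_i² + σ_h²)); the function name is hypothetical.

```python
import math
from statistics import NormalDist


def normal_relative_effects(means, variances, sizes):
    """Weighted relative effects p_i of independent normal groups."""
    n_total = sum(sizes)
    phi = NormalDist().cdf
    return [
        sum(
            n_h * phi((mu_i - mu_h) / math.sqrt(var_i + var_h))
            for mu_h, var_h, n_h in zip(means, variances, sizes)
        ) / n_total
        for mu_i, var_i in zip(means, variances)
    ]


# First normal scenario of Table 1 with the 1:1:1 small-sample design
p_e, p_r, p_p = normal_relative_effects((12, 14, 0), (2, 2, 2), (20, 20, 20))
print(p_e, p_r, p_p)
```

Differences such as `p_r - p_p` and ratios such as `(p_e - p_r) / (p_r - p_p)` computed this way are the quantities to which the nonparametric margins refer. Changing `sizes` changes the weighted effects, which is the sample-ratio dependence that motivates the unweighted version.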

Additionally, an assay sensitivity test must be conducted before the NI test. The NI test is performed only after assay sensitivity has been confirmed. If assay sensitivity is not established, the NI test is not conducted and the simulation run is terminated. Thus, the simulation steps are as follows:

  • Generate data from each distribution with predefined mean, variance, or parameters.

  • Apply each testing method to the data and determine whether the hypothesis is rejected or accepted.

  • Repeat these steps 10,000 times for each method.

  • Calculate the empirical level and statistical power of each testing method.

All simulation studies were conducted using R version 4.1.2 (R project homepage: http://www.r-project.org). The author developed the code for generating rank statistics and implementing all testing methods from scratch.
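The simulation steps listed above can be sketched as a generic Monte Carlo loop. The Python skeleton below is an illustration only and is not the authors' R code: `generate`, `assay_test` and `ni_test` are hypothetical callables that the reader would replace with a data generator and with any of the testing methods, each returning `True` when its null hypothesis is rejected.

```python
import random


def rejection_rate(generate, assay_test, ni_test, n_iter=10_000, seed=1):
    """Proportion of iterations in which both hypotheses are rejected.

    The NI test is evaluated only when assay sensitivity has been established,
    mirroring the hierarchical procedure described in the text.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_iter):
        x_e, x_r, x_p = generate(rng)
        if assay_test(x_r, x_p) and ni_test(x_e, x_r, x_p):
            rejections += 1
    return rejections / n_iter


def normal_arms(means, variances, sizes):
    """Build a generator of three normal samples in the order (E, R, P)."""
    def generate(rng):
        return tuple(
            [rng.gauss(m, v ** 0.5) for _ in range(n)]
            for m, v, n in zip(means, variances, sizes)
        )
    return generate
```

Run with data generated under the null hypothesis, the returned proportion estimates the empirical level; run under the alternative, it estimates the statistical power.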

4.2. Simulation results

To improve the readability of the tables, the abbreviations have been employed in the simulation result tables. Abbreviations are as follows: ‘HT-P’ (Hida-Tango parametric method), ‘M-PK’ (modified Park-Kim method), ‘M-UR’ (Munzel method using usual rank), ‘MM-PR’ (modified Munzel method using pseudo rank).

4.2.1. Empirical level for each testing method

To assess the performance of different testing methods in determining empirical levels, we conducted simulations for scenarios involving an experimental drug. Four testing methods, as described in Sections 2 and 3, were compared. While we cannot present all simulation result tables here, we highlight key findings below.

When the sample ratio is the same across the experimental drug, reference drug, and placebo, the M-UR and MM-PR methods yield identical relative effects and therefore the same empirical level. For instance, in Table 4, where the sample ratio is 1:1:1, the empirical levels of M-UR and MM-PR coincide for every distribution and parameter setting. All nonparametric testing methods show valid empirical levels in both normal and non-normal settings. Conversely, the parametric testing method is invalid in all non-normal situations: when data are generated from gamma and exponential distributions, its empirical level falls well below 0.0457, the lower limit of the acceptance interval for the nominal level of 0.05. The results are similar when the sample ratios are unequal across the experimental drug, reference drug, and placebo (see Table 5).

4.2.2. Statistical power of each testing method

The statistical power analysis shows how each testing method performs under various conditions. When the sample ratio is the same across the experimental drug, reference drug, and placebo, the relative effects based on usual ranks and pseudo ranks are identical. Consequently, as shown in Table 6, M-UR and MM-PR have the same statistical power, just as they have the same empirical level.

In Table 6, additional information about significant results is presented. Parametric testing methods demonstrate poor statistical power when data are generated from non-normal distributions, such as gamma and exponential distributions, with statistical power close to 0.5. We also compared statistical powers among different sample ratios. Four sample ratios were simulated, with a sample ratio of 2: 2: 1 exhibiting the highest statistical power. Although not tabulated here, sample ratios of 2: 1: 1, 1: 2: 1, and 1: 1: 1 follow in descending order. The statistical power of the 2: 2: 1 sample ratio is detailed in Table 7.

With a sample ratio of 1:1:1, the parametric testing method has the highest statistical power under the normal distribution, while the M-PK method has the lowest. Conversely, in non-normal situations, the parametric testing method has the lowest statistical power, with the order HT-P < M-PK < M-UR = MM-PR (see Table 6). Under the unequal sample ratio of 2:2:1, the statistical powers of M-UR and MM-PR differ, with MM-PR exhibiting higher power than M-UR in both normal and non-normal situations. In the non-normal situations the order is HT-P < M-PK < M-UR < MM-PR, whereas under the normal distribution HT-P remains the most powerful (see Table 7).

5. Conclusion

We have presented various NI trial testing methods, including both parametric and nonparametric approaches. We modified the Park-Kim method using the two-sample Hodges-Lehmann estimator. In the nonparametric method based on the relative effect, we applied the unweighted relative effect, which uses pseudo ranks. Pseudo ranks weight every treatment group equally and are therefore not affected by the sample ratio of the treatment groups. The resulting effect is thus a fixed measure, whereas the effect based on usual ranks changes with the sample ratio and is therefore unstable.

Now we summarize the major findings from our simulation studies.

  • Parametric testing methods are superior under the normal distribution, while among the nonparametric methods MM-PR has the highest statistical power. Conversely, in non-normal scenarios, parametric methods falter, while MM-PR consistently displays superior performance.

  • Notably high statistical power is observed in scenarios with a ratio of 2 for experimental drugs and the reference drug, and a ratio of 1 for the placebo.

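  • The testing method based on the unweighted relative effect (MM-PR) performs at least as well as the one based on the weighted relative effect (M-UR): the two coincide under equal sample ratios, and MM-PR has higher power under unequal ratios.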

  • Statistical power tends to be higher with larger sample sizes. In particular, the MM-PR method performs comparably to the parametric method in normal cases and clearly better in non-normal cases.

  • In small sample scenarios, the MM-PR method exhibits satisfactory statistical power, highlighting its effectiveness, especially in trials with limited sample availability, such as those for rare diseases.

For ease of comparison, we converted Table 7 into a bar graph (Figure 1). Scenarios 1 to 8 represent different distributions listed in Table 7. For example, Scenario 1 is a normal distribution with means (12, 14, 0) and variances (2, 2, 2), while Scenario 6 is a gamma distribution with shape parameters (5, 5, 1), means (10, 10, 2), and variances (20, 20, 4). Each method is represented by different colors. The HT-P method (light grey) performed well in Scenarios 1–4 (normal distributions) but was inferior in Scenarios 5–8 (non-normal distributions). The MM-PR method maintained its position as the second best in Scenarios 1–4 and outperformed in Scenarios 5–8.

We anticipate that our proposed MM-PR method will be particularly useful in rare disease clinical trials, where sample sizes are limited, owing to its low sensitivity to the data distribution. In addition, our ongoing research on nonparametric testing methods for trials with multiple experimental drugs aims to provide further insight into relative effect estimation and variance-covariance estimation.

Acknowledgments

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. RS-2023-00208882, No. NRF-2022M3J6A1063595). This research was also supported and funded by the Korean National Police Agency [Project Name: Advancing the Appraisal Techniques of Forensic Entomology / Project Number: PR10-04-000-22].

Figures
Fig. 1. Bar graph comparing statistical power across methods (based on Table 7).
TABLES

Table 1

Distributions and parameters of the data used in the simulation

Parameter triples are listed in the order (E, R, P).

| Distribution | Normal distribution | Normal distribution | Gamma distribution (β = 2) | Exponential distribution |
|---|---|---|---|---|
| Parameters | (μE, μR, μP); (σ²E, σ²R, σ²P) | (μE, μR, μP); (σ²E, σ²R, σ²P) | (αE, αR, αP); (μE, μR, μP); (σ²E, σ²R, σ²P) | (λE, λR, λP); (μE, μR, μP); (σ²E, σ²R, σ²P) |
| Scenarios | (12, 14, 0); (2, 2, 2) | (12, 14, 0); (2, 1, 1) | (4, 5, 1); (8, 10, 2); (16, 20, 4) | (0.5, 0.25, 5); (2, 4, 0.2); (4, 16, 0.04) |
| | (14, 14, 0); (2, 2, 2) | (14, 14, 0); (2, 1, 1) | (5, 5, 1); (10, 10, 2); (20, 20, 4) | (0.25, 0.25, 5); (4, 4, 0.2); (16, 16, 0.04) |

Table 2

Sample ratio and size of simulated scenarios

| Ratio nE:nR:nP | Small samples (nE, nR, nP) | Large samples (nE, nR, nP) |
|---|---|---|
| 1:1:1 | (20, 20, 20) | (50, 50, 50) |
| 1:2:1 | (20, 40, 20) | (50, 100, 50) |
| 2:2:1 | (40, 40, 20) | (100, 100, 50) |
| 2:1:1 | (40, 20, 20) | (100, 50, 50) |

Table 3

Assay sensitivity margins (M1 and Q1) and NI margins (M2 and Q2) of each distribution and testing method

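| Distribution | Parametric M1 | Parametric M2 | Nonparametric Q1 | Nonparametric Q2 |
|---|---|---|---|---|
| Normal distribution | 6 | 2.5 | 0.6 | 0.3 |
| Gamma distribution | 4 | 2 | 0.2 | 0.15 |
| Exponential distribution | 2.5 | 1.5 | 0.4 | 0.2 |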

Table 4

Empirical levels for NI hypothesis testing with sample ratio nE:nR:nP = 1:1:1 (small samples, nominal level 0.05, 10,000 random iterations)

(nE/N, nR/N, nP/N) = (1/3, 1/3, 1/3); 1/d = 1/3. Parameter triples are listed in the order (E, R, P).

| Distribution | Parameters | M-UR | MM-PR | M-PK | HT-P |
|---|---|---|---|---|---|
| Normal | μ = (12, 14, 0); σ² = (2, 2, 2) | 0.0493 | 0.0493 | 0.0521 | 0.0497 |
| Normal | μ = (14, 14, 0); σ² = (2, 2, 2) | 0.0492 | 0.0492 | 0.0518 | 0.0504 |
| Normal | μ = (12, 14, 0); σ² = (2, 1, 1) | 0.0508 | 0.0508 | 0.0511 | 0.0501 |
| Normal | μ = (14, 14, 0); σ² = (2, 1, 1) | 0.0510 | 0.0510 | 0.0520 | 0.0496 |
| Gamma (β = 2) | α = (4, 5, 1); μ = (8, 10, 2); σ² = (16, 20, 4) | 0.0513 | 0.0513 | 0.0465 | 0.0307 |
| Gamma (β = 2) | α = (5, 5, 1); μ = (10, 10, 2); σ² = (20, 20, 4) | 0.0507 | 0.0507 | 0.0462 | 0.0314 |
| Exponential | λ = (0.5, 0.25, 5); μ = (2, 4, 0.2); σ² = (4, 16, 0.04) | 0.0489 | 0.0489 | 0.0527 | 0.0256 |
| Exponential | λ = (0.25, 0.25, 5); μ = (4, 4, 0.2); σ² = (16, 16, 0.04) | 0.0494 | 0.0494 | 0.0522 | 0.0263 |

M-UR = Munzel-usual rank; MM-PR = modified Munzel-pseudo rank; M-PK = modified Park-Kim; HT-P = Hida-Tango parametric


Table 5

Empirical levels for NI hypothesis testing with sample ratio nE:nR:nP = 2:2:1 (small samples, nominal level 0.05, 10,000 random iterations)

(nE/N, nR/N, nP/N) = (2/5, 2/5, 1/5); 1/d = 1/3. Parameter triples are listed in the order (E, R, P).

| Distribution | Parameters | M-UR | MM-PR | M-PK | HT-P |
|---|---|---|---|---|---|
| Normal | μ = (12, 14, 0); σ² = (2, 2, 2) | 0.0480 | 0.0481 | 0.0513 | 0.0508 |
| Normal | μ = (14, 14, 0); σ² = (2, 2, 2) | 0.0513 | 0.0495 | 0.0480 | 0.0506 |
| Normal | μ = (12, 14, 0); σ² = (2, 1, 1) | 0.0503 | 0.0494 | 0.0519 | 0.0497 |
| Normal | μ = (14, 14, 0); σ² = (2, 1, 1) | 0.0505 | 0.0500 | 0.0522 | 0.0493 |
| Gamma (β = 2) | α = (4, 5, 1); μ = (8, 10, 2); σ² = (16, 20, 4) | 0.0516 | 0.0513 | 0.0470 | 0.0333 |
| Gamma (β = 2) | α = (5, 5, 1); μ = (10, 10, 2); σ² = (20, 20, 4) | 0.0518 | 0.0506 | 0.0473 | 0.0343 |
| Exponential | λ = (0.5, 0.25, 5); μ = (2, 4, 0.2); σ² = (4, 16, 0.04) | 0.0513 | 0.0483 | 0.0521 | 0.0276 |
| Exponential | λ = (0.25, 0.25, 5); μ = (4, 4, 0.2); σ² = (16, 16, 0.04) | 0.0495 | 0.0509 | 0.0525 | 0.0289 |

M-UR = Munzel-usual rank; MM-PR = modified Munzel-pseudo rank; M-PK = modified Park-Kim; HT-P = Hida-Tango parametric


Table 6

Statistical power for NI hypothesis testing with sample ratio nE:nR:nP = 1:1:1 (small samples, nominal level 0.05, 10,000 random iterations)

(nE/N, nR/N, nP/N) = (1/3, 1/3, 1/3); 1/d = 1/3. Parameter triples are listed in the order (E, R, P).

| Distribution | Parameters | M-UR | MM-PR | M-PK | HT-P |
|---|---|---|---|---|---|
| Normal | μ = (12, 14, 0); σ² = (2, 2, 2) | 0.850 | 0.850 | 0.552 | 0.911 |
| Normal | μ = (14, 14, 0); σ² = (2, 2, 2) | 0.857 | 0.857 | 0.565 | 0.917 |
| Normal | μ = (12, 14, 0); σ² = (2, 1, 1) | 0.857 | 0.857 | 0.565 | 0.917 |
| Normal | μ = (14, 14, 0); σ² = (2, 1, 1) | 0.863 | 0.863 | 0.572 | 0.923 |
| Gamma (β = 2) | α = (4, 5, 1); μ = (8, 10, 2); σ² = (16, 20, 4) | 0.883 | 0.883 | 0.617 | 0.507 |
| Gamma (β = 2) | α = (5, 5, 1); μ = (10, 10, 2); σ² = (20, 20, 4) | 0.908 | 0.908 | 0.625 | 0.512 |
| Exponential | λ = (0.5, 0.25, 5); μ = (2, 4, 0.2); σ² = (4, 16, 0.04) | 0.872 | 0.872 | 0.497 | 0.482 |
| Exponential | λ = (0.25, 0.25, 5); μ = (4, 4, 0.2); σ² = (16, 16, 0.04) | 0.898 | 0.898 | 0.504 | 0.493 |

M-UR = Munzel-usual rank; MM-PR = modified Munzel-pseudo rank; M-PK = modified Park-Kim; HT-P = Hida-Tango parametric.


Table 7

Statistical power for NI hypothesis testing with sample ratio nE:nR:nP = 2:2:1 (small samples, nominal level 0.05, 10,000 random iterations)

(nE/N, nR/N, nP/N) = (2/5, 2/5, 1/5); 1/d = 1/3. Parameter triples are listed in the order (E, R, P).

| Distribution | Parameters | M-UR | MM-PR | M-PK | HT-P |
|---|---|---|---|---|---|
| Normal | μ = (12, 14, 0); σ² = (2, 2, 2) | 0.848 | 0.859 | 0.557 | 0.913 |
| Normal | μ = (14, 14, 0); σ² = (2, 2, 2) | 0.860 | 0.869 | 0.575 | 0.925 |
| Normal | μ = (12, 14, 0); σ² = (2, 1, 1) | 0.851 | 0.863 | 0.565 | 0.919 |
| Normal | μ = (14, 14, 0); σ² = (2, 1, 1) | 0.867 | 0.875 | 0.584 | 0.930 |
| Gamma (β = 2) | α = (4, 5, 1); μ = (8, 10, 2); σ² = (16, 20, 4) | 0.892 | 0.903 | 0.629 | 0.519 |
| Gamma (β = 2) | α = (5, 5, 1); μ = (10, 10, 2); σ² = (20, 20, 4) | 0.913 | 0.920 | 0.635 | 0.523 |
| Exponential | λ = (0.5, 0.25, 5); μ = (2, 4, 0.2); σ² = (4, 16, 0.04) | 0.876 | 0.884 | 0.514 | 0.492 |
| Exponential | λ = (0.25, 0.25, 5); μ = (4, 4, 0.2); σ² = (16, 16, 0.04) | 0.904 | 0.910 | 0.517 | 0.503 |

M-UR = Munzel-usual rank; MM-PR = modified Munzel-pseudo rank; M-PK = modified Park-Kim; HT-P = Hida-Tango parametric.


References
  1. Brunner E, Bathke AC, and Konietschke F (2018). Rank and Pseudo-rank Procedures for Independent Observations in Factorial Designs, Springer International Publishing, Cham, Switzerland.
  2. Brunner E, Konietschke F, Bathke AC, and Pauly M (2021). Ranks and pseudo-ranks: Surprising results of certain rank tests in unbalanced designs. International Statistical Review, 89, 349-366.
  3. Brunner E and Puri ML (1996). Nonparametric methods in design and analysis of experiments. Handbook of Statistics, 13, 631-703.
  4. Domhof S (2001). Nichtparametrische relative Effekte (Ph.D. thesis), University of Göttingen, Göttingen.
  5. ICH Harmonised Tripartite Guideline (2000). Choice of control group and related issues in clinical trials E10.
  6. Hida E and Tango T (2011). On the three-arm non-inferiority trial including a placebo with a prespecified margin. Statistics in Medicine, 30, 224-231.
  7. Huang LC, Wen MJ, and Cheung SH (2015). Noninferiority studies with multiple new treatments and heterogeneous variances. Journal of Biopharmaceutical Statistics, 25, 958-971.
  8. Munzel U (2009). Nonparametric non-inferiority analyses in the three-arm design with active control and placebo. Statistics in Medicine, 28, 3643-3656.
  9. Park S and Kim D (2014). Nonparametric method for a non-inferiority test using confidence interval. The Korean Journal of Applied Statistics, 27, 833-842.
  10. U.S. Food and Drug Administration (2016). Non-inferiority clinical trials to establish effectiveness: Guidance for industry.