
Most non-inferiority (NI) clinical trials are conducted with large-scale enrollment. At this scale, parametric testing methods are generally used to confirm non-inferiority. Hida and Tango (2011) introduced a parametric t-test for assessing NI trials. In this approach, the mean of each treatment group is used as the measure in the hypotheses, the test statistic follows a Student’s t-distribution, and the degrees of freedom vary according to whether the variances are homogeneous. However, in the case of rare diseases, it is difficult to recruit patients who meet the eligibility criteria. In such cases, a clinical trial may be terminated due to insufficient enrollment, and research on the rare disease cannot continue. Therefore, there has been some research on NI trials with nonparametric methods, which are less restrictive than parametric methods. Nonparametric methods do not rely on normality of the data or homogeneity of variance.
Several methods exist for situations where nonparametric methods are appropriate, that is, where enrollment is small or normality of the data is not assured. In this paper, we consider the three-arm non-inferiority clinical trial design. A three-arm design is a clinical trial with an experimental drug, a reference drug, and a placebo. The importance of the three-arm design is highlighted in various studies, since the presence of a placebo arm makes it possible to demonstrate assay sensitivity. In the ICH E10 guideline (2000), assay sensitivity is a property of a clinical trial defined as the ability to distinguish an effective treatment from a less effective or ineffective treatment. Thus, in many non-inferiority clinical trials, demonstrating assay sensitivity takes precedence over establishing non-inferiority. Park and Kim (2014) proposed a ratio-shaped formulation of the non-inferiority hypotheses and used a Hodges-Lehmann estimator to test them. An alternative nonparametric approach uses the ‘relative effect’, rather than the mean effect, as the measure in the hypotheses. Munzel (2009) introduced a ratio-shaped hypothesis based on the relative effect.
In this paper, we introduce a modification of the method by Park and Kim (2014) that is applicable to a three-arm NI trial, along with a modification of the Munzel (2009) method that uses the unweighted relative effect, a measure calculated with pseudo ranks that offers advantages over the traditional weighted ranks. While various NI trial testing methods have been proposed, comprehensive comparisons of their performance remain scarce. Our primary focus lies in introducing and applying these methods. To assess their performance, we compare them with existing testing methods through a simulation study. By evaluating empirical levels and statistical power, we aim to provide insight into the practical utility of these methods in non-inferiority clinical trials.
This paper is structured as follows. Section 2 introduces existing NI trial testing methods, covering both parametric and nonparametric approaches. In Section 3, we present the modification of the Park and Kim method for three-arm clinical trials and propose the use of the unweighted relative effect in Munzel’s method. Section 4 presents the simulation results, where we evaluate the performance of the testing methods. Finally, Section 5 summarizes our results and discusses their implications.
Hida and Tango (2011) proposed a parametric testing method for an NI trial that includes a single experimental treatment, a reference treatment, and a placebo. Assume that the primary endpoint under the three-arm design is
To test the null hypotheses
and
where
and
respectively.
In the case of heterogeneous variances, the same confidence interval is applied; the only difference is the degrees of freedom. Detailed formulas are found in Huang
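To illustrate how the degrees of freedom can change under heterogeneous variances, the following Python sketch computes the standard Satterthwaite approximation for a linear contrast of group means. It is only a sketch under stated assumptions: the function name, the contrast coefficients, and the sample values are hypothetical, and the exact statistic of the Hida-Tango procedure is not reproduced here.

```python
def satterthwaite_df(variances, sizes, coefs):
    """Satterthwaite degrees of freedom for the contrast sum(c_i * mean_i).

    variances: sample variances of the groups
    sizes:     group sample sizes
    coefs:     contrast coefficients c_i (hypothetical, for illustration only)
    """
    # Variance contribution of each group to the contrast: c_i^2 * s_i^2 / n_i
    terms = [c * c * v / n for c, v, n in zip(coefs, variances, sizes)]
    numerator = sum(terms) ** 2
    # Groups with a zero coefficient contribute nothing to either sum.
    denominator = sum(t * t / (n - 1) for t, n in zip(terms, sizes))
    return numerator / denominator


# Hypothetical example: compare the first two arms, 20 subjects per arm.
df_equal = satterthwaite_df([2.0, 2.0, 2.0], [20, 20, 20], [1.0, -1.0, 0.0])
df_unequal = satterthwaite_df([2.0, 1.0, 1.0], [20, 20, 20], [1.0, -1.0, 0.0])
print(df_equal, df_unequal)
```

With equal variances and equal group sizes the approximation reduces to the pooled two-sample value 2(n - 1); with unequal variances it is smaller, which widens the critical value.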
As mentioned in the introduction, nonparametric NI trial testing methods fall into two categories: those that test NI hypotheses based on the mean effect, and those that test NI hypotheses based on the relative effect. We first introduce the method by Park and Kim (2014). Because the original Park-Kim method applies only to a two-arm clinical trial with a single experimental drug and a reference drug, and no placebo, it is not included in our three-arm comparison.
Park and Kim (2014) introduced a nonparametric NI test based on the Wilcoxon rank-sum test and the Hodges-Lehmann estimator for the reference drug. Let
For all
respectively. In Section 3, we reformulate this method for use in a three-arm design.
Before we explain the Munzel method, the relative effect needs to be defined first. Let
where
where
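As a concrete illustration of a relative effect, the Python sketch below estimates the two-sample quantity p = P(X < Y) + 0.5 P(X = Y) by comparing every pair of observations. This assumes the common two-sample form of the relative effect; the function name and the toy data are hypothetical and are not taken from the original implementation.

```python
def relative_effect(x, y):
    """Estimate p = P(X < Y) + 0.5 * P(X = Y) from two samples.

    Each pair (xi, yj) scores 1 if xi < yj, 0.5 if tied, and 0 otherwise.
    """
    score = sum((xi < yj) + 0.5 * (xi == yj) for xi in x for yj in y)
    return score / (len(x) * len(y))


# Hypothetical toy data.
p_separated = relative_effect([1, 2, 3], [4, 5, 6])  # y always larger
p_identical = relative_effect([1, 2], [1, 2])        # same sample twice
print(p_separated, p_identical)
```

A value of 0.5 indicates no tendency toward larger or smaller values in either sample, while values near 1 indicate that the second sample tends to be larger.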
Munzel (2009) suggested a non-inferiority testing and assay sensitivity null hypothesis
Applying Fieller’s theorem, the two-sided (1 –
where
and
Also, define
where
In this section, we modify the Park-Kim method by reformulating it for the three-arm design. We add a placebo arm to the hypotheses, and the NI hypothesis
Suppose Δ̂ = median{(
respectively.
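The two-sample Hodges-Lehmann estimator used above is the median of all pairwise differences between two samples. The following Python sketch shows this computation, assuming the differences are taken as second sample minus first sample; the function name, the direction of the difference, and the toy data are hypothetical choices for illustration.

```python
from statistics import median


def hodges_lehmann_shift(x, y):
    """Two-sample Hodges-Lehmann shift estimate: median of all y_j - x_i."""
    return median(yj - xi for xi in x for yj in y)


# Hypothetical toy data: y is shifted upward relative to x.
x = [1, 2, 3]
y = [4, 6, 8]
shift = hodges_lehmann_shift(x, y)
print(shift)
```

Because it is based on a median of pairwise differences, the estimate is less sensitive to outliers and skewness than a difference of sample means.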
In Section 2.2.2, we introduced the concept of a ‘usual rank’, which is calculated based on the total number of observations. The pseudo rank is slightly different: it is an unweighted rank that takes into account the total number of groups instead of the total number of observations. We refer to relative effects calculated using pseudo ranks as ‘fixed relative effects’ because they are not influenced by the number of observations. The concept of the unweighted relative effect was first suggested by Brunner and Puri (1996), and its asymptotic properties were then demonstrated by Domhof (2001). Brunner
Let
where
where
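The difference between the weighted and unweighted relative effects can be illustrated numerically. The Python sketch below assumes the usual definitions: the weighted effect compares each group with the pooled empirical distribution of all N observations, while the unweighted effect compares it with the simple average of the group-wise empirical distributions. Function names and data are hypothetical.

```python
def count_fn(u):
    """Normalized counting function: 0 if u < 0, 0.5 if u == 0, 1 if u > 0."""
    return 0.0 if u < 0 else (0.5 if u == 0 else 1.0)


def relative_effects(groups, weighted=True):
    """Relative effect of each group against a reference distribution.

    weighted=True  -> reference is the pooled sample (usual ranks).
    weighted=False -> reference is the unweighted mean of the group
                      distributions (pseudo ranks).
    """
    n_total = sum(len(g) for g in groups)
    n_groups = len(groups)
    effects = []
    for g in groups:
        total = 0.0
        for x in g:
            for other in groups:
                s = sum(count_fn(x - y) for y in other)
                if weighted:
                    total += s / n_total
                else:
                    total += s / (n_groups * len(other))
        effects.append(total / len(g))
    return effects


# Hypothetical data: the second design repeats every value of group 1,
# so the group distributions are unchanged but the sample ratio is 2: 1: 1.
balanced = [[1, 2], [3, 4], [5, 6]]
unbalanced = [[1, 2, 1, 2], [3, 4], [5, 6]]

w_bal = relative_effects(balanced, weighted=True)
u_bal = relative_effects(balanced, weighted=False)
w_unb = relative_effects(unbalanced, weighted=True)
u_unb = relative_effects(unbalanced, weighted=False)
print(w_bal, u_bal, w_unb, u_unb)
```

In the balanced design the two versions coincide. When the first group is enlarged without changing its distribution, the unweighted effects stay the same while the weighted effects shift, which is the sense in which the pseudo-rank version is a fixed measure.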
To assess the performance of the testing methods introduced in the previous sections, we conducted an extensive simulation study. The performance criteria are empirical level and statistical power. We aim to assess how the performance of the testing methods varies across situations. We generate data from three distributions (normal, gamma, and exponential). For the normal distribution, we consider both equal and unequal variance cases. In the case of the gamma distribution, we used two different shape parameters (
For the normal distribution, we consider a total of four scenarios covering both homogeneous and heterogeneous variances. For the gamma and exponential distributions, two scenarios each are used in the simulation study.
The first step in evaluating the performance of the testing methods is checking the empirical level. We calculated the empirical level and compared it with the nominal level before proceeding to the statistical power comparison. We ran 10,000 iterations for each testing method, and thus the empirical level is considered valid when it falls within the interval [0.0457, 0.0543]. If the empirical level falls outside this interval, the comparison of statistical power becomes unreliable. After confirming that the empirical level is valid, statistical power is calculated. As in Hida and Tango (2011), the power of the testing procedure is defined as
Setting the NI margin and the assay sensitivity margin is also crucial in an NI trial. The margins of parametric testing methods denoted by
This means that Φ is almost linear around 0.5, so the nonparametric effect can be set to correspond approximately linearly to the parametric effect. The same property is applied to the gamma and exponential distributions through a normal approximation. The margins for each testing method and distribution are shown in Table 3.
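One way to read the linearity remark is through the two-sample relative effect between two normal populations, which equals Φ(δ / sqrt(σ1² + σ2²)) for a mean difference δ. The Python sketch below compares this exact value with its first-order approximation around δ = 0. It is an illustration under that assumption only; the function names and the numerical values are hypothetical and do not reproduce the margins in Table 3.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function, written with math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def relative_effect_normal(delta, var_x, var_y):
    """P(X < Y) for independent normals with mean difference delta."""
    return normal_cdf(delta / math.sqrt(var_x + var_y))


def relative_effect_linear(delta, var_x, var_y):
    """First-order approximation of the same quantity around delta = 0."""
    return 0.5 + delta / math.sqrt(2.0 * math.pi * (var_x + var_y))


# Hypothetical values: small mean difference, variances of 2 in both arms.
exact = relative_effect_normal(0.5, 2.0, 2.0)
approx = relative_effect_linear(0.5, 2.0, 2.0)
print(exact, approx)
```

For small mean differences the two values are close, so a margin on the mean scale maps roughly proportionally to a margin on the relative-effect scale; the agreement degrades as the difference grows.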
Additionally, the assay sensitivity test must be conducted before the NI test. The NI test is performed only after assay sensitivity is confirmed. If assay sensitivity is not established, the NI test is not conducted and that simulation run is terminated. Thus, the simulation steps are as follows:
Generate data from each distribution with predefined mean, variance, or parameters.
Apply each testing method to the data and determine whether the hypothesis is rejected or accepted.
Repeat these steps 10,000 times for each method.
Calculate the empirical level and statistical power of each testing method.
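The steps above can be sketched in Python as follows. This is a minimal sketch, not the original R code: a known-variance one-sided z-test stands in for the four testing methods, and the margins, sample size, seed, and function names are hypothetical. It also shows how the validity interval for the empirical level follows from a normal approximation with 10,000 iterations.

```python
import math
import random

Z_CRIT = 1.6449  # one-sided 5% critical value of the standard normal


def level_tolerance(alpha=0.05, n_iter=10_000, z=1.96):
    """Normal-approximation band for an empirical level over n_iter runs."""
    half_width = z * math.sqrt(alpha * (1.0 - alpha) / n_iter)
    return alpha - half_width, alpha + half_width


def sample_mean(rng, mean, sd, n):
    return sum(rng.gauss(mean, sd) for _ in range(n)) / n


def one_trial(rng, means, sd, n, ni_margin, as_margin):
    """True only if assay sensitivity and then non-inferiority are shown."""
    m_e, m_r, m_p = (sample_mean(rng, mu, sd, n) for mu in means)
    se = sd * math.sqrt(2.0 / n)
    # Gatekeeping step: the reference must beat placebo by more than as_margin.
    if (m_r - m_p - as_margin) / se <= Z_CRIT:
        return False
    # NI step: the experimental arm must not fall ni_margin below the reference.
    return (m_e - m_r + ni_margin) / se > Z_CRIT


def rejection_rate(means, sd, n, ni_margin, as_margin, n_iter=10_000, seed=2024):
    rng = random.Random(seed)
    hits = sum(one_trial(rng, means, sd, n, ni_margin, as_margin)
               for _ in range(n_iter))
    return hits / n_iter


low, high = level_tolerance()
# Means (experimental, reference, placebo) placed on the NI null boundary
# for the hypothetical margin of 2, so the rejection rate estimates the level.
level = rejection_rate(means=(12.0, 14.0, 0.0), sd=math.sqrt(2.0), n=20,
                       ni_margin=2.0, as_margin=5.0)
print(f"band = [{low:.4f}, {high:.4f}], empirical level = {level:.4f}")
```

Moving the experimental mean away from the null boundary and rerunning `rejection_rate` gives the corresponding estimate of statistical power under the same gatekeeping rule.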
All simulation studies were conducted using R version 4.1.2 (R project homepage: http://www.r-project.org). The author developed the code for generating rank statistics and implementing all testing methods from scratch.
To improve the readability of the tables, abbreviations are used in the simulation result tables: ‘HT-P’ (Hida-Tango parametric method), ‘M-PK’ (modified Park-Kim method), ‘M-UR’ (Munzel method using usual rank), and ‘MM-PR’ (modified Munzel method using pseudo rank).
To assess the performance of different testing methods in determining empirical levels, we conducted simulations for scenarios involving an experimental drug. Four testing methods, as described in Sections 2 and 3, were compared. While we cannot present all simulation result tables here, we highlight key findings below.
In scenarios where the sample ratio is the same across the experimental drug, reference drug, and placebo, the M-UR and MM-PR methods yield identical relative effects and therefore the same empirical level. For instance, in Table 4, where the sample ratio is 1: 1: 1, the empirical levels of M-UR and MM-PR coincide across all distributions and parameters. All nonparametric testing methods show valid empirical levels in both normal and non-normal settings. Conversely, the parametric testing method is invalid in all non-normal situations. In particular, when data are generated from gamma and exponential distributions, the empirical level of the parametric testing method falls well below 0.0457, the lower limit of the validity interval around the nominal level of 0.05. The results are consistent when sample ratios are not equal across the experimental drug, reference drug, and placebo (see Table 5).
The statistical power analysis shows how each testing method performs under various conditions. When the sample ratio is the same across the experimental drug, reference drug, and placebo, the relative effects based on usual ranks and pseudo ranks are identical. Consequently, as shown in Table 6, M-UR and MM-PR have the same statistical power, just as they have the same empirical level.
Table 6 also shows other notable results. The parametric testing method has poor statistical power when data are generated from non-normal distributions, such as the gamma and exponential distributions, with power close to 0.5. We also compared statistical power among different sample ratios. Four sample ratios were simulated, with a sample ratio of 2: 2: 1 exhibiting the highest statistical power. Although not all are tabulated here, sample ratios of 2: 1: 1, 1: 2: 1, and 1: 1: 1 follow in descending order. The statistical power for the 2: 2: 1 sample ratio is detailed in Table 7.
In the case of a sample ratio of 1: 1: 1, the parametric testing method demonstrates the highest statistical power under normal distribution, while the M-PK method exhibits the lowest. Conversely, in non-normal situations, the parametric testing method shows the lowest statistical power, with the order established as
We have presented various NI trial testing methods, including both parametric and nonparametric approaches. We modified the Park-Kim method using the two-sample Hodges-Lehmann estimator. In addition, for the nonparametric method based on the relative effect, we applied the unweighted relative effect, which uses pseudo ranks. Pseudo ranks depend only on the number of treatment groups and are not affected by the sample ratio across treatment groups. The resulting measure is therefore fixed, whereas usual ranks change with the sample ratio and thus yield an unstable measure.
Now we summarize the major findings from our simulation studies.
The parametric testing method is superior under the normal distribution, while among the nonparametric methods, MM-PR exhibits the highest statistical power. Conversely, in non-normal scenarios, the parametric method falters, while MM-PR consistently displays superior performance.
Notably high statistical power is observed with a 2: 2: 1 sample ratio, that is, a ratio of 2 for the experimental drug and the reference drug and a ratio of 1 for the placebo.
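The testing method based on the unweighted relative effect (MM-PR) matches or outperforms the one based on the weighted relative effect (M-UR): the two coincide when sample ratios are equal, and MM-PR has higher power when they are not.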
Statistical power tends to be higher with larger sample sizes. In particular, the MM-PR method performs comparably to the parametric method in normal cases and excels in non-normal scenarios.
In small sample scenarios, the MM-PR method exhibits satisfactory statistical power, highlighting its effectiveness, especially in trials with limited sample availability, such as those for rare diseases.
For ease of comparison, we converted Table 7 into a bar graph (Figure 1). Scenarios 1 to 8 represent the different distributions listed in Table 7. For example, Scenario 1 is a normal distribution with means (12, 14, 0) and variances (2, 2, 2), while Scenario 6 is a gamma distribution with shape parameters (5, 5, 1), means (10, 10, 2), and variances (20, 20, 4). Each method is represented by a different color. The HT-P method (light grey) performed well in Scenarios 1–4 (normal distributions) but was inferior in Scenarios 5–8 (non-normal distributions). The MM-PR method was consistently the second best in Scenarios 1–4 and outperformed the other methods in Scenarios 5–8.
We anticipate that our proposed MM-PR method will be particularly useful in rare disease clinical trials, where sample sizes are limited, owing to its low sensitivity to the data distribution. Additionally, our ongoing research into nonparametric testing methods for cases involving multiple experimental drugs aims to provide further insights into the complexities of relative effect estimation and variance-covariance estimation.
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. RS-2023-00208882, No. NRF-2022M3J6A1063595). This research was also supported and funded by the Korean National Police Agency [Project Name: Advancing the Appraisal Techniques of Forensic Entomology / Project Number: PR10-04-000-22].
Distributions and parameters of simulated scenarios
Distribution | Normal distribution (equal variances) | Normal distribution (unequal variances) | Gamma distribution | Exponential distribution |
---|---|---|---|---|
Parameters | (means) (variances) | (means) (variances) | (shapes) (means) (variances) | (rates) (means) (variances) |
Scenarios | (12, 14, 0) (2, 2, 2) | (12, 14, 0) (2, 1, 1) | (4, 5, 1) (8, 10, 2) (16, 20, 4) | (0.5, 0.25, 5) (2, 4, 0.2) (4, 16, 0.04) |
Scenarios | (14, 14, 0) (2, 2, 2) | (14, 14, 0) (2, 1, 1) | (5, 5, 1) (10, 10, 2) (20, 20, 4) | (0.25, 0.25, 5) (4, 4, 0.2) (16, 16, 0.04) |
Sample ratio and size of simulated scenarios
Ratio | Small samples | Large samples |
---|---|---|
1: 1: 1 | (20, 20, 20) | (50, 50, 50) |
1: 2: 1 | (20, 40, 20) | (50, 100, 50) |
2: 2: 1 | (40, 40, 20) | (100, 100, 50) |
2: 1: 1 | (40, 20, 20) | (100, 50, 50) |
Assay sensitivity margins (
Parametric method | Nonparametric method | |||
---|---|---|---|---|
Normal distribution | 6 | 2.5 | 0.6 | 0.3 |
Gamma distribution | 4 | 2 | 0.2 | 0.15 |
Exponential distribution | 2.5 | 1.5 | 0.4 | 0.2 |
Levels for NI trial hypothesis testing in case of sample ratio 1: 1: 1
Distribution | Parameters (normal: means, variances; gamma: shapes, means, variances; exponential: rates, means, variances) | M-UR | MM-PR | M-PK | HT-P |
---|---|---|---|---|---|
Normal dist. | (12, 14, 0) (2, 2, 2) | 0.0493 | 0.0493 | 0.0521 | 0.0497 |
Normal dist. | (14, 14, 0) (2, 2, 2) | 0.0492 | 0.0492 | 0.0518 | 0.0504 |
Normal dist. | (12, 14, 0) (2, 1, 1) | 0.0508 | 0.0508 | 0.0511 | 0.0501 |
Normal dist. | (14, 14, 0) (2, 1, 1) | 0.0510 | 0.0510 | 0.0520 | 0.0496 |
Gamma dist. | (4, 5, 1) (8, 10, 2) (16, 20, 4) | 0.0513 | 0.0513 | 0.0465 | 0.0307 |
Gamma dist. | (5, 5, 1) (10, 10, 2) (20, 20, 4) | 0.0507 | 0.0507 | 0.0462 | 0.0314 |
Exponential dist. | (0.5, 0.25, 5) (2, 4, 0.2) (4, 16, 0.04) | 0.0489 | 0.0489 | 0.0527 | 0.0256 |
Exponential dist. | (0.25, 0.25, 5) (4, 4, 0.2) (16, 16, 0.04) | 0.0494 | 0.0494 | 0.0522 | 0.0263 |
M-UR = Munzel-usual rank; MM-PR = modified Munzel-pseudo rank; M-PK = modified Park-Kim; HT-P = Hida-Tango parametric
Levels for NI trial hypothesis testing in case of sample ratio (
Distribution | Parameters (normal: means, variances; gamma: shapes, means, variances; exponential: rates, means, variances) | M-UR | MM-PR | M-PK | HT-P |
---|---|---|---|---|---|
Normal dist. | (12, 14, 0) (2, 2, 2) | 0.0480 | 0.0481 | 0.0513 | 0.0508 |
Normal dist. | (14, 14, 0) (2, 2, 2) | 0.0513 | 0.0495 | 0.0480 | 0.0506 |
Normal dist. | (12, 14, 0) (2, 1, 1) | 0.0503 | 0.0494 | 0.0519 | 0.0497 |
Normal dist. | (14, 14, 0) (2, 1, 1) | 0.0505 | 0.0500 | 0.0522 | 0.0493 |
Gamma dist. | (4, 5, 1) (8, 10, 2) (16, 20, 4) | 0.0516 | 0.0513 | 0.0470 | 0.0333 |
Gamma dist. | (5, 5, 1) (10, 10, 2) (20, 20, 4) | 0.0518 | 0.0506 | 0.0473 | 0.0343 |
Exponential dist. | (0.5, 0.25, 5) (2, 4, 0.2) (4, 16, 0.04) | 0.0513 | 0.0483 | 0.0521 | 0.0276 |
Exponential dist. | (0.25, 0.25, 5) (4, 4, 0.2) (16, 16, 0.04) | 0.0495 | 0.0509 | 0.0525 | 0.0289 |
M-UR = Munzel-usual rank; MM-PR = modified Munzel-pseudo rank; M-PK = modified Park-Kim; HT-P = Hida-Tango parametric
Statistical power for NI trial hypothesis testing in case of sample ratio 1: 1: 1
Distribution | Parameters (normal: means, variances; gamma: shapes, means, variances; exponential: rates, means, variances) | M-UR | MM-PR | M-PK | HT-P |
---|---|---|---|---|---|
Normal dist. | (12, 14, 0) (2, 2, 2) | 0.850 | 0.850 | 0.552 | 0.911 |
Normal dist. | (14, 14, 0) (2, 2, 2) | 0.857 | 0.857 | 0.565 | 0.917 |
Normal dist. | (12, 14, 0) (2, 1, 1) | 0.857 | 0.857 | 0.565 | 0.917 |
Normal dist. | (14, 14, 0) (2, 1, 1) | 0.863 | 0.863 | 0.572 | 0.923 |
Gamma dist. | (4, 5, 1) (8, 10, 2) (16, 20, 4) | 0.883 | 0.883 | 0.617 | 0.507 |
Gamma dist. | (5, 5, 1) (10, 10, 2) (20, 20, 4) | 0.908 | 0.908 | 0.625 | 0.512 |
Exponential dist. | (0.5, 0.25, 5) (2, 4, 0.2) (4, 16, 0.04) | 0.872 | 0.872 | 0.497 | 0.482 |
Exponential dist. | (0.25, 0.25, 5) (4, 4, 0.2) (16, 16, 0.04) | 0.898 | 0.898 | 0.504 | 0.493 |
M-UR = Munzel-usual rank; MM-PR = modified Munzel-pseudo rank; M-PK = modified Park-Kim; HT-P = Hida-Tango parametric.
Statistical power for NI trial hypothesis testing in case of sample ratio 2: 2: 1
Distribution | Parameters (normal: means, variances; gamma: shapes, means, variances; exponential: rates, means, variances) | M-UR | MM-PR | M-PK | HT-P |
---|---|---|---|---|---|
Normal dist. | (12, 14, 0) (2, 2, 2) | 0.848 | 0.859 | 0.557 | 0.913 |
Normal dist. | (14, 14, 0) (2, 2, 2) | 0.860 | 0.869 | 0.575 | 0.925 |
Normal dist. | (12, 14, 0) (2, 1, 1) | 0.851 | 0.863 | 0.565 | 0.919 |
Normal dist. | (14, 14, 0) (2, 1, 1) | 0.867 | 0.875 | 0.584 | 0.930 |
Gamma dist. | (4, 5, 1) (8, 10, 2) (16, 20, 4) | 0.892 | 0.903 | 0.629 | 0.519 |
Gamma dist. | (5, 5, 1) (10, 10, 2) (20, 20, 4) | 0.913 | 0.920 | 0.635 | 0.523 |
Exponential dist. | (0.5, 0.25, 5) (2, 4, 0.2) (4, 16, 0.04) | 0.876 | 0.884 | 0.514 | 0.492 |
Exponential dist. | (0.25, 0.25, 5) (4, 4, 0.2) (16, 16, 0.04) | 0.904 | 0.910 | 0.517 | 0.503 |
M-UR = Munzel-usual rank; MM-PR = modified Munzel-pseudo rank; M-PK = modified Park-Kim; HT-P = Hida-Tango parametric.