Statistical tests for biosimilarity based on relative distance between follow-on biologics for ordinal endpoints

Myung Soo Yooa, Donguk Kim1,a

aDepartment of Statistics, Sungkyunkwan University, Korea
Correspondence to: 1Department of Statistics, Sungkyunkwan University, 25-2, Sungkyunkwan-ro, Jongno-gu, Seoul 03063, Korea. E-mail: dkim@skku.edu
Received December 18, 2018; Revised July 19, 2019; Accepted July 20, 2019.
Abstract
Investigating biosimilarity between reference drugs and test drugs requires statistical tests, and several tests for evaluating biosimilarity have recently been proposed. Ordinal outcome data are often observed in research; however, appropriate statistical tests for ordinal endpoints in biosimilar studies have not yet been proposed. This paper extends an existing design to ordinal endpoints. Relative distances between drugs are defined using measures of nominal-ordinal association, and testing procedures are developed from them. Through simulation studies, we investigate the type I error rate and power to show the performance of the suggested method. Furthermore, a comparison between the proposed tests and other designs is provided to show the significance of ordinal endpoints.
Keywords : biosimilar, nominal-ordinal association, equivalence trials, three-arm parallel design
1. Introduction

Biological products, medicines produced from living organisms, have opened a new approach to the treatment of several types of diseases such as cancer (Vulto and Jaquez, 2017). However, the use of biological products is limited by their high cost. As the patents of biological products expire, generic versions of biological products, called biosimilars (or follow-on biologics), have received interest from the pharmaceutical industry and related areas. A biosimilar is known to have effects similar to those of the original biological product, but it has the considerable advantage of giving people more affordable access to drugs. Biological products have different characteristics from chemical drugs. In particular, they have more complex structures and are likely to show a larger variance than chemical drugs. Because of these fundamental differences between biological products and chemical products, a different approach is required to assess statistical similarity between reference drugs and test drugs.

Various statistical tests to assess biosimilarity have been proposed (Chen et al., 2017; Shin and Kang, 2016; Lu et al., 2014; Zhang et al., 2014; Yang et al., 2012; Kang and Chow, 2013). Kang and Chow (2013) considered statistical tests based on the ratio estimator and the linearization method in a three-arm parallel design to determine biosimilarity between reference drugs and test drugs for continuous endpoints. Their simulation, based on the power function, showed that the power of the ratio estimator is greater than that of the linearization method. Lu et al. (2014) used a frequency estimator to assess biosimilarity in a three-arm parallel design. The frequency estimator is formulated from the absolute difference between reference drugs and test drugs together with an indicator function. Its power was compared with that of the ratio estimator and the linearization method of Kang and Chow (2013), which demonstrated that the frequency estimator is more powerful than the other two methods when a suitable biosimilarity margin is chosen. Kang and Shin (2015) extended the three-arm parallel design to a (k + 1)-arm parallel design. Shin and Kang (2016) extended the statistical test based on the ratio estimator to binary endpoints. For binary endpoints, the risk difference, the log relative risk, and the log odds ratio were used in the ratio estimator, and the type I error rate and power were investigated both theoretically and empirically. More recently, Chen et al. (2017) assessed biosimilarity based on tolerance limits in a two-arm parallel design. The biosimilarity index based on tolerance limits was found to be more stringent and conservative than other moment-based criteria, especially when the variance of the biosimilar is larger than that of the reference drug.

Yang et al. (2012) argued that biosimilarity in variability should also be assessed, and they considered an adapted F-test, which is an extension of the traditional F-test for homogeneity of variability. To relax the normality assumption of the adapted F-test, nonparametric tests were proposed by Zhang et al. (2014). The nonparametric tests were found to be robust in controlling the type I error rate, especially when the underlying distribution is skewed or has a heavy tail. Sample size calculation for the assessment of biosimilarity has also been considered by Kang and Kim (2014) and Kang et al. (2015). The US FDA (2015) provided a guideline for developing biosimilars in which sponsors are required to choose endpoints to demonstrate biosimilarity and to explain why they chose a particular design.

Of the proposed methods, Kang and Chow (2013) developed a three-arm parallel design that uses a relative distance between test drugs and reference drugs as a biosimilarity criterion. In this design, the relative distance is defined as

$$rd = \frac{d(T, R)}{d(R_1, R_2)},$$

where $d(T, R)$ denotes the distance between test drugs and reference drugs and $d(R_1, R_2)$ represents the distance between reference drugs from two different batches. With this criterion, they considered the following hypotheses.

$$H_0 : rd \ge \delta \quad \text{versus} \quad H_A : rd < \delta,$$

where δ (δ > 0) is a prespecified margin. For continuous endpoints, the relative distance can be interpreted as the absolute value of the mean difference between test drugs and reference drugs scaled by the absolute value of the mean difference between reference drugs from different batches. Because the relative distance uses the distance between reference drugs from different batches as its denominator, the larger variability of biological products is taken into account in assessing biosimilarity, which a two-arm balanced design fails to accomplish. For more detail, refer to Kang and Chow (2013).

This work extends the three-arm parallel design to ordinal endpoints, building on the advantages of that design. An example of an ordinal endpoint can be found in Doll and Pygott (1952): the change in the size of an ulcer crater after three months of treatment (larger, less than 2/3 healed, 2/3 or more healed, healed). Dichotomizing ordinal outcome data into binary form before analysis has been shown to cause some loss of information (Sankey and Weissfeld, 1998). Roozenbeek et al. (2011) also demonstrated that statistical power can increase when an ordinal analysis is conducted instead of collapsing the data into binary form. Following this research, we focus on statistical tests to assess biosimilarity when the endpoints are ordinal.

In the next section, relative distances for ordinal endpoints in a new three-arm parallel design are defined using summary measures of nominal-ordinal association. Section 3 describes statistical tests for biosimilarity. Section 4 investigates and compares the type I error rate and power, both theoretically and empirically. Section 5 conducts simulation studies to show the applicability and the significance of ordinal endpoints in determining biosimilarity. Finally, Section 6 provides the conclusion and suggestions for future study.

2. Distances for ordinal endpoints on three-arm parallel design

In this section, we define relative distances in a new three-arm parallel design for ordinal endpoints. As indicated in Section 1, a relative distance is a ratio of two distances: one between the test drug (biosimilar) and the reference drugs, and another between reference drugs from different batches. Two summary measures of the degree of association between a nominal variable and an ordinal variable, denoted by Δ and α, are used for the relative distance. Δ and α are defined as follows (Agresti, 1981).

$$\Delta = P(Y_1 > Y_2) - P(Y_2 > Y_1) = \sum_{i>j} \pi_{i|1}\pi_{j|2} - \sum_{i<j} \pi_{i|1}\pi_{j|2}, \tag{2.1}$$

$$\alpha = \frac{P(Y_1 > Y_2)}{P(Y_2 > Y_1)} = \frac{\sum_{i>j} \pi_{i|1}\pi_{j|2}}{\sum_{i<j} \pi_{i|1}\pi_{j|2}}, \tag{2.2}$$

where $Y_1$ and $Y_2$ denote independent random variables representing the ordinal category numbers of the response variable for subjects selected at random from group 1 and group 2, respectively. Also, $\pi_{j|i} = \pi_{ij}/\pi_{i+}$, with $\pi_{ij}$ denoting the cell probability for the ith row and jth column and $\pi_{i+} = \sum_j \pi_{ij}$ in a two-way contingency table. In the development of a biosimilar, group 1 and group 2 could be patients taking test drugs (biosimilar) and those taking reference drugs in a clinical trial. Δ can be interpreted as the difference of two probabilities: the probability that group 1 (patients taking test drugs) shows a higher ordinal category number (e.g., a better response in a clinical trial) than group 2 (patients taking reference drugs), and the probability that group 2 shows a higher ordinal category number than group 1. In contrast, α is the ratio of these two probabilities. Both measures use the ordering of the levels of the ordinal variable. Furthermore, note that when the number of ordinal categories is K = 2, Δ is equal to $\pi_{1|2} - \pi_{1|1}$, while α is equal to $(\pi_{12}\pi_{21})/(\pi_{11}\pi_{22})$ (Agresti, 1981).

The sample versions of these two measures are easily defined by substituting the cell probabilities with sample proportions,

$$\hat{\Delta} = \sum_{i>j} \hat{\pi}_{i|1}\hat{\pi}_{j|2} - \sum_{i<j} \hat{\pi}_{i|1}\hat{\pi}_{j|2}, \tag{2.3}$$

$$\hat{\alpha} = \frac{\sum_{i>j} \hat{\pi}_{i|1}\hat{\pi}_{j|2}}{\sum_{i<j} \hat{\pi}_{i|1}\hat{\pi}_{j|2}}, \tag{2.4}$$

where $\hat{\pi}_{j|i} = \hat{\pi}_{ij}/\hat{\pi}_{i+}$, with $\hat{\pi}_{ij}$ denoting the sample proportion of the ith row and jth column and $\hat{\pi}_{i+} = \sum_j \hat{\pi}_{ij}$. Agresti (1981) also demonstrated the asymptotic normal distribution of (2.3) and (2.4).
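As an illustration of (2.3) and (2.4), the following minimal sketch computes the sample measures from category counts of two groups. The function names and the counts are hypothetical and chosen only for this example; NumPy is assumed to be available.

```python
import numpy as np

def concordance_probs(p1, p2):
    """Return (P(Y1 > Y2), P(Y2 > Y1)) for two independent ordinal variables."""
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    # joint[i, j] = P(Y1 = category i, Y2 = category j) under independence
    joint = np.outer(p1, p2)
    p_gt = np.tril(joint, k=-1).sum()  # cells with i > j
    p_lt = np.triu(joint, k=1).sum()   # cells with i < j
    return p_gt, p_lt

def delta_alpha_hat(counts1, counts2):
    """Sample versions of Delta and alpha from the category counts of two groups."""
    c1 = np.asarray(counts1, dtype=float)
    c2 = np.asarray(counts2, dtype=float)
    p_gt, p_lt = concordance_probs(c1 / c1.sum(), c2 / c2.sum())
    return p_gt - p_lt, p_gt / p_lt

# Hypothetical counts for four ordered categories (not taken from the paper)
delta_hat, alpha_hat = delta_alpha_hat([10, 20, 30, 40], [25, 25, 25, 25])

# With K = 2 categories, Delta reduces to pi_{1|2} - pi_{1|1}
delta_two_cat, _ = delta_alpha_hat([30, 70], [60, 40])
```

For these counts, group 1 is shifted toward the higher categories, so the estimate of Δ is positive and the estimate of α exceeds 1.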

Using Δ and log α, we define two relative distances in the three-arm parallel design. Let T denote the test drug (biosimilar), and let R1 and R2 denote reference drugs from two different batches. Assume that a total of N patients are randomized into three groups of sizes $n_i$, i = 1, 2, 3, where group 1 takes the test drug (biosimilar) and groups 2 and 3 receive reference drugs from different batches. We employ a 1 : 1 : 1 randomization ratio, with $N = n_1 + n_2 + n_3$. We assume that $Y_T \sim \text{Multi}(n_1, \pi_T)$, $Y_{R_1} \sim \text{Multi}(n_2, \pi_{R_1})$, and $Y_{R_2} \sim \text{Multi}(n_3, \pi_{R_2})$, where $Y_T$, $Y_{R_1}$, and $Y_{R_2}$ are independent random variables denoting the ordinal category number of the response variable, and $\pi_T$, $\pi_{R_1}$, and $\pi_{R_2}$ are the cell probabilities for each group. The distances based on the measures Δ and α are defined as

$$d_\Delta(T, R) = |\Delta_{TR}| = |P(Y_T > Y_R) - P(Y_R > Y_T)| = \left| \sum_{i>j} \pi_{i|T}\pi_{j|R} - \sum_{i<j} \pi_{i|T}\pi_{j|R} \right|, \tag{2.5}$$

$$d_\Delta(R_1, R_2) = |\Delta_{R_1R_2}| = \left| \sum_{i>j} \pi_{i|R_1}\pi_{j|R_2} - \sum_{i<j} \pi_{i|R_1}\pi_{j|R_2} \right|, \tag{2.6}$$

$$d_\alpha(T, R) = |\log \alpha_{TR}| = \left| \log \sum_{i>j} \pi_{i|T}\pi_{j|R} - \log \sum_{i<j} \pi_{i|T}\pi_{j|R} \right|, \tag{2.7}$$

$$d_\alpha(R_1, R_2) = |\log \alpha_{R_1R_2}| = \left| \log \sum_{i>j} \pi_{i|R_1}\pi_{j|R_2} - \log \sum_{i<j} \pi_{i|R_1}\pi_{j|R_2} \right|, \tag{2.8}$$

where $\Delta_{R_1R_2}$ and $\log \alpha_{R_1R_2}$ are the measures Δ and log α computed from the groups taking reference drugs from batches 1 and 2. With $\pi_{i|R}$ defined as the arithmetic mean of $\pi_{i|R_1}$ and $\pi_{i|R_2}$, $\Delta_{TR}$ and $\log \alpha_{TR}$ can be viewed as the corresponding measures between the group taking test drugs and a group taking reference drugs.

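With these two pairs of distances, the relative distances $rd_\Delta$ and $rd_\alpha$ can be defined as

$$rd_\Delta = \frac{d_\Delta(T, R)}{d_\Delta(R_1, R_2)} = \left| \frac{\sum_{i>j} \pi_{i|T}\pi_{j|R} - \sum_{i<j} \pi_{i|T}\pi_{j|R}}{\sum_{i>j} \pi_{i|R_1}\pi_{j|R_2} - \sum_{i<j} \pi_{i|R_1}\pi_{j|R_2}} \right|, \tag{2.9}$$

$$rd_\alpha = \frac{d_\alpha(T, R)}{d_\alpha(R_1, R_2)} = \left| \frac{\log \sum_{i>j} \pi_{i|T}\pi_{j|R} - \log \sum_{i<j} \pi_{i|T}\pi_{j|R}}{\log \sum_{i>j} \pi_{i|R_1}\pi_{j|R_2} - \log \sum_{i<j} \pi_{i|R_1}\pi_{j|R_2}} \right|. \tag{2.10}$$

Biosimilarity is assessed based on these relative distances with a prespecified margin δ (δ > 0) as follows.

$$H_0 : rd \ge \delta \quad \text{versus} \quad H_A : rd < \delta. \tag{2.11}$$

Note that (2.11) with the measures Δ and log α can be expressed, respectively, as

$$H_0 : \left| \frac{\Delta_{TR}}{\Delta_{R_1R_2}} \right| \ge \delta_\Delta \quad \text{versus} \quad H_A : \left| \frac{\Delta_{TR}}{\Delta_{R_1R_2}} \right| < \delta_\Delta, \tag{2.12}$$

$$H_0 : \left| \frac{\log \alpha_{TR}}{\log \alpha_{R_1R_2}} \right| \ge \delta_\alpha \quad \text{versus} \quad H_A : \left| \frac{\log \alpha_{TR}}{\log \alpha_{R_1R_2}} \right| < \delta_\alpha. \tag{2.13}$$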
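The relative distances in (2.9) and (2.10) can be evaluated directly from the three probability vectors. The sketch below is only an illustration; the function names are hypothetical, and the probabilities are taken from one parameter setting listed in Tables 1 and 2, for which the tables report θΔ = −0.50 and θα = −0.49.

```python
import numpy as np

def delta_measure(p1, p2):
    """Delta = P(Y1 > Y2) - P(Y2 > Y1)."""
    joint = np.outer(p1, p2)
    return np.tril(joint, k=-1).sum() - np.triu(joint, k=1).sum()

def log_alpha_measure(p1, p2):
    """log alpha = log P(Y1 > Y2) - log P(Y2 > Y1)."""
    joint = np.outer(p1, p2)
    return np.log(np.tril(joint, k=-1).sum()) - np.log(np.triu(joint, k=1).sum())

def relative_distances(p_t, p_r1, p_r2):
    """Return (rd_Delta, rd_alpha) for test and two reference batches."""
    p_t, p_r1, p_r2 = (np.asarray(p, dtype=float) for p in (p_t, p_r1, p_r2))
    p_r = (p_r1 + p_r2) / 2.0  # pooled reference: mean of the two batches
    rd_delta = abs(delta_measure(p_t, p_r) / delta_measure(p_r1, p_r2))
    rd_alpha = abs(log_alpha_measure(p_t, p_r) / log_alpha_measure(p_r1, p_r2))
    return rd_delta, rd_alpha

rd_delta, rd_alpha = relative_distances(
    p_t=[0.3, 0.4, 0.3], p_r1=[0.4, 0.4, 0.2], p_r2=[0.3, 0.4, 0.3]
)
```

Both ratios are undefined when the two reference batches have identical distributions, because the denominators are then zero; the sketch does not guard against that case.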
3. Statistical tests for biosimilarity

In the last section, we defined relative distances using the nominal-ordinal association measures Δ and log α. This section provides statistical tests for biosimilarity based on the two relative distances. We derive the asymptotic distribution of each test statistic and explain the testing procedure. From now on, $\Delta_{TR}/\Delta_{R_1R_2}$ and $\log \alpha_{TR}/\log \alpha_{R_1R_2}$ are denoted by $\theta_\Delta$ and $\theta_\alpha$, respectively.

### 3.1. Statistical tests based on Δ with k-categories

Note that (2.12) can be decomposed into two one-sided tests,

$$H_{01} : \theta_\Delta \ge \delta_\Delta \quad \text{versus} \quad H_{A1} : \theta_\Delta < \delta_\Delta, \tag{3.1}$$

and

$$H_{02} : \theta_\Delta \le -\delta_\Delta \quad \text{versus} \quad H_{A2} : \theta_\Delta > -\delta_\Delta, \tag{3.2}$$

where $\delta_\Delta$ (> 0) is a predefined margin. When both null hypotheses are rejected at a given significance level, it can be claimed that the test drug has an effect similar to that of the reference drug. A reasonable test statistic for $\theta_\Delta$ is obtained by substituting $\pi_{j|i}$ with $\hat{\pi}_{j|i}$, and is denoted by $\hat{\theta}_\Delta$.

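Before deriving the asymptotic distribution of $\hat{\theta}_\Delta$, note that if $X \sim \text{Multi}(n, \pi)$ with $\pi = (\pi_1, \ldots, \pi_k)$,

$$\sqrt{n} \left[ \begin{pmatrix} \hat{\pi}_1 \\ \hat{\pi}_2 \\ \vdots \\ \hat{\pi}_k \end{pmatrix} - \begin{pmatrix} \pi_1 \\ \pi_2 \\ \vdots \\ \pi_k \end{pmatrix} \right] \xrightarrow{d} N_k \left[ \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \Sigma \right], \tag{3.3}$$

where $\Sigma = \text{Diag}(\pi) - \pi\pi'$. We use the multivariate delta method to derive the asymptotic distribution of $\hat{\theta}_\Delta$. Note that $\theta_\Delta$ can be written as a function $g_1$ of $\pi_{R_1}$, $\pi_{R_2}$, and $\pi_T$ as follows.

$$\theta_\Delta = g_1(\pi_{R_1}, \pi_{R_2}, \pi_T) = \frac{\frac{1}{2} \sum_{j=1}^{k-1} (\pi_{j|R_1} + \pi_{j|R_2}) \left( \sum_{q=j+1}^{k} \pi_{q|T} \right) - \frac{1}{2} \sum_{j=1}^{k-1} \pi_{j|T} \left( \sum_{q=j+1}^{k} (\pi_{q|R_1} + \pi_{q|R_2}) \right)}{\sum_{j=1}^{k-1} \pi_{j|R_2} \left( \sum_{q=j+1}^{k} \pi_{q|R_1} \right) - \sum_{j=1}^{k-1} \pi_{j|R_1} \left( \sum_{q=j+1}^{k} \pi_{q|R_2} \right)}, \tag{3.4}$$

where $\pi_{j|R_i}$ is the cell probability of the jth category in the group taking the reference drug from batch i, i = 1, 2, and $\pi_{j|T}$ is the cell probability of the jth category in the group taking test drugs. Then, it can be shown that (see Appendix)

$$\sqrt{n_1} (\hat{\theta}_\Delta - \theta_\Delta) \xrightarrow{d} N(0, \sigma_\Delta^2), \quad \sigma_\Delta^2 = B_1 \Sigma B_1', \tag{3.5}$$

where $B_1 = (dg_1/d\pi_{1|R_1}, \ldots, dg_1/d\pi_{k-1|R_1}, dg_1/d\pi_{1|R_2}, \ldots, dg_1/d\pi_{k-1|R_2}, dg_1/d\pi_{1|T}, \ldots, dg_1/d\pi_{k-1|T})$ and Σ is computed as follows.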

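$$\Sigma = \begin{bmatrix} \Sigma_1 & & \\ & \Sigma_2 & \\ & & \Sigma_3 \end{bmatrix}, \tag{3.6}$$

where

$$\Sigma_1 = \begin{bmatrix} \pi_{1|R_1}(1 - \pi_{1|R_1}) & -\pi_{1|R_1}\pi_{2|R_1} & \cdots & -\pi_{1|R_1}\pi_{k-1|R_1} \\ -\pi_{2|R_1}\pi_{1|R_1} & \pi_{2|R_1}(1 - \pi_{2|R_1}) & \cdots & -\pi_{2|R_1}\pi_{k-1|R_1} \\ \vdots & \vdots & \ddots & \vdots \\ -\pi_{k-1|R_1}\pi_{1|R_1} & -\pi_{k-1|R_1}\pi_{2|R_1} & \cdots & \pi_{k-1|R_1}(1 - \pi_{k-1|R_1}) \end{bmatrix}, \tag{3.7}$$

$$\Sigma_2 = \begin{bmatrix} \pi_{1|R_2}(1 - \pi_{1|R_2}) & -\pi_{1|R_2}\pi_{2|R_2} & \cdots & -\pi_{1|R_2}\pi_{k-1|R_2} \\ -\pi_{2|R_2}\pi_{1|R_2} & \pi_{2|R_2}(1 - \pi_{2|R_2}) & \cdots & -\pi_{2|R_2}\pi_{k-1|R_2} \\ \vdots & \vdots & \ddots & \vdots \\ -\pi_{k-1|R_2}\pi_{1|R_2} & -\pi_{k-1|R_2}\pi_{2|R_2} & \cdots & \pi_{k-1|R_2}(1 - \pi_{k-1|R_2}) \end{bmatrix}, \tag{3.8}$$

$$\Sigma_3 = \begin{bmatrix} \pi_{1|T}(1 - \pi_{1|T}) & -\pi_{1|T}\pi_{2|T} & \cdots & -\pi_{1|T}\pi_{k-1|T} \\ -\pi_{2|T}\pi_{1|T} & \pi_{2|T}(1 - \pi_{2|T}) & \cdots & -\pi_{2|T}\pi_{k-1|T} \\ \vdots & \vdots & \ddots & \vdots \\ -\pi_{k-1|T}\pi_{1|T} & -\pi_{k-1|T}\pi_{2|T} & \cdots & \pi_{k-1|T}(1 - \pi_{k-1|T}) \end{bmatrix}. \tag{3.9}$$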
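Each diagonal block above has the multinomial form Diag(π) − ππ′ restricted to the first k − 1 categories. As a quick sanity check, the following sketch compares that form with the empirical covariance of scaled sample proportions from simulated multinomial data. The probability vector, sample size, number of replications, and seed are arbitrary choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

pi = np.array([0.3, 0.4, 0.3])  # hypothetical cell probabilities, k = 3
n, reps = 500, 20000

# Sample proportions from `reps` independent multinomial experiments
pi_hat = rng.multinomial(n, pi, size=reps) / n

# Scaled deviations sqrt(n) * (pi_hat - pi), keeping the first k - 1 categories
scaled = np.sqrt(n) * (pi_hat - pi)[:, :-1]

empirical = np.cov(scaled, rowvar=False)
theoretical = (np.diag(pi) - np.outer(pi, pi))[:-1, :-1]

max_abs_error = np.abs(empirical - theoretical).max()
```

With this many replications, the largest entrywise discrepancy should be small relative to the entries themselves, which are of order 0.1 to 0.2 here.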

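Using the asymptotic normal distribution of $\hat{\theta}_\Delta$, the hypothesis tests in (3.1) and (3.2) can be conducted with $Z_{1\Delta}$ and $Z_{2\Delta}$, respectively. The null hypotheses are rejected when

$$Z_{1\Delta} < -z_\alpha \quad \text{and} \quad Z_{2\Delta} > z_\alpha, \tag{3.10}$$

where $z_\alpha$ is the upper 100α percentile of the standard normal distribution, and $Z_{1\Delta}$ and $Z_{2\Delta}$ are

$$Z_{1\Delta} = \frac{\hat{\theta}_\Delta - \delta_\Delta}{\hat{\sigma}_\Delta / \sqrt{n_1}} \quad \text{and} \quad Z_{2\Delta} = \frac{\hat{\theta}_\Delta + \delta_\Delta}{\hat{\sigma}_\Delta / \sqrt{n_1}}, \tag{3.11}$$

where $\hat{\sigma}_\Delta$ is computed by substituting $\pi_{j|i}$ with $\hat{\pi}_{j|i}$. If each null hypothesis in (3.1) and (3.2) is rejected at significance level α, then we can conclude that the test drug (biosimilar) is similar to the reference drugs.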
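The testing procedure based on Δ can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the gradient B1 is approximated by central finite differences instead of the analytic derivatives in the Appendix, the three groups are assumed to have equal size, and the function names, counts, and margin are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def theta_delta(free):
    """theta_Delta from the 3(k-1) free probabilities ordered as (R1, R2, T)."""
    probs = []
    for part in np.split(np.asarray(free, dtype=float), 3):
        probs.append(np.append(part, 1.0 - part.sum()))  # restore last category
    p_r1, p_r2, p_t = probs

    def delta(p1, p2):
        joint = np.outer(p1, p2)
        return np.tril(joint, k=-1).sum() - np.triu(joint, k=1).sum()

    return delta(p_t, (p_r1 + p_r2) / 2.0) / delta(p_r1, p_r2)

def delta_test(counts_r1, counts_r2, counts_t, margin, alpha=0.05, h=1e-6):
    """Two one-sided tests for |theta_Delta| < margin (equal group sizes assumed)."""
    groups = [np.asarray(c, dtype=float) for c in (counts_r1, counts_r2, counts_t)]
    n1 = groups[2].sum()  # size of the test group
    free = np.concatenate([(g / g.sum())[:-1] for g in groups])
    theta_hat = theta_delta(free)

    # Central-difference gradient, a numerical stand-in for the analytic B_1
    grad = np.array([
        (theta_delta(free + h * e) - theta_delta(free - h * e)) / (2 * h)
        for e in np.eye(free.size)
    ])

    # Block-diagonal covariance with blocks Diag(p) - p p'
    k1 = free.size // 3
    cov = np.zeros((free.size, free.size))
    for g in range(3):
        block = slice(g * k1, (g + 1) * k1)
        cov[block, block] = np.diag(free[block]) - np.outer(free[block], free[block])

    sigma_hat = float(np.sqrt(grad @ cov @ grad))
    se = sigma_hat / np.sqrt(n1)
    z1 = (theta_hat - margin) / se
    z2 = (theta_hat + margin) / se
    z_crit = NormalDist().inv_cdf(1.0 - alpha)
    return {"theta_hat": theta_hat, "sigma_hat": sigma_hat, "z1": z1, "z2": z2,
            "biosimilar": bool(z1 < -z_crit and z2 > z_crit)}

# Hypothetical counts with 1000 patients per arm
result = delta_test([400, 400, 200], [300, 400, 300], [300, 400, 300], margin=1.2)
```

A test based on log α could follow the same structure by replacing `theta_delta` with the corresponding ratio of log α measures.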

### 3.2. Statistical tests based on log α with k-categories

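Similarly, (2.13) can be decomposed into two one-sided tests as follows.

$$H_{01} : \theta_\alpha \ge \delta_\alpha \quad \text{versus} \quad H_{A1} : \theta_\alpha < \delta_\alpha, \tag{3.12}$$

and

$$H_{02} : \theta_\alpha \le -\delta_\alpha \quad \text{versus} \quad H_{A2} : \theta_\alpha > -\delta_\alpha, \tag{3.13}$$

where $\delta_\alpha$ (> 0) is a predefined margin. Note that $\theta_\alpha$ can be expressed as a function $g_2$ of $\pi_{R_1}$, $\pi_{R_2}$, and $\pi_T$ such that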

$$\theta_\alpha = g_2(\pi_{R_1}, \pi_{R_2}, \pi_T) = \frac{\log \left[ \sum_{j=1}^{k-1} \frac{\pi_{j|R_1} + \pi_{j|R_2}}{2} \left( \sum_{q=j+1}^{k} \pi_{q|T} \right) \right] - \log \left[ \sum_{j=1}^{k-1} \pi_{j|T} \left( \sum_{q=j+1}^{k} \frac{\pi_{q|R_1} + \pi_{q|R_2}}{2} \right) \right]}{\log \left[ \sum_{j=1}^{k-1} \pi_{j|R_2} \left( \sum_{q=j+1}^{k} \pi_{q|R_1} \right) \right] - \log \left[ \sum_{j=1}^{k-1} \pi_{j|R_1} \left( \sum_{q=j+1}^{k} \pi_{q|R_2} \right) \right]}. \tag{3.14}$$

Then, it can also be shown that (see Appendix)

$$\sqrt{n_1} (\hat{\theta}_\alpha - \theta_\alpha) \xrightarrow{d} N(0, \sigma_\alpha^2), \quad \sigma_\alpha^2 = B_2 \Sigma B_2', \tag{3.15}$$

where $B_2 = (dg_2/d\pi_{1|R_1}, \ldots, dg_2/d\pi_{k-1|R_1}, dg_2/d\pi_{1|R_2}, \ldots, dg_2/d\pi_{k-1|R_2}, dg_2/d\pi_{1|T}, \ldots, dg_2/d\pi_{k-1|T})$ and Σ is the same as in (3.6). Using the asymptotic normality of $\hat{\theta}_\alpha$, the hypothesis tests in (3.12) and (3.13) can be conducted. As before, biosimilarity between the test drug (biosimilar) and the reference drugs is established if each null hypothesis in (3.12) and (3.13) is rejected at significance level α, that is, if

$$Z_{1\alpha} < -z_\alpha \quad \text{and} \quad Z_{2\alpha} > z_\alpha, \tag{3.16}$$

where $Z_{1\alpha} = (\hat{\theta}_\alpha - \delta_\alpha) / (\hat{\sigma}_\alpha / \sqrt{n_1})$, $Z_{2\alpha} = (\hat{\theta}_\alpha + \delta_\alpha) / (\hat{\sigma}_\alpha / \sqrt{n_1})$, and $z_\alpha$ is the upper 100α percentile of the standard normal distribution.

4. Type I error rate and power function

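In this section, the type I error rate and power are investigated for both measures, Δ and log α. Shin and Kang (2016) derived the power function for binary endpoints; here we deal with ordinal endpoints. For a predefined margin $\delta_\Delta > 0$, if $z_\alpha < \delta_\Delta / (\sigma_\Delta / \sqrt{n_1})$, then the type I error rate at $\theta_\Delta = \delta_\Delta$ based on the measure Δ is

$$P \left( \frac{\hat{\theta}_\Delta + \delta_\Delta}{\hat{\sigma}_\Delta / \sqrt{n_1}} > z_\alpha \ \text{and} \ \frac{\hat{\theta}_\Delta - \delta_\Delta}{\hat{\sigma}_\Delta / \sqrt{n_1}} < -z_\alpha \,\Big|\, \theta_\Delta = \delta_\Delta \right) \simeq P \left( Z_{1\Delta} > z_\alpha - \frac{2\delta_\Delta}{\sigma_\Delta / \sqrt{n_1}} \ \text{and} \ Z_{1\Delta} < -z_\alpha \,\Big|\, \theta_\Delta = \delta_\Delta \right). \tag{4.1}$$

Similarly, the type I error rate at $\theta_\Delta = -\delta_\Delta$ is

$$\begin{aligned} & P \left( \frac{\hat{\theta}_\Delta + \delta_\Delta}{\hat{\sigma}_\Delta / \sqrt{n_1}} > z_\alpha \ \text{and} \ \frac{\hat{\theta}_\Delta - \delta_\Delta}{\hat{\sigma}_\Delta / \sqrt{n_1}} < -z_\alpha \,\Big|\, \theta_\Delta = -\delta_\Delta \right) \\ &\quad = P \left( \frac{\hat{\theta}_\Delta - (-\delta_\Delta)}{\hat{\sigma}_\Delta / \sqrt{n_1}} > z_\alpha \ \text{and} \ \frac{\hat{\theta}_\Delta + \delta_\Delta - 2\delta_\Delta}{\hat{\sigma}_\Delta / \sqrt{n_1}} < -z_\alpha \,\Big|\, \theta_\Delta = -\delta_\Delta \right) \\ &\quad \simeq P \left( Z_{2\Delta} > z_\alpha \ \text{and} \ Z_{2\Delta} < -z_\alpha + \frac{2\delta_\Delta}{\sigma_\Delta / \sqrt{n_1}} \,\Big|\, \theta_\Delta = -\delta_\Delta \right), \end{aligned} \tag{4.2}$$

where $Z_{1\Delta} = (\hat{\theta}_\Delta - \delta_\Delta) / (\hat{\sigma}_\Delta / \sqrt{n_1})$ and $Z_{2\Delta} = (\hat{\theta}_\Delta + \delta_\Delta) / (\hat{\sigma}_\Delta / \sqrt{n_1})$. The power can be computed with formulas (4.1) and (4.2) under the alternative hypothesis. In some cases, the condition $z_\alpha < \delta_\Delta / (\sigma_\Delta / \sqrt{n_1})$ is not satisfied, so that the type I error rate and power cannot be computed. Note that the formulas for the type I error rate and power based on log α can be derived in the same way as (4.1) and (4.2).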
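Under the normal approximation, the probabilities in (4.1) and (4.2) reduce to a difference of two standard normal distribution function values. The sketch below evaluates this for a general true value of θ, so the same function gives the type I error rate on the boundary and the power inside the margin. It treats σ as known, returns 0 when the condition on zα fails (a choice made for this sketch, because the rejection region is then empty), and uses σ = 5.4828 purely as an illustrative value.

```python
from math import sqrt
from statistics import NormalDist

def rejection_probability(theta, margin, sigma, n1, alpha=0.05):
    """Approximate P(Z1 < -z_alpha and Z2 > z_alpha) when the true ratio is theta."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha)
    se = sigma / sqrt(n1)
    upper = (margin - theta) / se - z_crit   # bound implied by Z1 < -z_alpha
    lower = (-margin - theta) / se + z_crit  # bound implied by Z2 > z_alpha
    if lower >= upper:
        return 0.0  # empty rejection region (assumption of this sketch)
    return std_normal.cdf(upper) - std_normal.cdf(lower)

# Type I error rate on the boundary theta = -margin (illustrative sigma)
type1_n500 = rejection_probability(theta=-0.5, margin=0.5, sigma=5.4828, n1=500)
type1_n1000 = rejection_probability(theta=-0.5, margin=0.5, sigma=5.4828, n1=1000)

# Power when the true ratio lies inside a wider margin
power_n500 = rejection_probability(theta=-0.5, margin=1.2, sigma=5.4828, n1=500)
```

With these inputs, the boundary rejection probability stays below the nominal 5% level at the smaller sample size and moves toward it as n1 grows.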

Under different settings, the theoretical and empirical type I error rates based on Δ and log α are investigated. For the empirical type I error rate, random samples are generated under the null hypothesis for each sample size. After 5,000 replications, the empirical type I error rate is calculated as the proportion of replications in which the null hypothesis is rejected. The empirical power is calculated similarly. In our simulation settings, the number of categories, K, is fixed at 3, but similar results can be obtained for other cases. Tables 1–4 give the results.

Tables 1 and 2 indicate that the theoretical and empirical type I error rates vary according to the parameter settings. For example, when $(\pi_{1|T}, \pi_{2|T}, \pi_{3|T}) = (0.6, 0.2, 0.2)$, the empirical type I error rate based on Δ is 0.078, whereas the one based on log α is 0.065 for $n_1 = 1000$. When $(\pi_{1|T}, \pi_{2|T}, \pi_{3|T}) = (0.14, 0.41, 0.45)$, the empirical type I error rate based on Δ is 0.056, whereas the one based on log α is 0.075 for $n_1 = 1000$. In general, the empirical type I error rates of both proposed measures become closer to the theoretical type I error rate as $n_1$ becomes large. Tables 3 and 4 present the theoretical and empirical power based on both measures under various settings. As with the type I error rate, the theoretical and empirical power depend on the parameter settings. Both measures, Δ and log α, yield similar power. For example, when $(\pi_{1|T}, \pi_{2|T}, \pi_{3|T}) = (0.4, 0.38, 0.22)$, the theoretical power based on Δ is 0.956, whereas the one based on log α is 0.961 for $n_1 = 500$. In addition, both powers increase as the sample size increases.

5. Numerical results

We assume that ordinal data with K = 4 categories are dichotomized into binary data: the first and second ordinal categories are collapsed into one category, and the remaining ordinal categories are collapsed into the other category. To show the significance of ordinal endpoints in evaluating biosimilarity, we compare the following probabilities in a simulation study.

$$P_o = \text{the probability of biosimilarity using ordinal data}, \quad P_b = \text{the probability of biosimilarity using dichotomized data}.$$

For $P_b$, we use the method of Shin and Kang (2016), in which relative distances for binary endpoints are proposed as a biosimilarity criterion. They take the risk difference and the log odds ratio as components of the relative distance. Under each parameter setting and a predefined margin, random samples of size $n_1 = 500$ are generated. With 5,000 replications, $P_o$ and $P_b$ are calculated as the proportion of replications in which biosimilarity is concluded. Table 5 presents $P_o$ and $P_b$ for Δ and the risk difference. $P_o$ is less than $P_b$ because a noticeable difference between parameters in the ordinal categories becomes unclear when the categories are dichotomized, which can result in a greater probability of concluding biosimilarity. For example, under the following setting based on Δ and the risk difference,

$$(\pi_{1|R_1}, \ldots, \pi_{4|R_1}) = (0.6, 0.1, 0.2, 0.1), \quad (\pi_{1|R_2}, \ldots, \pi_{4|R_2}) = (0.5, 0.1, 0.2, 0.2), \quad (\pi_{1|T}, \ldots, \pi_{4|T}) = (0.2, 0.4, 0.3, 0.1),$$

the parameters for the reference drugs and the test drug (biosimilar) show a marked difference, but this difference becomes obscure after dichotomization:

$$(\pi^*_{1|R_1}, \pi^*_{2|R_1}) = (0.7, 0.3), \quad (\pi^*_{1|R_2}, \pi^*_{2|R_2}) = (0.6, 0.4), \quad (\pi^*_{1|T}, \pi^*_{2|T}) = (0.6, 0.4),$$

where $\pi^*_{1|R_i}, \pi^*_{2|R_i}$ are the collapsed cell probabilities of $R_i$, i = 1, 2, and $\pi^*_{1|T}, \pi^*_{2|T}$ are the collapsed cell probabilities of T. In this case, $P_o = 0.225$ means that the chance of claiming biosimilarity is 22.5%, whereas $P_b = 0.896$ means that there is an 89.6% chance that biosimilarity would be concluded, which is greater than $P_o$. Similar results can be seen in Table 6, where $P_o$ and $P_b$ are computed based on the measures log α and the log odds ratio. In this sense, assessing biosimilarity using the original ordinal data might lead to different conclusions from those obtained using dichotomized data. It would be reasonable not to claim biosimilarity if the ordinal parameters in the simulation settings show a marked difference between the reference drugs and the test drug. Our numerical results therefore suggest that using the original ordinal data might be more appropriate than using dichotomized data.
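The effect of collapsing categories in the example above can be seen at the population level, before any simulation. The sketch below computes the ratio θΔ for the four-category setting and for its dichotomized version. Applying the Δ-based ratio to the collapsed probabilities is a simplification made for this illustration (for two categories, Δ reduces to a difference of proportions); it is not a reproduction of the procedure of Shin and Kang (2016). The function names are hypothetical.

```python
import numpy as np

def delta_measure(p1, p2):
    """Delta = P(Y1 > Y2) - P(Y2 > Y1)."""
    joint = np.outer(p1, p2)
    return np.tril(joint, k=-1).sum() - np.triu(joint, k=1).sum()

def theta_delta(p_t, p_r1, p_r2):
    """Ratio Delta_TR / Delta_R1R2 with the pooled reference (R1 + R2) / 2."""
    p_t, p_r1, p_r2 = (np.asarray(p, dtype=float) for p in (p_t, p_r1, p_r2))
    return delta_measure(p_t, (p_r1 + p_r2) / 2.0) / delta_measure(p_r1, p_r2)

def dichotomize(p, cut=2):
    """Collapse the first `cut` categories into one cell and the rest into another."""
    p = np.asarray(p, dtype=float)
    return np.array([p[:cut].sum(), p[cut:].sum()])

# Parameter setting from the example in the text
p_r1 = [0.6, 0.1, 0.2, 0.1]
p_r2 = [0.5, 0.1, 0.2, 0.2]
p_t = [0.2, 0.4, 0.3, 0.1]

theta_ordinal = theta_delta(p_t, p_r1, p_r2)
theta_binary = theta_delta(*(dichotomize(p) for p in (p_t, p_r1, p_r2)))
```

The magnitude of the ratio shrinks from roughly 1.73 with the ordinal categories to 0.5 after collapsing, which is consistent with the higher probability of concluding biosimilarity from dichotomized data.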

6. Conclusion and future study

Biosimilars, which are more affordable than the original biological products, can provide people with alternative treatments for several types of disease. Existing statistical methods for assessing similarity between reference drugs and test drugs cannot be used directly because of fundamental differences between biological products and chemical products. Accordingly, considerable research on demonstrating biosimilarity statistically has been conducted. In particular, Shin and Kang (2016) considered a three-arm parallel design for binary endpoints. However, determining biosimilarity with ordinal endpoints has not been studied. Without a biosimilarity criterion for ordinal endpoints, ordinal data have to be dichotomized, which might cause some loss of information.

We propose two biosimilarity criteria for ordinal endpoints by extending the three-arm parallel design, with relative distances defined based on Δ and α. After deriving the asymptotic sampling distribution of each relative distance, we investigate the type I error rate and power theoretically and empirically. The type I error rate is shown to depend on the parameter settings. The theoretical and empirical type I error rates become close to the significance level as the sample size increases. Similar results are found for power.

We also compare the probability of concluding biosimilarity using ordinal data with the one using dichotomized data to show the significance of assessing biosimilarity with ordinal endpoints. The probabilities are shown to differ when a marked difference between the parameters for reference drugs and those for test drugs becomes ambiguous after dichotomization. This result indicates that careful consideration should precede dichotomizing data when assessing biosimilarity.

This work incorporates a prespecified margin in a three-arm parallel design. A prespecified margin should be statistically and clinically justified; however, how to choose it remains controversial. Determining a prespecified margin for evaluating biosimilarity is an important topic for future study. We believe that a well-justified margin could result in a more accurate assessment of biosimilarity.

There are other possible choices of nominal-ordinal association measure. For example, Piccarreta (2001) proposed a new measure of nominal-ordinal association. Comparing the performance of the three-arm parallel design under other choices is an interesting topic for future study. Also, our discussion is restricted to two different batches in calculating the distance between reference drugs. As a future study, it can be extended to a (k + 1)-arm parallel design for ordinal endpoints.

Appendix

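By the multivariate central limit theorem,

$$\sqrt{n_1} \left[ \begin{pmatrix} \hat{\pi}_{1|R_1} \\ \vdots \\ \hat{\pi}_{k-1|R_1} \\ \hat{\pi}_{1|R_2} \\ \vdots \\ \hat{\pi}_{k-1|R_2} \\ \hat{\pi}_{1|T} \\ \vdots \\ \hat{\pi}_{k-1|T} \end{pmatrix} - \begin{pmatrix} \pi_{1|R_1} \\ \vdots \\ \pi_{k-1|R_1} \\ \pi_{1|R_2} \\ \vdots \\ \pi_{k-1|R_2} \\ \pi_{1|T} \\ \vdots \\ \pi_{k-1|T} \end{pmatrix} \right] \xrightarrow{d} N_{3k-3} \left[ \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix}, \Sigma \right], \tag{6.1}$$

where

$$\Sigma = \begin{bmatrix} \Sigma_1 & & \\ & \Sigma_2 & \\ & & \Sigma_3 \end{bmatrix}, \tag{6.2}$$

where

$$\Sigma_1 = \begin{bmatrix} \pi_{1|R_1}(1 - \pi_{1|R_1}) & -\pi_{1|R_1}\pi_{2|R_1} & \cdots & -\pi_{1|R_1}\pi_{k-1|R_1} \\ -\pi_{2|R_1}\pi_{1|R_1} & \pi_{2|R_1}(1 - \pi_{2|R_1}) & \cdots & -\pi_{2|R_1}\pi_{k-1|R_1} \\ \vdots & \vdots & \ddots & \vdots \\ -\pi_{k-1|R_1}\pi_{1|R_1} & -\pi_{k-1|R_1}\pi_{2|R_1} & \cdots & \pi_{k-1|R_1}(1 - \pi_{k-1|R_1}) \end{bmatrix},$$

$$\Sigma_2 = \begin{bmatrix} \pi_{1|R_2}(1 - \pi_{1|R_2}) & -\pi_{1|R_2}\pi_{2|R_2} & \cdots & -\pi_{1|R_2}\pi_{k-1|R_2} \\ -\pi_{2|R_2}\pi_{1|R_2} & \pi_{2|R_2}(1 - \pi_{2|R_2}) & \cdots & -\pi_{2|R_2}\pi_{k-1|R_2} \\ \vdots & \vdots & \ddots & \vdots \\ -\pi_{k-1|R_2}\pi_{1|R_2} & -\pi_{k-1|R_2}\pi_{2|R_2} & \cdots & \pi_{k-1|R_2}(1 - \pi_{k-1|R_2}) \end{bmatrix},$$

$$\Sigma_3 = \begin{bmatrix} \pi_{1|T}(1 - \pi_{1|T}) & -\pi_{1|T}\pi_{2|T} & \cdots & -\pi_{1|T}\pi_{k-1|T} \\ -\pi_{2|T}\pi_{1|T} & \pi_{2|T}(1 - \pi_{2|T}) & \cdots & -\pi_{2|T}\pi_{k-1|T} \\ \vdots & \vdots & \ddots & \vdots \\ -\pi_{k-1|T}\pi_{1|T} & -\pi_{k-1|T}\pi_{2|T} & \cdots & \pi_{k-1|T}(1 - \pi_{k-1|T}) \end{bmatrix},$$

and $\hat{\pi}_{j|R_1}$, $\hat{\pi}_{j|R_2}$, and $\hat{\pi}_{j|T}$ denote the sample proportions for the reference groups from batches 1 and 2 and for the test group, respectively.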

### (1) measure Δ

Note that the relative distance based on the measure Δ can be written as a function $g_1$ of $\pi_{R_1}$, $\pi_{R_2}$, and $\pi_T$ such that

$$\theta_\Delta = g_1(\pi_{R_1}, \pi_{R_2}, \pi_T) = \frac{\frac{1}{2} \sum_{j=1}^{k-1} (\pi_{j|R_1} + \pi_{j|R_2}) \left( \sum_{q=j+1}^{k} \pi_{q|T} \right) - \frac{1}{2} \sum_{j=1}^{k-1} \pi_{j|T} \left( \sum_{q=j+1}^{k} (\pi_{q|R_1} + \pi_{q|R_2}) \right)}{\sum_{j=1}^{k-1} \pi_{j|R_2} \left( \sum_{q=j+1}^{k} \pi_{q|R_1} \right) - \sum_{j=1}^{k-1} \pi_{j|R_1} \left( \sum_{q=j+1}^{k} \pi_{q|R_2} \right)} \equiv \frac{x}{y},$$

where $\pi_{j|R_i}$ is the cell probability of the jth category in the group taking the reference drug from batch i, i = 1, 2, and $\pi_{j|T}$ is the cell probability of the jth category in the group taking test drugs. Then, by the multivariate delta method, it can be seen that

$$\sqrt{n_1} (\hat{\theta}_\Delta - \theta_\Delta) \xrightarrow{d} N(0, \sigma_\Delta^2), \quad \sigma_\Delta^2 = B_1 \Sigma B_1',$$

where Σ is the same as in (6.2) and $B_1 = (dg_1/d\pi_{1|R_1}, \ldots, dg_1/d\pi_{k-1|R_1}, dg_1/d\pi_{1|R_2}, \ldots, dg_1/d\pi_{k-1|R_2}, dg_1/d\pi_{1|T}, \ldots, dg_1/d\pi_{k-1|T})$. The components of $B_1$ are

$$\frac{dg_1}{d\pi_{j|R_1}} = \frac{\left[ \frac{1}{2} \sum_{q=j+1}^{k} \pi_{q|T} + \frac{1}{2} \sum_{q=j}^{k-1} \pi_{q|T} \right] y - x \left[ (-1) \sum_{q=j}^{k-1} \pi_{q|R_2} - \sum_{q=j+1}^{k} \pi_{q|R_2} \right]}{y^2},$$

$$\frac{dg_1}{d\pi_{j|R_2}} = \frac{\left[ \frac{1}{2} \sum_{q=j+1}^{k} \pi_{q|T} + \frac{1}{2} \sum_{q=j}^{k-1} \pi_{q|T} \right] y - x \left[ \sum_{q=j+1}^{k} \pi_{q|R_1} + \sum_{q=j}^{k-1} \pi_{q|R_1} \right]}{y^2},$$

$$\frac{dg_1}{d\pi_{j|T}} = \frac{-\frac{1}{2} \sum_{q=j}^{k-1} (\pi_{q|R_1} + \pi_{q|R_2}) - \frac{1}{2} \sum_{q=j+1}^{k} (\pi_{q|R_1} + \pi_{q|R_2})}{y}.$$
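The derivatives above treat the first k − 1 probabilities of each group as free parameters, with the kth probability determined by the constraint that they sum to one. As a check on the algebra, the following sketch compares these analytic expressions with central finite differences at one parameter setting. The helper names are hypothetical, and the setting is one of those listed in Table 1.

```python
import numpy as np

def tail(p, j):
    """sum_{q=j+1}^{k} p_q for a 1-based category index j."""
    return p[j:].sum()

def mid(p, j):
    """sum_{q=j}^{k-1} p_q for a 1-based category index j."""
    return p[j - 1:-1].sum()

def delta(p1, p2):
    joint = np.outer(p1, p2)
    return np.tril(joint, k=-1).sum() - np.triu(joint, k=1).sum()

def g1(p_r1, p_r2, p_t):
    return delta(p_t, (p_r1 + p_r2) / 2.0) / delta(p_r1, p_r2)

def analytic_gradient(p_r1, p_r2, p_t):
    """B_1 evaluated from the closed-form derivatives, ordered (R1, R2, T)."""
    k = len(p_t)
    x = delta(p_t, (p_r1 + p_r2) / 2.0)
    y = delta(p_r1, p_r2)
    p_sum = p_r1 + p_r2
    d_r1, d_r2, d_t = [], [], []
    for j in range(1, k):
        dx = 0.5 * tail(p_t, j) + 0.5 * mid(p_t, j)
        dy_r1 = -mid(p_r2, j) - tail(p_r2, j)
        dy_r2 = tail(p_r1, j) + mid(p_r1, j)
        d_r1.append((dx * y - x * dy_r1) / y**2)
        d_r2.append((dx * y - x * dy_r2) / y**2)
        d_t.append((-0.5 * mid(p_sum, j) - 0.5 * tail(p_sum, j)) / y)
    return np.array(d_r1 + d_r2 + d_t)

def numeric_gradient(p_r1, p_r2, p_t, h=1e-6):
    """Central differences in the free parameters; the last category absorbs changes."""
    free = np.concatenate([p_r1[:-1], p_r2[:-1], p_t[:-1]])

    def f(v):
        parts = [np.append(part, 1.0 - part.sum()) for part in np.split(v, 3)]
        return g1(*parts)

    return np.array([(f(free + h * e) - f(free - h * e)) / (2 * h)
                     for e in np.eye(free.size)])

p_r1 = np.array([0.4, 0.4, 0.2])
p_r2 = np.array([0.3, 0.4, 0.3])
p_t = np.array([0.3, 0.4, 0.3])

analytic = analytic_gradient(p_r1, p_r2, p_t)
numeric = numeric_gradient(p_r1, p_r2, p_t)
```

In this setting the test distribution equals that of the second reference batch, so the ratio equals −0.5 regardless of the first batch and the derivatives with respect to its probabilities vanish.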

### (2) measure log α

Note that $\theta_\alpha$ can be expressed as a function $g_2$ of $\pi_T$, $\pi_{R_1}$, and $\pi_{R_2}$ such that

$$\theta_\alpha = g_2(\pi_T, \pi_{R_1}, \pi_{R_2}) = \frac{\log \left[ \sum_{j=1}^{k-1} \frac{\pi_{j|R_1} + \pi_{j|R_2}}{2} \left( \sum_{q=j+1}^{k} \pi_{q|T} \right) \right] - \log \left[ \sum_{j=1}^{k-1} \pi_{j|T} \left( \sum_{q=j+1}^{k} \frac{\pi_{q|R_1} + \pi_{q|R_2}}{2} \right) \right]}{\log \left[ \sum_{j=1}^{k-1} \pi_{j|R_2} \left( \sum_{q=j+1}^{k} \pi_{q|R_1} \right) \right] - \log \left[ \sum_{j=1}^{k-1} \pi_{j|R_1} \left( \sum_{q=j+1}^{k} \pi_{q|R_2} \right) \right]} \equiv \frac{x}{y},$$

where $\pi_{j|R_1}$, $\pi_{j|R_2}$, and $\pi_{j|T}$ are defined as in Section 3.1. Then, it can also be seen by the multivariate delta method that

$$\sqrt{n_1} (\hat{\theta}_\alpha - \theta_\alpha) \xrightarrow{d} N(0, \sigma_\alpha^2), \quad \sigma_\alpha^2 = B_2 \Sigma B_2',$$

where $B_2 = (dg_2/d\pi_{1|R_1}, \ldots, dg_2/d\pi_{k-1|R_1}, dg_2/d\pi_{1|R_2}, \ldots, dg_2/d\pi_{k-1|R_2}, dg_2/d\pi_{1|T}, \ldots, dg_2/d\pi_{k-1|T})$ and Σ is the same as in (6.2). Then, $dg_2/d\pi_{j|R_1} = (x'y - xy')/y^2$, where

$$x' = \frac{\frac{1}{2} \sum_{q=j+1}^{k} \pi_{q|T}}{\sum_{m=1}^{k-1} \frac{\pi_{m|R_1} + \pi_{m|R_2}}{2} \left( \sum_{p=m+1}^{k} \pi_{p|T} \right)} - \frac{-\frac{1}{2} \sum_{q=j}^{k-1} \pi_{q|T}}{\sum_{m=1}^{k-1} \pi_{m|T} \left( \sum_{p=m+1}^{k} \frac{\pi_{p|R_1} + \pi_{p|R_2}}{2} \right)},$$

$$y' = \frac{(-1) \sum_{q=j}^{k-1} \pi_{q|R_2}}{\sum_{m=1}^{k-1} \pi_{m|R_2} \left( \sum_{p=m+1}^{k} \pi_{p|R_1} \right)} - \frac{\sum_{q=j+1}^{k} \pi_{q|R_2}}{\sum_{m=1}^{k-1} \pi_{m|R_1} \left( \sum_{p=m+1}^{k} \pi_{p|R_2} \right)}.$$

Similarly, $dg_2/d\pi_{j|R_2} = (x'y - xy')/y^2$, where

$$x' = \frac{\frac{1}{2} \sum_{q=j+1}^{k} \pi_{q|T}}{\sum_{m=1}^{k-1} \frac{\pi_{m|R_1} + \pi_{m|R_2}}{2} \left( \sum_{p=m+1}^{k} \pi_{p|T} \right)} - \frac{-\frac{1}{2} \sum_{q=j}^{k-1} \pi_{q|T}}{\sum_{m=1}^{k-1} \pi_{m|T} \left( \sum_{p=m+1}^{k} \frac{\pi_{p|R_1} + \pi_{p|R_2}}{2} \right)},$$

$$y' = \frac{\sum_{q=j+1}^{k} \pi_{q|R_1}}{\sum_{m=1}^{k-1} \pi_{m|R_2} \left( \sum_{p=m+1}^{k} \pi_{p|R_1} \right)} - \frac{(-1) \sum_{q=j}^{k-1} \pi_{q|R_1}}{\sum_{m=1}^{k-1} \pi_{m|R_1} \left( \sum_{p=m+1}^{k} \pi_{p|R_2} \right)}.$$

Also, $dg_2/d\pi_{j|T} = x'/y$, where

$$x' = \frac{-\frac{1}{2} \sum_{q=j}^{k-1} (\pi_{q|R_1} + \pi_{q|R_2})}{\sum_{m=1}^{k-1} \frac{\pi_{m|R_1} + \pi_{m|R_2}}{2} \left( \sum_{p=m+1}^{k} \pi_{p|T} \right)} - \frac{\frac{1}{2} \sum_{q=j+1}^{k} (\pi_{q|R_1} + \pi_{q|R_2})}{\sum_{m=1}^{k-1} \pi_{m|T} \left( \sum_{p=m+1}^{k} \frac{\pi_{p|R_1} + \pi_{p|R_2}}{2} \right)}.$$
TABLES

### Table 1

Comparison of theoretical and empirical type I error rate (%) based on Δ at significance level α = 5%

| $\pi_{1 \mid R_1}, \ldots, \pi_{3 \mid R_1}$ | $\pi_{1 \mid R_2}, \ldots, \pi_{3 \mid R_2}$ | $\pi_{1 \mid T}, \ldots, \pi_{3 \mid T}$ | $\theta_\Delta$ | $\delta_\Delta$ | Theoretical ($n_1 = 500$) | Empirical ($n_1 = 500$) | Theoretical ($n_1 = 1000$) | Empirical ($n_1 = 1000$) |
|---|---|---|---|---|---|---|---|---|
| (0.3, 0.4, 0.3) | (0.2, 0.5, 0.3) | (0.1, 0.6, 0.3) | −1.50 | 1.50 | 3.3 | 5.7 | 4.9 | 5.5 |
| (0.2, 0.55, 0.25) | (0.05, 0.7, 0.25) | (0.1, 0.6, 0.3) | −0.56 | 0.56 | 4.4 | 4.6 | 4.9 | 5.0 |
| (0.25, 0.4, 0.35) | (0.15, 0.42, 0.43) | (0.14, 0.41, 0.45) | −0.67 | 0.67 | 4.8 | 6.0 | 5.0 | 5.6 |
| (0.25, 0.4, 0.35) | (0.15, 0.42, 0.43) | (0.16, 0.41, 0.43) | −0.45 | 0.45 | 1.9 | 4.3 | 4.9 | 5.1 |
| (0.3, 0.3, 0.4) | (0.1, 0.5, 0.4) | (0.2, 0.3, 0.5) | −0.67 | 0.67 | 4.8 | 5.6 | 5.0 | 5.6 |
| (0.05, 0.65, 0.3) | (0.1, 0.5, 0.4) | (0.2, 0.5, 0.3) | 2.12 | 2.12 | 2.2 | 8.5 | 4.9 | 7.9 |
| (0.3, 0.4, 0.3) | (0.4, 0.4, 0.2) | (0.3, 0.38, 0.32) | 0.59 | 0.59 | 7.9 | 7.6 | 5.0 | 6.3 |
| (0.4, 0.4, 0.2) | (0.3, 0.4, 0.3) | (0.3, 0.4, 0.3) | −0.50 | 0.50 | 4.2 | 6.5 | 5.0 | 6.0 |
| (0.4, 0.4, 0.2) | (0.5, 0.4, 0.1) | (0.3, 0.4, 0.3) | 1.50 | 1.50 | 5.0 | 9.1 | 5.0 | 8.4 |
| (0.4, 0.4, 0.2) | (0.1, 0.4, 0.5) | (0.3, 0.4, 0.3) | 0.17 | 0.17 | 4.8 | 5.1 | 5.0 | 5.1 |
| (0.3, 0.4, 0.3) | (0.4, 0.4, 0.2) | (0.4, 0.38, 0.22) | −0.40 | 0.40 | 2.0 | 5.8 | 4.9 | 5.6 |
| (0.8, 0.1, 0.1) | (0.7, 0.2, 0.1) | (0.5, 0.3, 0.2) | −2.78 | 2.78 | 4.9 | 11.7 | 5.0 | 9.8 |
| (0.7, 0.2, 0.1) | (0.4, 0.4, 0.2) | (0.5, 0.3, 0.2) | −0.21 | 0.21 | 4.7 | 5.3 | 5.0 | 5.1 |
| (0.3, 0.4, 0.3) | (0.4, 0.4, 0.2) | (0.5, 0.38, 0.12) | −1.40 | 1.40 | 5.0 | 9.7 | 5.0 | 8.7 |
| (0.3, 0.4, 0.3) | (0.4, 0.4, 0.2) | (0.5, 0.4, 0.1) | −1.50 | 1.50 | 5.0 | 9.8 | 5.0 | 8.7 |
| (0.4, 0.4, 0.2) | (0.5, 0.4, 0.1) | (0.6, 0.2, 0.2) | −0.71 | 0.71 | 5.0 | 8.5 | 5.0 | 7.8 |
| (0.5, 0.3, 0.2) | (0.6, 0.2, 0.2) | (0.7, 0.1, 0.2) | −1.50 | 1.50 | 4.4 | 12.2 | 5.0 | 11.3 |
| (0.5, 0.1, 0.4) | (0.6, 0.2, 0.2) | (0.7, 0.1, 0.2) | −0.93 | 0.93 | 5.0 | 8.8 | 5.0 | 7.4 |

### Table 2

Comparison of theoretical and empirical type I error rate (%) based on log α at significance level α = 5%

| $(\pi_{1 \mid R_1}, \ldots, \pi_{3 \mid R_1})$ | $(\pi_{1 \mid R_2}, \ldots, \pi_{3 \mid R_2})$ | $(\pi_{1 \mid T}, \ldots, \pi_{3 \mid T})$ | $\theta_\alpha$ | $\delta_\alpha$ | Theoretical (n1 = 500) | Empirical (n1 = 500) | Theoretical (n1 = 1000) | Empirical (n1 = 1000) |
|---|---|---|---|---|---|---|---|---|
| (0.3, 0.4, 0.3) | (0.2, 0.5, 0.3) | (0.1, 0.6, 0.3) | −1.50 | 1.50 | 3.2 | 12.7 | 4.9 | 10.5 |
| (0.2, 0.55, 0.25) | (0.05, 0.7, 0.25) | (0.1, 0.6, 0.3) | −0.55 | 0.55 | 4.4 | 6.4 | 5.0 | 6.3 |
| (0.25, 0.4, 0.35) | (0.15, 0.42, 0.43) | (0.14, 0.41, 0.45) | −0.68 | 0.68 | 4.8 | 7.9 | 5.0 | 7.5 |
| (0.25, 0.4, 0.35) | (0.15, 0.42, 0.43) | (0.16, 0.41, 0.43) | −0.45 | 0.45 | 1.6 | 6.0 | 4.9 | 6.1 |
| (0.3, 0.3, 0.4) | (0.1, 0.5, 0.4) | (0.2, 0.3, 0.5) | −0.68 | 0.68 | 4.8 | 7.9 | 5.0 | 7.1 |
| (0.05, 0.65, 0.3) | (0.1, 0.5, 0.4) | (0.2, 0.5, 0.3) | 1.99 | 1.99 | 2.1 | 13.2 | 4.9 | 10.9 |
| (0.3, 0.4, 0.3) | (0.4, 0.4, 0.2) | (0.3, 0.38, 0.32) | 0.58 | 0.58 | 4.8 | 7.6 | 5.0 | 6.8 |
| (0.4, 0.4, 0.2) | (0.3, 0.4, 0.3) | (0.3, 0.4, 0.3) | −0.49 | 0.49 | 4.2 | 7.2 | 5.0 | 6.4 |
| (0.4, 0.4, 0.2) | (0.5, 0.4, 0.1) | (0.3, 0.4, 0.3) | 1.43 | 1.43 | 5.0 | 9.8 | 5.0 | 7.6 |
| (0.4, 0.4, 0.2) | (0.1, 0.4, 0.5) | (0.3, 0.4, 0.3) | 0.15 | 0.15 | 4.8 | 5.3 | 5.0 | 5.1 |
| (0.3, 0.4, 0.3) | (0.4, 0.4, 0.2) | (0.4, 0.38, 0.22) | −0.40 | 0.40 | 1.7 | 4.9 | 4.9 | 5.8 |
| (0.8, 0.1, 0.1) | (0.7, 0.2, 0.1) | (0.5, 0.3, 0.2) | −2.15 | 2.15 | 5.0 | 10.7 | 5.0 | 9.2 |
| (0.7, 0.2, 0.1) | (0.4, 0.4, 0.2) | (0.5, 0.3, 0.2) | −0.20 | 0.20 | 4.7 | 4.5 | 5.0 | 5.0 |
| (0.3, 0.4, 0.3) | (0.4, 0.4, 0.2) | (0.5, 0.38, 0.12) | −1.46 | 1.46 | 5.0 | 8.9 | 5.0 | 7.7 |
| (0.3, 0.4, 0.3) | (0.4, 0.4, 0.2) | (0.5, 0.4, 0.1) | −1.58 | 1.58 | 5.0 | 8.7 | 5.0 | 7.6 |
| (0.4, 0.4, 0.2) | (0.5, 0.4, 0.1) | (0.6, 0.2, 0.2) | −0.70 | 0.70 | 4.9 | 5.0 | 5.0 | 6.5 |
| (0.5, 0.3, 0.2) | (0.6, 0.2, 0.2) | (0.7, 0.1, 0.2) | −1.65 | 1.65 | 4.3 | 11.2 | 5.0 | 9.8 |
| (0.5, 0.1, 0.4) | (0.6, 0.2, 0.2) | (0.7, 0.1, 0.2) | −1.04 | 1.04 | 5.0 | 7.8 | 5.0 | 6.9 |
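
When reading the empirical columns, it can help to gauge how much of the gap from the nominal 5% might be simulation noise. The sketch below computes a normal-approximation band for an estimated rejection rate. The number of replications is not given in this section, so the value 10,000 is purely hypothetical, and `mc_interval` is an illustrative helper, not part of the authors' procedure.

```python
import math


def mc_interval(rate, replications, z=1.96):
    """Approximate 95% Monte Carlo band for a rejection rate.

    Uses the binomial standard error sqrt(rate * (1 - rate) / replications).
    """
    se = math.sqrt(rate * (1.0 - rate) / replications)
    return rate - z * se, rate + z * se


# Hypothetical replication count; the actual count is not stated here.
low, high = mc_interval(0.05, 10_000)
print(f"{100 * low:.2f}% to {100 * high:.2f}%")
```

Under that assumed replication count, empirical rates well outside the printed band would point to genuine inflation rather than simulation error.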

### Table 3

Comparison of theoretical and empirical power (%) based on Δ at significance level α = 5%

| $(\pi_{1 \mid R_1}, \ldots, \pi_{3 \mid R_1})$ | $(\pi_{1 \mid R_2}, \ldots, \pi_{3 \mid R_2})$ | $(\pi_{1 \mid T}, \ldots, \pi_{3 \mid T})$ | $\theta_\Delta$ | $\delta_\Delta$ | Theoretical (n1 = 500) | Empirical (n1 = 500) | Theoretical (n1 = 1000) | Empirical (n1 = 1000) |
|---|---|---|---|---|---|---|---|---|
| (0.3, 0.4, 0.3) | (0.2, 0.5, 0.3) | (0.1, 0.6, 0.3) | −1.50 | 3.5 | 81.0 | 62.3 | 97.3 | 76.1 |
| (0.2, 0.55, 0.25) | (0.05, 0.7, 0.25) | (0.1, 0.6, 0.3) | −0.56 | 1.5 | 97.6 | 85.2 | 99.9 | 97.2 |
| (0.25, 0.4, 0.35) | (0.15, 0.42, 0.43) | (0.14, 0.41, 0.45) | −0.67 | 1.7 | 97.6 | 93.6 | 99.9 | 99.6 |
| (0.25, 0.4, 0.35) | (0.15, 0.42, 0.43) | (0.16, 0.41, 0.43) | −0.45 | 1.3 | 95.9 | 81.5 | 99.9 | 95.6 |
| (0.3, 0.3, 0.4) | (0.1, 0.5, 0.4) | (0.2, 0.3, 0.5) | −0.67 | 1.4 | 84.3 | 71.2 | 98.2 | 87.6 |
| (0.05, 0.65, 0.3) | (0.1, 0.5, 0.4) | (0.2, 0.5, 0.3) | 2.12 | 5.0 | 78.0 | 60.3 | 96.2 | 74.2 |
| (0.3, 0.4, 0.3) | (0.4, 0.4, 0.2) | (0.3, 0.38, 0.32) | 0.59 | 1.3 | 85.2 | 73.1 | 98.4 | 91.3 |
| (0.4, 0.4, 0.2) | (0.3, 0.4, 0.3) | (0.3, 0.4, 0.3) | −0.50 | 1.2 | 87.6 | 76.1 | 98.9 | 91.8 |
| (0.4, 0.4, 0.2) | (0.5, 0.4, 0.1) | (0.3, 0.4, 0.3) | 1.50 | 2.5 | 75.1 | 64.3 | 94.9 | 83.2 |
| (0.4, 0.4, 0.2) | (0.1, 0.4, 0.5) | (0.3, 0.4, 0.3) | 0.17 | 0.4 | 95.8 | 92.9 | 99.9 | 99.7 |
| (0.3, 0.4, 0.3) | (0.4, 0.4, 0.2) | (0.4, 0.38, 0.22) | −0.40 | 1.2 | 95.6 | 83.9 | 99.9 | 96.4 |
| (0.8, 0.1, 0.1) | (0.7, 0.2, 0.1) | (0.5, 0.3, 0.2) | −2.78 | 4.8 | 70.0 | 60.4 | 92.2 | 77.6 |
| (0.7, 0.2, 0.1) | (0.4, 0.4, 0.2) | (0.5, 0.3, 0.2) | −0.21 | 0.5 | 88.0 | 88.4 | 99.0 | 98.7 |
| (0.3, 0.4, 0.3) | (0.4, 0.4, 0.2) | (0.5, 0.38, 0.12) | −1.40 | 2.4 | 80.8 | 67.7 | 97.2 | 84.2 |
| (0.3, 0.4, 0.3) | (0.4, 0.4, 0.2) | (0.5, 0.4, 0.1) | −1.50 | 2.5 | 78.2 | 65.6 | 96.2 | 82.1 |
| (0.4, 0.4, 0.2) | (0.5, 0.4, 0.1) | (0.6, 0.2, 0.2) | −0.71 | 1.4 | 78.5 | 70.0 | 96.3 | 87.8 |
| (0.5, 0.3, 0.2) | (0.6, 0.2, 0.2) | (0.7, 0.1, 0.2) | −1.50 | 3.0 | 65.5 | 58.2 | 89.3 | 71.5 |
| (0.5, 0.1, 0.4) | (0.6, 0.2, 0.2) | (0.7, 0.1, 0.2) | −0.93 | 1.7 | 92.8 | 80.7 | 99.7 | 94.6 |

### Table 4

Comparison of theoretical and empirical power (%) based on log α at significance level α = 5%

| $(\pi_{1 \mid R_1}, \ldots, \pi_{3 \mid R_1})$ | $(\pi_{1 \mid R_2}, \ldots, \pi_{3 \mid R_2})$ | $(\pi_{1 \mid T}, \ldots, \pi_{3 \mid T})$ | $\theta_\alpha$ | $\delta_\alpha$ | Theoretical (n1 = 500) | Empirical (n1 = 500) | Theoretical (n1 = 1000) | Empirical (n1 = 1000) |
|---|---|---|---|---|---|---|---|---|
| (0.3, 0.4, 0.3) | (0.2, 0.5, 0.3) | (0.1, 0.6, 0.3) | −1.50 | 3.5 | 72.4 | 59.2 | 93.6 | 72.4 |
| (0.2, 0.55, 0.25) | (0.05, 0.7, 0.25) | (0.1, 0.6, 0.3) | −0.55 | 1.5 | 97.1 | 83.5 | 99.9 | 95.9 |
| (0.25, 0.4, 0.35) | (0.15, 0.42, 0.43) | (0.14, 0.41, 0.45) | −0.68 | 1.7 | 96.0 | 93.3 | 99.9 | 99.4 |
| (0.25, 0.4, 0.35) | (0.15, 0.42, 0.43) | (0.16, 0.41, 0.43) | −0.45 | 1.3 | 94.2 | 81.3 | 99.8 | 95.2 |
| (0.3, 0.3, 0.4) | (0.1, 0.5, 0.4) | (0.2, 0.3, 0.5) | −0.68 | 1.4 | 76.7 | 66.7 | 95.6 | 85.0 |
| (0.05, 0.65, 0.3) | (0.1, 0.5, 0.4) | (0.2, 0.5, 0.3) | 1.99 | 5.0 | 84.6 | 61.6 | 98.3 | 74.8 |
| (0.3, 0.4, 0.3) | (0.4, 0.4, 0.2) | (0.3, 0.38, 0.32) | 0.58 | 1.3 | 87.6 | 75.1 | 98.9 | 91.0 |
| (0.4, 0.4, 0.2) | (0.3, 0.4, 0.3) | (0.3, 0.4, 0.3) | −0.49 | 1.2 | 89.6 | 76.2 | 99.3 | 92.8 |
| (0.4, 0.4, 0.2) | (0.5, 0.4, 0.1) | (0.3, 0.4, 0.3) | 1.43 | 2.5 | 83.5 | 68.8 | 98.0 | 87.3 |
| (0.4, 0.4, 0.2) | (0.1, 0.4, 0.5) | (0.3, 0.4, 0.3) | 0.15 | 0.4 | 98.3 | 96.7 | 99.9 | 99.8 |
| (0.3, 0.4, 0.3) | (0.4, 0.4, 0.2) | (0.4, 0.38, 0.22) | −0.40 | 1.2 | 96.1 | 84.1 | 99.9 | 97.0 |
| (0.8, 0.1, 0.1) | (0.7, 0.2, 0.1) | (0.5, 0.3, 0.2) | −2.15 | 4.8 | 98.5 | 79.2 | 99.9 | 92.1 |
| (0.7, 0.2, 0.1) | (0.4, 0.4, 0.2) | (0.5, 0.3, 0.2) | −0.20 | 0.5 | 96.7 | 94.2 | 99.9 | 99.6 |
| (0.3, 0.4, 0.3) | (0.4, 0.4, 0.2) | (0.5, 0.38, 0.12) | −1.46 | 2.4 | 71.7 | 62.3 | 93.2 | 81.0 |
| (0.3, 0.4, 0.3) | (0.4, 0.4, 0.2) | (0.5, 0.4, 0.1) | −1.58 | 2.5 | 66.5 | 57.9 | 90.0 | 76.3 |
| (0.4, 0.4, 0.2) | (0.5, 0.4, 0.1) | (0.6, 0.2, 0.2) | −0.70 | 1.4 | 81.7 | 70.6 | 97.5 | 88.1 |
| (0.5, 0.3, 0.2) | (0.6, 0.2, 0.2) | (0.7, 0.1, 0.2) | −1.65 | 3.0 | 51.2 | 50.8 | 76.6 | 64.6 |
| (0.5, 0.1, 0.4) | (0.6, 0.2, 0.2) | (0.7, 0.1, 0.2) | −1.04 | 1.7 | 75.5 | 65.5 | 95.1 | 84.8 |

### Table 5

Comparison of po and pb based on Δ and risk difference at significance level α = 5%

| $(\pi_{1 \mid R_1}, \ldots, \pi_{4 \mid R_1})$ | $(\pi_{1 \mid R_2}, \ldots, \pi_{4 \mid R_2})$ | $(\pi_{1 \mid T}, \ldots, \pi_{4 \mid T})$ | $(\pi^*_{1 \mid R_1}, \pi^*_{2 \mid R_1})$ | $(\pi^*_{1 \mid R_2}, \pi^*_{2 \mid R_2})$ | $(\pi^*_{1 \mid T}, \pi^*_{2 \mid T})$ | $\delta_\Delta$ | po | pb |
|---|---|---|---|---|---|---|---|---|
| (0.2, 0.35, 0.35, 0.1) | (0.3, 0.3, 0.3, 0.1) | (0.5, 0.15, 0.2, 0.15) | (0.55, 0.45) | (0.6, 0.4) | (0.65, 0.35) | 2.0 | 0.146 | 0.256 |
| (0.3, 0.2, 0.25, 0.25) | (0.1, 0.4, 0.4, 0.1) | (0.18, 0.22, 0.25, 0.35) | (0.5, 0.5) | (0.5, 0.5) | (0.4, 0.6) | 2.0 | 0.005 | 0.015 |
| (0.2, 0.3, 0.3, 0.2) | (0.2, 0.4, 0.3, 0.1) | (0.6, 0.2, 0.1, 0.1) | (0.4, 0.6) | (0.6, 0.4) | (0.8, 0.2) | 2.0 | 0.000 | 0.022 |
| (0.2, 0.3, 0.3, 0.2) | (0.3, 0.3, 0.3, 0.1) | (0.1, 0.1, 0.1, 0.7) | (0.5, 0.5) | (0.6, 0.4) | (0.2, 0.8) | 5.0 | 0.000 | 0.002 |
| (0.4, 0.4, 0.2, 0.2) | (0.3, 0.5, 0.1, 0.1) | (0.1, 0.4, 0.1, 0.4) | (0.8, 0.2) | (0.8, 0.2) | (0.5, 0.5) | 5.0 | 0.000 | 0.224 |
| (0.6, 0.1, 0.2, 0.1) | (0.5, 0.1, 0.2, 0.2) | (0.2, 0.4, 0.3, 0.1) | (0.7, 0.3) | (0.6, 0.4) | (0.6, 0.4) | 0.5 | 0.225 | 0.896 |
| (0.2, 0.2, 0.25, 0.35) | (0.22, 0.25, 0.2, 0.33) | (0.25, 0.1, 0.05, 0.6) | (0.4, 0.6) | (0.47, 0.53) | (0.35, 0.65) | 0.5 | 0.032 | 0.428 |
| (0.5, 0.2, 0.15, 0.15) | (0.6, 0.25, 0.1, 0.05) | (0.25, 0.3, 0.4, 0.05) | (0.7, 0.3) | (0.85, 0.15) | (0.55, 0.45) | 5.0 | 0.149 | 0.460 |
| (0.05, 0.05, 0.4, 0.5) | (0.1, 0.25, 0.3, 0.35) | (0.4, 0.2, 0.3, 0.1) | (0.1, 0.9) | (0.35, 0.65) | (0.6, 0.4) | 1.0 | 0.071 | 0.726 |
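
The binary probabilities π* in the table appear to come from merging the two lower and the two upper categories of the four-category distribution; the first row, for instance, gives (0.55, 0.45) from (0.2, 0.35, 0.35, 0.1). This reading is inferred from the tabulated values, not from a rule stated in this section. A minimal sketch, with `dichotomize` and its `cut` parameter as hypothetical names:

```python
def dichotomize(probs, cut=2):
    """Collapse ordinal category probabilities into two cells.

    The first `cut` categories are merged into one cell and the remaining
    categories into the other. The cut point of 2 is an assumption.
    """
    return sum(probs[:cut]), sum(probs[cut:])


# Reference 1, reference 2 and test arm from the first row of the table.
for arm in [(0.2, 0.35, 0.35, 0.1), (0.3, 0.3, 0.3, 0.1), (0.5, 0.15, 0.2, 0.15)]:
    low, high = dichotomize(arm)
    print(round(low, 2), round(high, 2))
```

Collapsing discards the ordering within each merged cell, which is the information the ordinal analysis retains and the binary analysis does not.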

### Table 6

Comparison of po and pb based on log α and log odds ratio at α = 5%

| $(\pi_{1 \mid R_1}, \ldots, \pi_{4 \mid R_1})$ | $(\pi_{1 \mid R_2}, \ldots, \pi_{4 \mid R_2})$ | $(\pi_{1 \mid T}, \ldots, \pi_{4 \mid T})$ | $(\pi^*_{1 \mid R_1}, \pi^*_{2 \mid R_1})$ | $(\pi^*_{1 \mid R_2}, \pi^*_{2 \mid R_2})$ | $(\pi^*_{1 \mid T}, \pi^*_{2 \mid T})$ | $\delta_\alpha$ | po | pb |
|---|---|---|---|---|---|---|---|---|
| (0.2, 0.35, 0.35, 0.1) | (0.3, 0.3, 0.3, 0.1) | (0.5, 0.15, 0.2, 0.15) | (0.55, 0.45) | (0.6, 0.4) | (0.65, 0.35) | 2.0 | 0.159 | 0.259 |
| (0.3, 0.2, 0.25, 0.25) | (0.1, 0.4, 0.4, 0.1) | (0.18, 0.22, 0.25, 0.35) | (0.5, 0.5) | (0.5, 0.5) | (0.4, 0.6) | 2.0 | 0.004 | 0.014 |
| (0.2, 0.3, 0.3, 0.2) | (0.2, 0.4, 0.3, 0.1) | (0.6, 0.2, 0.1, 0.1) | (0.4, 0.6) | (0.6, 0.4) | (0.8, 0.2) | 2.0 | 0.000 | 0.005 |
| (0.2, 0.3, 0.3, 0.2) | (0.3, 0.3, 0.3, 0.1) | (0.1, 0.1, 0.1, 0.7) | (0.5, 0.5) | (0.6, 0.4) | (0.2, 0.8) | 5.0 | 0.000 | 0.000 |
| (0.4, 0.4, 0.2, 0.2) | (0.3, 0.5, 0.1, 0.1) | (0.1, 0.4, 0.1, 0.4) | (0.8, 0.2) | (0.8, 0.2) | (0.5, 0.5) | 5.0 | 0.000 | 0.459 |
| (0.6, 0.1, 0.2, 0.1) | (0.5, 0.1, 0.2, 0.2) | (0.2, 0.4, 0.3, 0.1) | (0.7, 0.3) | (0.6, 0.4) | (0.6, 0.4) | 0.5 | 0.442 | 0.911 |
| (0.2, 0.2, 0.25, 0.35) | (0.22, 0.25, 0.2, 0.33) | (0.25, 0.1, 0.05, 0.6) | (0.4, 0.6) | (0.47, 0.53) | (0.35, 0.65) | 0.5 | 0.015 | 0.392 |
| (0.5, 0.2, 0.15, 0.15) | (0.6, 0.25, 0.1, 0.05) | (0.25, 0.3, 0.4, 0.05) | (0.7, 0.3) | (0.85, 0.15) | (0.55, 0.45) | 5.0 | 0.244 | 0.814 |
| (0.05, 0.05, 0.4, 0.5) | (0.1, 0.25, 0.3, 0.35) | (0.4, 0.2, 0.3, 0.1) | (0.1, 0.9) | (0.35, 0.65) | (0.6, 0.4) | 1.0 | 0.088 | 0.996 |
