TEXT SIZE

CrossRef (0)
Extension of the Mantel-Haenszel test to bivariate interval censored data

Dong-Hyun Leea, Yang-Jin Kim1,a

aDepartment of Statistics, Sookmyung Women’s University, Korea
Correspondence to: 1 Department of Statistics, SookmyungWomen’s University, Yongsan-Gu, Cheongpa-ro 47-gil 100(Cheongpa-dong 2ga), Seoul 04310, Korea. E-mail: yjin@sookmyung.ac.kr

This work was supported by a Korea Research Grant (NRF-2020R1A2C1A01100755).
Received November 18, 2021; Revised March 20, 2022; Accepted April 24, 2022.
Abstract
This article presents an independence test between pairs of interval censored failure times. The Mantel-Haenszel test is commonly applied to test the independence between two categorical variables accompanied with a strata variable. Hsu and Prentice (1996) applied a Mantel-Haenszel test to the sequence of 2 × 2 tables formed at the grids which are composed of failure times. In this article, due to unknown failure times, the suitable grid points should be determined and the status of failure and at risk are estimated at those grid points. We also consider a weighted test statistic to bring a more powerful test. Simulation studies are performed to evaluate the power of test statistics under finite samples. The method is applied to analyze two real data sets, mastitis data from milk cows and an age-related eye disease study.
Keywords : bivariate interval censored data, Clayton model, Gumbel model, independence test, Mantel-Haenszel test
1. Introduction

Interval censored data commonly arise in observational studies where the exact occurrence time of the event is unobservable but two inspection times including it are available. When (Ti1, Ti2) is a pair of failure times from the ith subject, a bivariate interval censored data is composed of two pairs of inspection times {(Li1, Ri1] × (Li2, Ri2], i = 1, . . . , n}, where the inspection times satisfy Lik < Tik < Rik(k = 1, 2) and are assumed to be independent of the failure times.

For right censored data, several approaches for testing the independence have been proposed. Oakes (1982) developed a concordance test based on the estimated Kendall’s τ and Shih and Louis (1996) constructed a test based on the martingale residual and considered its extension by implementing several weights to reflect a time-varying association. Hsu and Prentice (1996) suggested a generalization of the Mantel-Haenszel test by combining the sequence of 2 × 2 tables formed at the grids on a plane where the grids are composed of a pair of observed failure times.

These above approaches have been extended to test the independence of bivariate interval censored data. Betensky and Finkelstein (1999) applied a multiple imputation technique for missing failure times and then implemented the method Oakes (1982) proposed. In their approach, simulated bivariate failure times are generated from the estimated joint survival distribution and are utilized to calculate Kendall’s τ. Sun et al. (2006) estimated the association measure of a Clayton copula by extending Wang and Ding (2000)’s two-stage procedures. As a parametric approach, Bogaerts and Lesaffre (2008) assumed a bivariate log-normal distribution for (T1, T2) and derived a two stage procedure to calculate a local measure such as a cross ratio function. For testing the association, Ding and Wang (2004) constructed test statistics for bivariate current status data, which is sometimes denoted as interval censoring case I by extending the Hsu and Prentice (1996)’s method. In this article, we extend the Mantel-Haenszel test to bivariate interval censored data by constructing dependent 2 × 2 tables formed at the emulated grid points and by assigning the estimated number of failures and that of at-risk to the cell of each table. This article is organized as follows. In Section 2, we introduce the model and develop the estimation procedure. Simulation and data analysis are given in Section 3 and 4, respectively. A discussion about the suggested method is provided in Section 5.

2. Independence test for bivariate interval censored data

### 2.1. Preliminary

In this paper, our interest is to develop a nonparametric procedure for testing the independence of bivariate interval censored data. Denote F(t1, t2) = Pr(T1t1, T2t2), F1(t1) = Pr(T1t1) and F2(t2) = Pr(T2t2) as the joint distribution and marginal distributions of (T1, T2), respectively. When H0 : T1T2 is true, then F(t1, t2) = F1(t2)F2(t2) is valid. For bivariate right censored data, {(Xi1, δi1), (Xi2, δi2), i = 1, . . . , n} denotes observation times and censoring indicators, respectively, where Xik = min(Tik, Cik) for failure time Tik and the censoring time Cik and δik = I(TikCik), k = 1, 2. The grid points are formatted using the observed failure times Xik satisfying δik = 1 and composed of ν11 < ν12 < · · · < ν1m1 for the first event and ν21 < ν22 < · · · < ν2m2 for the second one, respectively. Thus, the information from m1 × m2’s tables are evaluated. For each grid point (ν1r, ν2l), r = 1, . . . , m1, l = 1, . . . , m2}, T(ν1r, ν2l) denotes a 2 × 2 table with a value (0,1) representing a non-failure and a failure, respectively. Then, the four cells at T(ν1r, ν2l) are denoted as (D00(ν1r, ν2l), D01(ν1r, ν2l), D10(ν1r, ν2l), D11(ν1r, ν2l)), where,

Dab(v1r,v2l)=i=1nYi1(ν1r)Yi2(ν2l)I(dNi1(ν1r)=a,dNi2(ν2l)=b),         a=0,1;b=0,1,

and where Yik(s) = I(Xiks), k = 1, 2 denotes a risk process and Nik(s) = I(Tik < s, δik = 1) denotes a counting process with dNik = Nik(s) – Nik(s−), respectively. Then the margins are defined as D+b(ν1r, ν2l) = D0b(ν1r, ν2l) + D1b(ν1r, ν2l) and Da+(ν1r, ν2l) = Da0(ν1r, ν2l) + Da1(ν1r, ν2l). For example, D1+(ν1r) denotes the total number of the first event at ν1r and D+1(ν2r) denotes the total number of the second event at ν2r, respectively. Let rl = D00(ν1r, ν2l) + D01(ν1r, ν2l) + D10(ν1r, ν2l) + D11(ν1r, ν2l) be the number of pairs at risk at (ν1r, ν2l). Conditional on the margins, D11(ν1r, ν2l) follows a hypergeometric distribution with the following conditional expectation

E11(ν1r,ν2l)=D1+(ν1r,ν2l)D+1(ν1r,ν2l)D¯rl.

If all tables are independent, the summation of the differences between observed and expected numbers would follow an asymptotic normal distribution. Since all the tables are formed at failure times and are related through the risk processes, they are not independent, and thus, the sum does not have a same asymptotic property as that of an ordinary Mantel-Haenszel test statistic. Hsu and Prentice (1996) rewrote the Mantel-Haenszel test statistic in terms of a counting process in order to derive the large sample properties based on the martingale theory,

S˜=r=1m1l=1m2(D11(ν1r,ν2l)-D+1(ν1r)D1+(ν2l)D¯rl)=i=1n0τ10τ2(Ni1(dt1)Ni2(dt2)-1nR¯(t1,t2)N·1(dt1)N·2(dt2))=i=1n0τ10τ2(M^i1(dt1)M^i2(dt2)-1nR¯(t1,t2)M^·1(dt1)M^·2(dt2)),

where τk denotes the maximum follow-up time of a type k event. R¯(t1,t2)=Σi=1nI(Xi1t1,Xi2t2) denotes the total number of at-risk subjects at (t1, t2) and a martingale ik(dt) is defined as ik(dt) = Nik(dt) – Yik Λ̂;k(dt) with the Nelson-Aalen estimate Λ̂k(t) = ∑Xjkt dNjk(t)/Yjk(t) where Yjk(t)=Σl=1nkI(Xlkt).

### 2.2. Suggested method

The first step to apply the Mantel-Haenszel test to the interval censored data is to format suitable grid points due to unknown exact failure times. In our approach, grid points are composed of the pseudo failure times derived from the interval censored data, {(Li1, Ri1, Li2, Ri2), i = 1, . . . , n}. These pseudo failure times {sk1 < sk2 < · · · < skmk, k = 1, 2} are constructed from the equivalence set {(llk,rlk)l=1mk} of the univariate interval censored data (Sun, 2006) and defined as skl=(llk+rlk)/2.

The second step is to define the test statistic at the grid points by extending (2.1),

S=r=1m1l=1m2(D11(s1r,s2l)-E11(s1r,s2l)),

where D11(s1r, s2l) denotes the number of pairs with both failures at (s1r, s2l), and E11(s1r, s2l) = E(D11(s1r, s2l)|H0) represents a conditional expectation of D11 under H0. Here E11(s1r, s2l) is estimated as R(s1r, s2l)λ1(sir)λ2(s2l), where R(s1r, s2l) represents the risk set, λ1(s1r) and λ2(s2l) marginal hazards, respectively. However, D11(s1r, s2l) and E11(s1r, s2l) are unknown under the interval censored data and calculated by,

D^11(s1r,s2l)=i=1nαirlfrlΣu=1m1Σv=1m2αiuvfuv=i=1nf˜irl,E^11(s1=r,s2l)=rrlλ^1(s1l)λ^2(s2l),

where αirl = I(Li1 < s1rRi1, Li2 < s2lRi2) and frl = Pr(T1 = s1r, T2 = s2l) is the joint probability mass function. rrl = (s1r, s2l) denotes the estimated risk set size at (s1r, s2l) and is obtained by,

rrl=i=1nrum1lvm2f˜iuv.

Let gr = Pr(T1 = s1r) and hl = Pr(T2 = s2l) be the marginal probability mass functions. Denote βir = I(Li1 < s1rRi1) and γil = I(Li2 < s2lRi2). Then λ1 and λ2 are estimated by

λ^1(s1r)=D^1(s1r)Y1(s1r),         λ^2(s2r)=D^2(s2l)Y2(s2l),

where for the first event

D^1(s1r)=i=1nβirgrΣu=1m1βiugu=i=1ng˜ir,         Y1(s1r)=i=1nrum1g˜iu

and for the second event,

D^2(s2l)=i=1nγilhlΣv=1m2βivhv=i=1nh˜il,         Y1(s2l)=i=1nlvm2h˜iv.

Finally, the joint probability and marginal probabilities defined at (2.3), (2.4) and (2.5) are estimated by a well-known self-consistency algorithm (Sun, 2006; Maathuis, 2013). The likelihood function

L(f)=i=1nr=1l=1αirlfrl,

where the NPMLE of f is determined by maximizing the L( f ) subject to frl > 0 and ∑rl frl = 1. After obtaining rl, the marginal probability mass functions are defined as ĝr = ∑lrl and ĥl = ∑rrl, respectively.

The explicit estimation for the variance of test statistic (2.2) is technically complicated since it is related with the complexity of the nonparametric maximum likelihood estimators (NPMLEs) of fkl, gk and hl. Therefore, the bootstrap method is adopted to obtain the variance estimation in this article. In detail, we generated a bootstrap sample {(Li1*,Ri1*,Li2*,Ri2*,δi1*,δi2*), i = 1, . . . , n} from the original sample. The procedure is repeated B times and brings S*b=Σk=1m1,b*Σl=1m2,b*(D11*(s1k,s2l)-E11*(s1k,s2l)), b = 1, . . . , B as the bootstrap statistics for (2.2). Then, the variance was calculated by the sample variance of S*b, b = 1, . . . , B. That is,

σ2*=b=1B(S*b-S¯*)2B-1,         S¯*=b=1BS*bB.

The resulting test statistic Q*=S/σ2* is assumed to follow a standard normal distribution under the null hypothesis. Here B = 200 is implemented.

In order to improve the power of a test, the above test statistics can be generalized by implementing a weight function Ŵ (t1, t2),

SW=k=1m1l=1m2W^(s1k,s2l)(D11(s1r,s2l)-E11(s1r,s2l)),

where the weight function can be chosen to emphasize the difference at the specific regions during follow-up times. For example, Ŵ (t1, t2) = {Ŝ1(t)Ŝ2(t2)}ρ{(1–Ŝ1(t1))(1–Ŝ2(t2)}λ, where the Gρ,λ class was introduced by Harrington and Fleming (1982).

3. Simulation study

We performed a series of simulations to evaluate the finite sample performance of the proposed tests. Bivariate failure times (T1, T2) are generated from the Clayton family and the Gumbel family using R package CopulaCenR (Sun and Ding, 2019). (i) A Clayton family is known as a gamma frailty model, where the cross ratio θ(t1, t2) = λ1(t1|T2 = t2)/λ1(t1|T2 > t2) = λ2(t2|T1 = t1)/λ2(t2|T1 > t1) is constant over the time and (ii) a Gumbel family has a decreasing cross ratio with increasing t1 or t2.

We generated 10’s random numbers h1 < · · · < h10 from the uniform distribution U(0, q) to construct the inspection times satisfying Lk = hl–1 < Tk < hl = Rk. Here, q is determined to set a given censoring rate. Two cases were considered, no right censoring (0%) and 40% right censoring on each margin. We calculated test statistics and corresponding variance in order to evaluate the performance by the empirical power, which is defined as the relative frequency by which the test rejects the null hypothesis at the 0.05 nominal level based on 500 replicates. The power pattern was investigated under the combination of nine τ values ranging from 0 to 0.4 and three sample sizes (n = 50, 100, 200). The results of two test statistics (unweighted and weighted test (W(t1, t2) = S1(t1)S2(t2))) are presented in Table 1 (Clayton family) and in Table 2 (Gumbel family). Figure 1 shows the Q-Q plot of test statistics under (i) unweighted (ii) weighted versions with W to check the normality under n = 100 and 40% censoring and those plots show they satisfy the normal distribution.

According to the result of Table 1 (Clayton family with constant cross hazard), the type I errors of the unweighted test are very close to the 0.05 nominal level. Type I errors of the weighted test are smaller than the nominal size at small sample sizes but have similar values with the nominal value as the sample size increases.

The powers of both test statistics increase as the association and sample size become greater. Under the 40% censoring rate, two test statistics show smaller powers compared with the no censoring case. However, the weighted test statistic shows a higher power than the unweighted one as the association gets stronger and the sample size increases.

Table 2 shows the result related with the Gumbel family and indicates an overestimated type I error with a small sample size and no censoring. In particular, the weighted test statistic gives a remarkable power improvement even when the censoring rate is large. Although the sample size increases, the unweighted test statistic does not show a significant improvement of power under heavy censoring.

4. Data analysis

### 4.1. Mastitis data

Mastitis in dairy cattle is an inflammation of the udder and reduces the milk production and the quality of the milk (Bogaert et al., 2018). In the analysis, 100 cows were screened monthly at the udder quarter level for bacteria infection. Monthly visits were independently performed and results in interval-censored data. When there no infection occurs before the end of study, it is regarded as right-censored. Denote T1, T2, T3 and T4 as the infection times of the rear and front udders at side 1 and those of the rear and front udders at side 2, respectively. Right censoring rates are 19, 20, 22 and 22%, respectively. Table 3 shows the result of three test statistics; unweighted S, S w1 with W(s, t) = S1(s)S2(t) and S w2 with W(s, t) = (1 – S1(s))(1 – S2(t)) at (2.6). According to the results, the unweighted test statistic S and the weighted test statistic S w2 with weight emphasizing the latter association yield insignificant result while the weighted test statistic S w1 with weight emphasizing the early dependency yields a significant result. Figure 2 shows the rectangle plot of (T1, T2) from use of MLEcens package (2013) and indicates most rectangles posit at the left lower part, which means the S w1 focusing on early association has a more significant p-value.

### 4.2. AREDS data

For the second real data, an age-related eye disease study (AREDS) was analyzed (Sun and Ding, 2019). Our interest is to investigate the association of the time to late age-related molecular disease (AMD) between two eyes. The dataset includes 629 Caucasian patients who had at least one eye in moderate AMD stage at baseline. Each patient was examined every six months in the first six year and then every year after year 6. For the j (= 1, 2)th eye of the subject i, Li j denotes the last assessment time when the eye was free from AMD and Ri j denotes the first assessment time when the eye was diagnosed with AMD. The censoring rate is 46.7% and 44.5% of the left and right eyes, respectively. For investigating the association between the two eyes, the unweighted test statistics yielded S = 60.83 with a corresponding standard error se(S ) = 7.08 and p-value < 0.001, which means the two eyes were correlated in term of the progression of the disease. When two weighted test statistics are applied, (i) S w1 = 45.668, sew1 = 3.780 and p-value < 0.001, and the (ii) sew2 = 2.857, sew2 = 0.706 and p-value < 0.001, respectively. Figure 3 shows the rectangles of the bivariate interval censored late AMD time for between two eyes. Compared with the mastitis data set in Figure 2, the rectangle is uniformly displayed.

5. Concluding remarks

We have suggested a generalized Mantel-Haenszel test to test the independence between two interval censored failure times. The proposed test procedures are based on the pseudo occurrence times derived from the equivalence set and the corresponding estimated occurrence numbers and the risk set sizes under H0. The variance of the suggested test statistic was constructed using a bootstrap technique. The weighted test statistics give a somewhat different result according to the censoring rate, sample size and association pattern. Simulation results show that the test with more weights at the early time region seems to have desirable powers even at high censoring rates. Considering the relation between censoring and weight, we analyzed the mastitis data, which shows a very different result between the two different weighted and unweighted version. Therefore, the application of a suitable weight is very essential procedure for testing the independence of bivariate failure times. In our data analysis, in Figure 2, most rectangles are located at the lower right part where S w1 yields significant p-values while the other test statistics are insignificant. Meanwhile, Figure 3 shows uniformly distributed rectangles and these rectangle plots would be a good indicator to select suitable weights. As for related studies, the weighted Mantel-Haenszel test that combines the subject-specific covariate can be considered. Alternatively, the copula based score test reflecting the subject-specific covariate is suggested and is extended to the bivariate case (Sun et al., 2019; Sun and Ding, 2021).

Figures
Fig. 1. Q-Q plot of test statistic (unweighted and weighted) at n = 100.
Fig. 2. Rectangular plot of inflammation times at two udders.
Fig. 3. Rectangular plot of AMD occurrence times of two eyes.
TABLES

### Table 1

Empirical powers of Unweighted (U) and Weighted (W) test under Clayton model

nCensoringWτ : Association

00.050.100.150.200.250.300.350.40
500%U0.0500.0860.2020.2840.4200.5520.6400.7850.830
W0.0400.0620.1300.2700.4320.6320.7970.8680.936

40%U0.0440.0880.1240.1470.2560.3310.5220.6160.774
W0.0380.0860.1200.2380.3280.5040.7020.8240.918

1000%U0.0580.1240.2830.4880.7660.9000.9800.9860.997
W0.0560.1080.2860.5100.7740.9381.0001.0001.000

40%U0.0540.1100.1420.2880.4460.5300.7230.9120.996
W0.0500.1020.2000.3680.6360.8160.9320.9740.998

2000%U0.0470.1980.4000.7480.9161.0001.0001.0001.000
W0.0560.1880.4940.8660.9861.0001.0001.0001.000

40%U0.0600.1100.2500.4700.7140.9100.9740.9951.000
W0.0580.1320.3620.6880.9160.9841.0001.0001.000

### Table 2

Empirical powers of Unweighted (U) and Weighted (W) test under Gumbel model

nCensoringWτ : Association

00.050.100.150.200.250.300.350.40
500%U0.0690.0980.0980.2040.2440.3460.4680.5840.656
W0.0440.0860.0920.1480.2720.3720.5360.6440.726

40%U0.0560.0580.0560.0920.1080.1400.1840.2600.330
W0.0460.0720.1120.2040.3600.5240.6720.8181.000

1000%U0.0680.1000.1400.2660.4260.5620.6800.8500.928
W0.0480.1100.1920.3680.4680.7480.8320.9540.988

40%U0.0520.0560.0740.0920.1480.2500.2950.3780.478
W0.0480.0880.2080.3680.6400.8180.9220.9740.998

2000%U0.0500.1400.2730.4260.7000.8470.9430.9971.000
W0.0700.0900.3500.6260.8560.9541.0001.0001.000

40%U0.0500.0600.0650.1900.2270.3900.4370.5900.750
W0.0660.1380.3820.6760.9180.9820.9981.0001.000

### Table 3

Independence test of infection times between udder quarters

Ssep-valueS w1sew1p-valueS w2sew2p-value
(T1, T2)−7.1984.5650.115−5.0731.260<0.001−3.3741.9460.083
(T1, T3)−2.6954.1100.512−3.4941.071<0.001−2.5621.9550.098
(T1, T4)−7.5104.9790.131−5.5911.000<0.001−2.8671.7090.935
(T2, T3)−1.9624.7760.681−4.2901.011<0.001−3.1592.0610.126
(T2, T4)−3.1954.5640.484−6.0381.479<0.001−2.1871.5840.167
(T3, T4)1.1644.1350.778−4.8131.059<0.001−1.6181.3580.233

References
1. Betensky R and Finkelstein DF (1999). An extension of Kendall’s coefficient of concordance to bivariate interval censored data. Statistics in Medicine, 18, 3101-3109.
2. Bogaerts K, Komárek A, and Lesaffre E (2018). Survival Analysis with Interval-Censored Data: A Practical Approach with Examples in R, SAS, and BUGS, New York, CRC-Press.
3. Bogaerts K and Lesaffre E (2008). Estimating local and global measures of association for bivariate interval censored data with a smooth estimate of density. Statistics in Medicine, 27, 5941-5955.
4. Ding AA and Wang W (2004). Testing independence for bivariate current status data. Journal of the American Statistical Association, 99, 145-155.
5. Harrington DP and Fleming TR (1982). A class of rank test procedures for censored survival data. Biometrika, 69, 553-566.
6. Hsu L and Prentice RL (1996). A generalisation of the Mantel-Haenszel tests to bivariate failure time data. Biometrika, 83, 905-911.
7. Marloes Maathuis (2013). MLEcens, R package.
8. Oakes D (1982). A concordance test for independence in the presence of censoring. Biometrics, 38, 451-455.
9. Shih JH and Louis TA (1996). Tests of independence for bivariate survival data. Biometrics, 52, 1440-1449.
10. Sun L, Wang L, and Sun J (2006). Estimation of the association for bivariate interval censored failure time data. Scandinavian Journal of Statistics, 33, 637-649.
11. Sun J (2006). The Statistical Analysis of Interval-censored Failure Time Data, New-York, Springer.
12. Sun L, Wang L, and Sun J (2006). Estimation of the association for bivariate interval censored failure time data. Scandinavian Journal of Statistics, 33, 637-649.
13. Sun T and Ding Y (2019). Array.
14. Sun T and Ding Y (2021). Copula-based semiparametric regression method for bivariate data under general interval censoring. Biostatistics, 22, 315-330.
15. Sun T, Liu Y, Cook RJ, Chen W, and Ding Y (2019). Copula-based score test for bivariate time-to-event data, with application to a genetic study of AMD progression. Life Time Data Analysis, 25, 546-568.
16. Wang Wand Ding AA (2000). On assessing the association for bivariate current status data. Biometrika, 87, 879-893.