Modeling of the random effects covariance matrix in generalized linear mixed models (GLMMs) is an issue in analysis of longitudinal categorical data because the covariance matrix can be high-dimensional and its estimate must satisfy positive-definiteness. To satisfy these constraints, we consider the autoregressive and moving average Cholesky decomposition (ARMACD) to model the covariance matrix. The ARMACD creates a more flexible decomposition of the covariance matrix that provides generalized autoregressive parameters, generalized moving average parameters, and innovation variances. In this paper, we analyze longitudinal count data with overdispersion using GLMMs. We propose negative binomial loglinear mixed models to analyze longitudinal count data and we also present modeling of the random effects covariance matrix using the ARMACD. Epilepsy data are analyzed using our proposed model.
Subjects are repeatedly observed over time in longitudinal studies; therefore, longitudinal data have between-subject variation as well as serial correlation of repeated outcomes (within-subject variation) (Diggle
Modeling of the random effects covariance matrix in generalized linear mixed models (GLMMs) is a big issue in analysis of longitudinal categorical data (Breslow and Clayton, 1993) because the covariance matrix can be high-dimensional and its estimate must satisfy positive-definiteness (Lee
In the MCD, precision matrix is decomposed to generalized autoregressive parameters (GARPs) and innovation variances (IVs) which are modeled using linear and loglinear models, respectively (Pourahmadi, 1999, 2000). Therefore, the number of unknown elements in the precision matrix is reduced and the IVs are positive. In the MACD, covariance matrix is decomposed into generalized moving average parameters (GMAPs) and IVs. Similar to the MCD, the unknown elements of the covariance matrix is reduced, and the GMAPs and IVs are modeled using linear and loglinear models, respectively. The ARMACD combines the two Cholesky decompositions to model the covariance matrix (Lee
In this paper, we analyze longitudinal count data with overdispersion using GLMMs. To analyze longitudinal count data, Poisson loglinear mixed models (PLMMs) are typically used (Choi and Lee, 2017). In the models, identical mean and variance are assumed (Agresti, 2002). However, the variance is significantly larger than the mean in many cases. The greater variability than predicted by the Poisson models reflects overdispersion (Molenberghs
The outline of the paper is as follows. In Section 2, we propose modelling of the random effects covariance matrix using ARMACD in negative binomial loglinear mixed model and calculate maximum likelihood estimation. In Section 3, we apply our proposed models in epilepsy data and compare Poisson case. Finally, we summarize and proposed future work in Section 4.
We now propose negative binomial loglinear mixed models for longitudinal overdispersed count data using the ARMACD.
Let
where
Using the ARMACD, we assume the ARMA structure of the random effects covariance matrix which was used in analysis of longitudinal binary data (Lee
where
Using matrix form, (
where
Note that ∑
where
Note that IVs are always positive using the loglinear model in (
From result in Pourahmadi (2011), matrix
where
where
When the series is partly autoregressive and partly moving average, the ARMA models are parsimonious compared to the AR or the MA models (Judge
In this section, we derive maximum likelihood estimate of parameters for the proposed model using the likelihood function. Let
where
Log-likelihood function is calculated as
We factor ∑
Maximum Likelihood estimator is derived by solving the following equations:
where
We can solve numerically the estimating
where
When this algorithm converged, the inverse of
Note that the solution of the
where the set (
We next analyzed a data set from epilepsy study using our proposed model. The data set was first reported in a paper by Faught
Figure 1 shows average seizures for the two groups over week. There were no differences of average seizures between two groups until eighteen weeks. In week 19, there was extreme value in the placebo group, which is because very few observations were available.
We let response variable
We analyzed epilepsy study data using 4 Poisson linear mixed models (PLMMs) (Choi and Lee, 2017) and 6 negative binomial loglinear mixed models (NBLMMs) specified in Table 1.
We use the following notation: P-ARMA(?, ?) and NB-ARMA(?, ?) where P and NB respectively indicate Poisson linear mixed and negative binomial loglinear mixed models, and the ’?’ correspond to the polynomial in the GARP and GMAP respectively or the AR and MA order, respectively. So an NB-ARMA(1, 1) would correspond to a heteroscedastic ARMA(1, 1) model with IV depending on type of treatment and NB-ARMA(L, L) would correspond to a heteroscedastic ARMA model with a linear in time difference for the GARP and GMAP. To find MLEs of each parameters, we used a QMC method that is explained in Subsection 2.2. The quasi-Newton algorithm was iterated until sum of absolute differences is less than 10^{−4}.
We compared the ten models using maximized loglikelihoods, Akaike information criteria (AIC) and Bayesian information criteria (BIC), presented in Table 2, respectively. Overall, negative binomial loglinear mixed models had much lower AICs and BICs than Poisson models. Therefore, we focused on the negative binomial loglinear mixed models.
We compared nested models (NB-ARMA(1, 0) vs NB-ARMA(2, 0), NB-ARMA(1, 0) vs NB-ARMA(1, 1), NB-ARMA(1, 1) vs NB-ARMA(2, 1), NB-ARMA(2, 0) vs NB-ARMA(2, 1), and NB-ARMA(2, 1) vs NB-ARMA(2, 2)) using likelihood ratio test (Table 3). It indicates that NB-ARMA(2, 1) was the best model among above models. We also compared the above models and NB-ARMA(L, L) using AIC and BIC which are presented in Table 2. It indicates NB-ARMA(2, 1), NB-ARMA(2, 2), and NB-ARMA(L, L) were competitive. Therefore, we focused on the three models.
Table 4 provides maximum likelihood estimates of the parameters for NB-ARMA(2, 1), NB-ARMA(2, 2), and NB-ARMA(L, L). The parameters of coefficients of covariates in the three models were not significant. Thus, treatment did not appear to differentially impact the mean between placebo and treatment groups. In NB-ARMA(2, 1), overdispersion parameter was significant. It indicates that data were overdispersed compared to mean (
We have proposed negative binomial loglinear mixed models to analyze longitudinal count data. The models account for the within-subject variation and between-subject variance using an ARMA random effects covariance matrix. The ARMA Cholesky decomposition (ARMACD) factors the random effects covariance matrix to GARPs, GMAPs and IVs. The GARPs, GMAPs, and IVs are modeled using linear and loglinear models which solve the positive definiteness constraint of the covariance matrix. In addition, the ARMACD creates a more flexible decomposition of the covariance matrix than existing decompositions. The estimation of parameters in the proposed models were calculated using a quasi-Newton algorithm. Calculation of the integrations in likelihood and derivatives was conducted by quasi Monte Carlo integration.
Epileptic seizure data were analyzed by Poisson/negative binomial loglinear mixed models with several structures of the random effects covariance matrix. The random effects covariance matrices with an ARMA(2, 1), an ARMA(2, 2), and an ARMA linear in time difference were competitive. The result shows that the parameters of coefficients of covariates in the models were not significant.
This project was supported by Basic Science Research Program through the National Research Foundation of Korea (KRF) funded by the Ministry of Education, Science and Technology (NRF-2016R1D1 A1B03930343).
Models for
Model | GARP | GMAP | IV | |
---|---|---|---|---|
Poisson model | P-ARMA(1, 0) | |||
P-ARMA(2, 0) | ||||
P-ARMA(1, 1) | ||||
P-ARMA(2, 1) | ||||
Negative binomial model | NB-ARMA(1, 0) | |||
NB-ARMA(2, 0) | ||||
NB-ARMA(1, 1) | ||||
NB-ARMA(2, 1) | ||||
NB-ARMA(2, 2) | ||||
NB-ARMA(L, L) |
PLMMs = Poisson loglinear mixed models; NBLMMs = negative binomial loglinear mixed models; GARPs = generalized autoregressive parameter; GMAP = generalized moving average parameter; IV = innovation variance; ARMA = autoregressive moving average.
Log likelihood, AIC and BIC for the models
P-ARMA(1, 0) | P-ARMA(2, 0) | P-ARMA(1, 1) | P-ARMA(2, 1) | ||
---|---|---|---|---|---|
Poisson | Max. loglik. | −2945.865 | −2940.188 | −2911.220 | −2892.202 |
AIC | 5905.730 | 5896.376 | 5838.440 | 5802.404 | |
BIC | 5923.150 | 5916.285 | 5858.349 | 5824.802 | |
NB-ARMA(1, 0) | NB-ARMA(2, 0) | NB-ARMA(1, 1) | NB-ARMA(2, 1) | ||
Negative binomial | Max. loglik. | −2768.272 | −2765.395 | −2763.373 | −2759.630 |
AIC | 5552.544 | 5548.790 | 5544.746 | 5539.260 | |
BIC | 5572.453 | 5571.188 | 5567.144 | 5564.146 | |
NB-ARMA(2, 2) | NB-ARMA(L, L) | ||||
Negative binomial | Max. loglik. | −2758.204 | −2758.197 | ||
AIC | 5538.408 | 5538.394 | |||
BIC | 5565.783 | 5565.769 |
AIC = Akaike information criteria; BIC = Bayesian information criteria; ARMA = autoregressive moving average.
Likelihood ratio tests for negative binomial models
Models | df | ||
---|---|---|---|
NB-ARMA(1, 0) vs NB-ARMA(2, 0) | 5.754 | 1 | 0.017 |
NB-ARMA(1, 0) vs NB-ARMA(1, 1) | 9.798 | 1 | 0.002 |
NB-ARMA(1, 1) vs NB-ARMA(2, 1) | 7.486 | 1 | 0.006 |
NB-ARMA(2, 0) vs NB-ARMA(2, 1) | 11.530 | 1 | 0.001 |
NB-ARMA(2, 1) vs NB-ARMA(2, 2) | 2.852 | 1 | 0.091 |
df = degrees of freedom; ARMA = autoregressive moving average.
Maximum likelihood estimates for NB-ARMA(2, 1), NB-ARMA(2, 2), and NB-ARMA(L, L) with standard errors in the parentheses
NB-ARMA(2, 1) | NB-ARMA(2, 2) | NB-ARMA(L, L) | |
---|---|---|---|
Mean parameter | |||
Intercept ( | 0.222 (0.326) | 0.197 (0.421) | 0.196 (0.357) |
Treatment ( | 0.368 (0.351) | 0.379 (0.376) | 0.379 (0.401) |
Week ( | −1.582 (0.839) | −1.656 (1.148) | −1.657 (0.938) |
Week×Treatment ( | 1.166 (0.854) | 1.221 (0.962) | 1.222 (1.017) |
0.278* (0.025) | 0.275* (0.034) | 0.275* (0.024) | |
GARP/GMAP | |||
1.512* (0.413) | 1.172* (0.434) | −0.839* (0.425) | |
−0.506 (0.410) | −0.163 (0.428) | 0.255 (0.538) | |
−0.731 (0.292) | −0.250 (0.535) | 1.270* (0.210) | |
−0.269 (0.235) | 13.012 (9.686) | ||
IV | |||
−1.597* (0.365) | −1.669* (0.553) | −1.669* (0.401) | |
0.499 (0.267) | 0.510 (0.367) | 0.510 (0.727) |
^{*}indicates significance with 95% confidence level.
ARMA = autoregressive moving average; GARPs = generalized autoregressive parameter; GMAP = generalized moving average parameter; IV = innovation variance.