Longitudinal data are collected over time from the same subjects, so outcomes from the same subject are correlated. Many models have been proposed to analyze such data, including linear mixed models and generalized linear mixed models (GLMMs). In particular, GLMMs are commonly used to analyze longitudinal categorical data (Breslow and Clayton, 1993); they specify the effects of covariates on the response conditional on random effects.
Baseline-category logit random effects models are typically used to analyze longitudinal nominal data (Theil, 1969, 1970); these models account for subject-specific variation through the random effects covariance matrix. However, the random effects covariance matrix in these models cannot explain the serial correlation of the nominal outcomes. To account for both the serial correlation and the subject-specific variation, the random effects covariance matrix must be heterogeneous and high-dimensional. However, such a matrix is difficult to estimate because of its high dimensionality and the positive-definiteness constraint (Lee
The MCD approach provides an unconstrained parametrization of the inverse of a covariance matrix. The resulting parameters are generalized autoregressive parameters (GARPs) and innovation variances (IVs). The GARPs are dependence parameters describing the serial dependence on previous outcomes, and the IVs are prediction variances. Under this parametrization, the only positive-definiteness restriction on the covariance matrix is that the IVs must be positive (Pourahmadi, 1999, 2000). Pan and Mackenzie (2006) used the MCD for joint mean-covariance estimation in linear models. Lee
There is an extensive literature on models for longitudinal nominal data. Multinomial logit models were developed by Theil (1969, 1970). Daniels and Gatsonis (1997) proposed a Bayesian two-level generalized logit model for clustered nominal data. Revelt and Train (1998) proposed discrete choice models with random coefficients that do not have the restrictive ‘independence from irrelevant alternatives’ property. Hartzel
This paper is organized as follows. In Section 2, we propose baseline-category logit random effects models for longitudinal nominal data using the MCD approach. In Section 3, we present a Bayesian methodology for parameter estimation. In Section 4, we describe a real data set and apply our proposed models to it. Finally, we summarize the paper in Section 5.
In this section, we propose baseline-category logit random effects models with an autoregressive random effects covariance matrix to analyze longitudinal nominal data.
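As a rough illustration of how a baseline-category logit model turns linear predictors into conditional category probabilities, the sketch below assumes category 0 is the baseline (street/shelter housing in the MHRP application) and that each of the J−1 linear predictors already contains the covariate effects plus the subject's random effect for that logit. The helper name `baseline_category_probs` is hypothetical; this is a sketch under those assumptions, not the authors' code.

```python
import numpy as np

def baseline_category_probs(eta):
    """Conditional probabilities P(Y = j | b) for j = 0, ..., J-1.

    eta: array of J-1 linear predictors, eta[j-1] = log(P(Y=j) / P(Y=0)),
    with category 0 as the baseline (assumed convention).
    """
    eta = np.asarray(eta, dtype=float)
    # The baseline category has its logit fixed at zero.
    logits = np.concatenate(([0.0], eta))
    # Subtract the maximum before exponentiating for numerical stability.
    w = np.exp(logits - logits.max())
    return w / w.sum()

# Example: predictors for (community vs street/shelter, independent vs street/shelter).
p = baseline_category_probs([0.5, -1.0])
print(p)
print(np.log(p[1:] / p[0]))  # recovers the two linear predictors
```

Fixing the baseline logit at zero is what makes the J−1 logits identifiable; any category could serve as the baseline, which changes only the interpretation of the coefficients.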
Let
Then conditional probabilities for
The random effects covariance matrix Σ
In this section, we describe the random effects covariance matrix Σ
Note that the elements of the matrix Ψ
We also assume that
Then we re-express equations (2.3) and (2.4) in matrix form as
From equation (2.5), we have
We note that the random effects covariance matrix is directly decomposed into the GARMs and IVMs. They can be modeled using time and/or subject-specific covariate vectors
Note that
We now describe Bayesian approaches to estimating the parameters of our proposed models. We derive the likelihood function for the model specified in Subsection 2.1. The parameters in model (2.7) are regression coefficients, which range over (−∞, ∞). For such parameters, normal priors are commonly used; because they are proper, they guarantee the propriety of the posterior distribution. Normal priors with large variances remain relatively objective (Daniels and Zhao, 2003). The prior distributions for the parameters in the model with the AR structure of the random effects covariance matrix are given by
In general,
We use MCMC methods to draw samples from the posterior distribution for model estimation. The full conditional posterior distributions are given below:
For
For
For
For
None of the full conditionals has a closed form; therefore, we construct suitable proposals for a Metropolis-Hastings step. In practice, MCMC is implemented using JAGS (http://mcmc-jags.sourceforge.net/).
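Because no full conditional is available in closed form, each parameter block can be updated with a Metropolis-Hastings step. The sketch below shows a generic random-walk Metropolis update, assuming a user-supplied function returning the unnormalized log full conditional; the function names, the Gaussian proposal and the step size are illustrative assumptions, and this is not a description of the samplers JAGS selects internally.

```python
import numpy as np

def rw_metropolis_step(theta, log_target, step, rng):
    """One random-walk Metropolis update of a parameter vector theta.

    log_target: unnormalized log full conditional density (hypothetical).
    step: standard deviation of the Gaussian random-walk proposal.
    Returns the new state and whether the proposal was accepted.
    """
    proposal = theta + step * rng.standard_normal(theta.shape)
    # The proposal is symmetric, so the MH ratio reduces to the target ratio.
    log_ratio = log_target(proposal) - log_target(theta)
    if np.log(rng.uniform()) < log_ratio:
        return proposal, True
    return theta, False

# Toy check: a N(1, 0.5^2) "full conditional" for a scalar coefficient.
def log_target(beta):
    return -0.5 * np.sum((beta - 1.0) ** 2) / 0.25

rng = np.random.default_rng(0)
theta = np.zeros(1)
draws, accepted = [], 0
for it in range(20000):
    theta, acc = rw_metropolis_step(theta, log_target, 0.8, rng)
    accepted += acc
    if it >= 5000:  # discard burn-in draws
        draws.append(theta[0])
draws = np.array(draws)
print(draws.mean(), draws.std(), accepted / 20000)
```

The step size trades off acceptance rate against move length; in practice it is tuned so that a moderate fraction of proposals is accepted.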
The McKinney homeless research project (MHRP), first described by Hurlburt
Figure 1 presents the observed marginal proportions of the responses for the MHRP data. For each housing category, the plots show the marginal proportion at each time point separately for the Section 8 and control groups, indicating the differences between the groups.
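Observed marginal proportions of this kind are cell frequencies by group and time. A minimal sketch, assuming a hypothetical long-format layout of equal-length arrays for group, time and response (coded 0 = street/shelter, 1 = community, 2 = independent), is:

```python
import numpy as np

def marginal_proportions(group, time, y, n_categories=3):
    """Observed proportion of each response category by (group, time).

    group, time, y: equal-length 1-D arrays (hypothetical layout);
    y is coded 0..n_categories-1. Returns {(group, time): proportions}.
    """
    group, time, y = map(np.asarray, (group, time, y))
    out = {}
    for g in np.unique(group):
        for t in np.unique(time):
            mask = (group == g) & (time == t)
            if mask.any():
                counts = np.bincount(y[mask], minlength=n_categories)
                out[(g, t)] = counts / mask.sum()
    return out

# Toy data (not MHRP): two groups observed at baseline (0) and 6 months (6).
group = ["S8", "S8", "S8", "C", "C", "C", "S8", "S8", "S8", "C", "C", "C"]
time = [0, 0, 0, 0, 0, 0, 6, 6, 6, 6, 6, 6]
y = [0, 0, 1, 0, 2, 0, 2, 1, 2, 0, 1, 0]
props = marginal_proportions(group, time, y)
print(props[("S8", 6)])
```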
For a more detailed description of the categorical outcomes, the housing categories are summarized by living arrangement in Table 1, following the study by Hurlburt
We fit five models: a standard GLMM and four baseline-category logit random effects models with different structures of Σ
Table 2 presents the specification of the five models for Σ
To estimate all parameters in the models, a Gibbs sampler was implemented using JAGS in R. Posterior means were calculated from a sample size of 250,000, with a thinning interval of 5 and a burn-in period of 100,000. To apply the Gelman and Rubin approach, we ran two chains. We also checked the convergence of all parameters using trace plots of the sampled values. The plots showed that the chains were well mixed and overlapping, indicating convergence.
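The Gelman and Rubin approach compares between-chain and within-chain variability. A minimal sketch of the basic potential scale reduction factor for one scalar parameter is shown below, assuming `chains` is an (m, n) array holding m post-burn-in chains of length n; in R, the `gelman.diag` function of the coda package provides an established implementation.

```python
import numpy as np

def gelman_rubin_rhat(chains):
    """Basic potential scale reduction factor (R-hat) for one parameter.

    chains: array of shape (m, n), m chains of n post-burn-in draws each.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    # Between-chain variance B and average within-chain variance W.
    B = n * chain_means.var(ddof=1)
    W = chains.var(axis=1, ddof=1).mean()
    # Pooled estimate of the marginal posterior variance.
    var_hat = (n - 1) / n * W + B / n
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(1)
mixed = rng.normal(0.0, 1.0, size=(2, 5000))   # two well-mixed chains
stuck = mixed + np.array([[0.0], [3.0]])       # second chain shifted away
print(gelman_rubin_rhat(mixed), gelman_rubin_rhat(stuck))
```

Values close to 1 are consistent with convergence; values well above 1 indicate that the chains have not yet mixed.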
Table 3 compares the five models using the deviance information criterion (DIC) (Spiegelhalter
Table 4 is organized into three parts according to the nominal response categories being compared (community vs street/shelter or independent vs street/shelter) and the GARPs. The top part of Table 4 presents the coefficient estimates and associated 95% credible intervals for comparing the two nominal response categories of community housing and street/shelter housing. For these two categories, the estimated baseline-category logit is given by
Second, to compare independent housing and street/shelter housing, we consider the lower part of Table 4. The baseline-category logit is
The 95% credible intervals of the regression coefficients for 6 Month versus Baseline, 12 Month versus Baseline and 24 Month versus Baseline did not include zero. The odds ratios of the control group were
Some may be interested in comparing community housing and independent housing. The baseline-category parameters of the
Thus,
The posterior means for diagonal element matrices of
In the GARMs, the posterior mean for Ψ
Figure 2 compares the fitted marginal probabilities for the Section 8 group and the control group. For street/shelter housing, the estimated marginal probabilities of both groups decreased as the months increased. For community housing and independent housing, there were substantial differences between the Section 8 and control groups.
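Fitted marginal probabilities average the conditional baseline-category probabilities over the random effects distribution. A minimal Monte Carlo sketch is given below, assuming hypothetical fixed-effect linear predictors for one group at one time point and a normal random-effects covariance block for that time point; the function name and inputs are illustrative, not the authors' code.

```python
import numpy as np

def marginal_probs_mc(eta_fixed, Sigma_t, n_draws=100_000, seed=0):
    """Monte Carlo estimate of marginal category probabilities.

    eta_fixed: J-1 fixed-effect linear predictors (baseline category 0).
    Sigma_t: (J-1) x (J-1) covariance of the random effects at this time.
    """
    rng = np.random.default_rng(seed)
    eta_fixed = np.asarray(eta_fixed, dtype=float)
    b = rng.multivariate_normal(np.zeros(len(eta_fixed)), Sigma_t, size=n_draws)
    eta = eta_fixed + b                                  # conditional logits
    logits = np.column_stack([np.zeros(n_draws), eta])   # baseline logit = 0
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    w = np.exp(logits)
    cond = w / w.sum(axis=1, keepdims=True)              # P(Y = j | b)
    return cond.mean(axis=0)                             # average over b

p = marginal_probs_mc([0.5, -1.0], np.array([[1.0, 0.3], [0.3, 2.0]]))
print(p)
```

With a nonzero random-effects covariance, the marginal probabilities generally differ from the conditional probabilities evaluated at b = 0, which is why fitted marginal curves must integrate over the random effects.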
We proposed Bayesian baseline-category logit random effects models for longitudinal nominal data. In the models, the modified Cholesky decomposition (MCD) was used to decompose the random effects covariance matrix into generalized autoregressive matrices (GARMs) and innovation variance matrices (IVMs). The GARMs account for the serial correlation of the nominal outcomes, and the IVMs describe the prediction error variances. The MCD is computationally attractive and provides a better fit than the competing random intercept model with a homogeneous covariance. The proposed models were fitted using a Bayesian approach, and McKinney homeless research project (MHRP) data were analyzed using them. We fitted and compared five baseline-category logit models; among them, the model with a heteroscedastic AR(1) random effects covariance matrix provided the best fit to our data (lowest DIC). The estimated probabilities of the three housing categories showed different trends as the months increased.
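To illustrate why decomposing the random effects covariance matrix into GARMs and IVMs removes the positive-definiteness constraint, the sketch below builds a block covariance matrix from them. It assumes an AR(1)-type structure in which each subject's (J−1)-dimensional random effect at time t depends only on time t−1 through a single hypothetical GARM `Phi`, and that each IVM is diagonal with unconstrained log innovation variances that may vary over time (a heteroscedastic choice). These are illustrative assumptions, not the exact specifications of Table 2.

```python
import numpy as np

def covariance_from_mcd(Phi, log_iv):
    """Build Sigma = T^{-1} D T^{-T} for an AR(1)-type block MCD.

    Phi: (q, q) generalized autoregressive matrix linking b_t to b_{t-1}
         (assumed identical at every lag-1 position).
    log_iv: (n_times, q) unconstrained log innovation variances; each
         innovation variance matrix is D_t = diag(exp(log_iv[t])) (assumption).
    """
    Phi = np.asarray(Phi, dtype=float)
    log_iv = np.asarray(log_iv, dtype=float)
    n_times, q = log_iv.shape
    T = np.eye(n_times * q)
    for t in range(1, n_times):
        # Block row t encodes b_t - Phi b_{t-1} = e_t.
        T[t * q:(t + 1) * q, (t - 1) * q:t * q] = -Phi
    D = np.diag(np.exp(log_iv).ravel())
    T_inv = np.linalg.inv(T)  # unit block lower triangular, always invertible
    return T_inv @ D @ T_inv.T

Phi = np.array([[0.6, 0.1], [-0.2, 0.4]])  # unconstrained entries
log_iv = np.array([[0.0, 0.5], [-0.3, 0.2], [0.1, -0.4], [0.4, 0.0]])
Sigma = covariance_from_mcd(Phi, log_iv)
print(np.linalg.eigvalsh(Sigma).min() > 0)  # positive definite
```

Any real entries in `Phi` and any real log innovation variances yield a positive-definite matrix, which is what makes unconstrained priors on these parameters convenient.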
Observed marginal proportions for the two groups (Section 8 and control) for response 1 (street/shelter), 2 (community) and 3 (independent housing), respectively.
Fitted marginal proportions for the two groups (Section 8 and control) for response 1 (street/shelter), 2 (community) and 3 (independent housing), respectively.
Nominal outcomes by living arrangements
Outcomes | Living arrangement |
---|---|
Street/shelter housing | Public or private shelter |
Church/chapel | |
Indoor public place (bus station/theater) | |
Abandoned building | |
Car or other vehicle | |
Outside without shelter | |
Community housing | Hotel |
Family member’s home or room | |
Friend’s or acquaintance’s home or room | |
Boarding house/halfway house | |
Independent housing | Private house or own apartment |
Models fit with
Model | GARMs | log(ICMs) |
---|---|---|
GLMM | NA | |
AR(1)-CC | | |
AR(1)-CA | | |
AR(1)-AC | | |
AR(1)-AA | | |
DIC of Bayesian baseline-category logit random effects models for MHRP data
Model | Posterior mean deviance | Deviance at posterior mean | pD | DIC |
---|---|---|---|---|
GLMM | 2069.00 | 1666.34 | 402.66 | 2471.66 |
AR(1)-CC | 1612.81 | 1227.92 | 384.89 | 1997.70 |
AR(1)-CA | 1629.63 | 1261.44 | 368.19 | 1997.82 |
AR(1)-AC | 1590.25 | 1208.74 | 381.51 | 1971.76 |
AR(1)-AA | 1526.97 | 1090.70 | 436.27 | 1963.24 |
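The columns of Table 3 satisfy the usual DIC identities: pD equals the posterior mean deviance minus the deviance at the posterior mean, and DIC equals the posterior mean deviance plus pD. A minimal sketch of these calculations follows, assuming posterior deviance draws (or their summaries) are already available; the helper names are hypothetical, and the example only checks the arithmetic against rows of Table 3.

```python
import numpy as np

def dic_from_draws(deviance_draws, deviance_at_posterior_mean):
    """Return (mean deviance, deviance at posterior mean, pD, DIC)."""
    d_bar = float(np.mean(deviance_draws))
    p_d = d_bar - deviance_at_posterior_mean
    return d_bar, deviance_at_posterior_mean, p_d, d_bar + p_d

def dic_from_summaries(d_bar, d_hat):
    """pD and DIC from the posterior mean deviance and the plug-in deviance."""
    p_d = d_bar - d_hat
    return p_d, d_bar + p_d

# GLMM row of Table 3: mean deviance 2069.00, deviance at posterior mean 1666.34.
p_d, dic = dic_from_summaries(2069.00, 1666.34)
print(round(p_d, 2), round(dic, 2))  # 402.66 2471.66
```

Because pD penalizes effective model complexity, the AR(1)-AA model has the largest pD in Table 3 yet still attains the smallest DIC.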
Posterior means of Bayesian baseline-category logit random effects models using MCD for MHRP data (95% credible intervals)
Parameter | GLMM | AR(1)-CC | AR(1)-CA | AR(1)-AC | AR(1)-AA |
---|---|---|---|---|---|
Intercept | −0.09 (−0.50, 0.33) | −0.24 (−0.58, 0.09) | −0.24 (−0.60, 0.12) | −0.21 (−0.55, 0.12) | −0.27 (−1.34, 0.90) |
6 Month vs Baseline | 1.71^{*} (0.74, 3.11) | 1.62^{*} (1.02, 2.30) | 1.64^{*} (1.04, 2.34) | 1.61^{*} (1.03, 2.26) | 3.01^{*} (1.40, 5.65) |
12 Month vs Baseline | 1.99^{*} (0.68, 3.22) | 2.60^{*} (1.71, 3.62) | 2.62^{*} (1.71, 3.63) | 2.73^{*} (1.81, 3.79) | 3.48^{*} (1.37, 5.67) |
24 Month vs Baseline | 0.79 (−1.14, 1.96) | 2.22^{*} (0.80, 3.68) | 2.26^{*} (0.89, 3.68) | 2.53^{*} (1.02, 4.19) | 2.48 (−0.30, 4.91) |
Section 8 (YES, NO) | −0.88 (−3.33, 0.29) | 0.07 (−0.47, 0.55) | 0.05 (−0.57, 0.57) | 0.00 (−0.51, 0.48) | −0.10 (−1.33, 0.88) |
Section 8 by 6 Month | −1.34 (−4.23, 0.49) | −0.38 (−1.39, 0.59) | −0.37 (−1.43, 0.64) | −0.19 (−1.31, 0.93) | −1.34 (−4.09, 0.61) |
Section 8 by 12 Month | −3.66^{*} (−7.42, −1.20) | −2.42^{*} (−3.71, −1.21) | −2.39^{*} (−3.80, −1.09) | −2.19^{*} (−3.88, −0.62) | −2.77^{*} (−5.58, −0.31) |
Section 8 by 24 Month | −1.45 (−4.20, 0.36) | −0.90 (−2.91, 0.81) | −0.72 (−2.86, 1.19) | −0.95 (−3.57, 1.44) | −0.54 (−3.66, 2.63) |
2.05 (−3.25, 4.62) | −1.64 (−5.69, 1.11) | −1.22 (−6.10, 1.46) | −1.46 (−5.54, 0.44) | 3.40^{*} (0.61, 6.14) | |
−0.07 (−1.81, 1.40) | −3.06^{*} (−6.27, −0.27) | ||||
Intercept | −0.52^{*} (−0.83,−0.21) | −1.47^{*} (−2.02,−0.98) | −1.44^{*} (−1.99, −0.96) | −1.62^{*} (−2.35, −1.06) | −1.67^{*} (−2.46, −1.08) |
6 Month vs Baseline | 0.98^{*} (0.41,1.57) | 1.52^{*} (0.74,2.30) | 1.58^{*} (0.80, 2.34) | 1.56^{*} (0.74, 2.38) | 1.94^{*} (1.09, 2.89) |
12 Month vs Baseline | 1.97^{*} (1.30,2.69) | 2.52^{*} (1.23,3.80) | 2.64^{*} (1.33, 3.87) | 2.76^{*} (1.38, 4.11) | 3.36^{*} (1.95, 5.02) |
24 Month vs Baseline | 1.66^{*} (1.02,2.32) | 2.07^{*} (0.23,3.84) | 2.24^{*} (0.39, 3.95) | 2.63^{*} (0.61, 4.60) | 2.83^{*} (0.72, 5.15) |
Section 8 (YES, NO) | −0.70^{*} (−1.27,−0.16) | −0.02 (−0.70,0.64) | −0.13 (−0.92, 0.60) | 0.08 (−0.63, 0.79) | −0.75 (−2.58, 0.62) |
Section 8 by 6 Month | 2.23^{*} (1.29,3.23) | 2.03^{*} (0.89,3.26) | 2.09^{*} (0.88, 3.51) | 2.25^{*} (0.94, 3.81) | 3.25^{*} (1.15, 5.89) |
Section 8 by 12 Month | 0.86 (−0.11,1.84) | 1.35 (−0.14,2.93) | 1.39 (−0.18, 3.09) | 1.73 (−0.05, 3.84) | 2.70^{*} (0.10, 5.70) |
Section 8 by 24 Month | 1.50^{*} (0.52,2.52) | 3.08^{*} (0.91,5.41) | 3.11^{*} (0.77, 5.62) | 2.95^{*} (0.25, 5.82) | 4.08^{*} (0.88, 7.47) |
−2.92^{*} (−6.67,−0.21) | −0.50 (−1.80,0.69) | −0.74 (−2.40, 0.65) | −0.26 (−1.91, 1.19) | −0.16 (−2.11, 1.45) | |
0.44 (−0.76, 1.46) | 1.33 (−2.96, 3.73) | ||||
1.91^{*} (0.70,4.40) | 1.72^{*} (0.61, 4.91) | 1.78^{*} (0.80, 4.62) | 0.55^{*} (0.32, 0.97) | ||
0.05 (−0.18,0.25) | 0.05 (−0.20, 0.26) | −0.52 (−2.36, 0.73) | 0.02 (−1.28, 1.22) | ||
−0.69 (−5.27,0.90) | −0.34 (−5.40, 0.98) | 0.19 (−0.16, 0.52) | −0.15 (−1.32, 0.57) | ||
2.11^{*} (1.48,2.94) | 2.11^{*} (1.38, 3.09) | −0.18 (−0.63, 0.23) | 0.17 (−0.62, 1.35) | ||
−0.655 (−5.35, 0.88) | 0.01 (−0.09, 0.21) | ||||
3.26 (−2.04, 8.48) | 3.14^{*} (0.18, 8.07) | ||||
1.94^{*} (1.27, 2.93) | 2.15^{*} (1.33, 3.33) | ||||
−0.27 (−1.97, 0.89) | −0.95^{*} (−2.34, 0.37) |
^{*}indicates the 95% credible interval does not include zero.
C vs S is Community vs Street/Shelter, I vs S is Independent vs Street/Shelter
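Because the community-versus-independent log odds equals the community-versus-street/shelter log odds minus the independent-versus-street/shelter log odds, the corresponding coefficient is the difference of the two baseline-category coefficients. Its posterior summaries are best computed from the difference of paired MCMC draws rather than from the two reported intervals. A minimal sketch with synthetic draws follows; the function name is hypothetical and the numbers are illustrative only, not MHRP results.

```python
import numpy as np

def contrast_summary(beta_c_vs_s, beta_i_vs_s, level=0.95):
    """Posterior summaries of the community-vs-independent coefficient.

    The inputs must be draws from the same MCMC iterations (paired), so
    that their joint posterior dependence is preserved in the difference.
    Returns (posterior mean, equal-tailed credible interval,
    posterior mean of the odds ratio).
    """
    diff = np.asarray(beta_c_vs_s) - np.asarray(beta_i_vs_s)
    alpha = 1.0 - level
    lo, hi = np.quantile(diff, [alpha / 2, 1 - alpha / 2])
    return diff.mean(), (lo, hi), np.exp(diff).mean()

# Synthetic draws for illustration only.
rng = np.random.default_rng(2)
b_cs = rng.normal(-2.0, 0.5, size=10000)
b_is = rng.normal(1.0, 0.5, size=10000)
mean, ci, or_mean = contrast_summary(b_cs, b_is)
print(mean, ci, or_mean)
```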