Latent class analysis with multiple latent group variables
Communications for Statistical Applications and Methods 2017;24:173-191
Published online March 31, 2017
© 2017 Korean Statistical Society.

Jung Wun Lee (a) and Hwan Chung (1, a)

(a) Department of Statistics, Korea University, Korea
Correspondence to: (1) Department of Statistics, Korea University, 145, Anam-ro, Seongbuk-gu, Seoul 02841, Korea. E-mail: hwanch@korea.ac.kr
Received February 10, 2017; Revised March 8, 2017; Accepted March 9, 2017.
 Abstract

This study develops a new type of latent class analysis (LCA) to explain the associations between one latent variable and several other categorical latent variables. Our model postulates that the prevalence of the latent variable of interest is affected by another latent variable composed of several other latent variables. For parameter estimation, we propose deterministic annealing EM (DAEM) to deal with the local maxima problem in the proposed model. We perform a simulation study to demonstrate how DAEM can find the set of parameter estimates at the global maximum of the likelihood over repeated samples. We apply the proposed LCA model to investigate the effect of joint patterns of drug-using behavior on violent behavior among US high school male students, using data from the Youth Risk Behavior Surveillance System 2015. Considering the age of male adolescents as a covariate influencing violent behavior, we identified three classes of violent behavior and three classes of drug-using behavior. We also discovered that the prevalence of violent behavior is affected by the type of drug used for drug-using behavior.

Keywords : deterministic annealing EM, drug-using behavior, joint latent class analysis, multiple modalities, violent behavior
1. Introduction

Violent and drug-using behaviors are major social issues among US adolescents that contribute to premature death, disability, and other social problems (Van Horn et al., 2014). Miller et al. (2007) revealed that young injection drug users may be exposed to the risk of premature mortality due to health risks. Previous research on these delinquent behaviors has shown strong associations between drug-using and violent behavior. White et al. (1999) showed that there are concurrent associations between frequency of drug use and violence among adolescents. Lundholm (2013) revealed that acute substance intake and abuse increase the risks of both interpersonal and self-directed violence. Some studies have also found that students who drank alcohol on school property were more likely to carry weapons at school than those who did not drink alcohol (Lowry et al., 1999) and that the severity of adolescent violent behavior was significantly correlated with the frequency of cigarette smoking, alcohol use, and multiple substance use (DuRant et al., 2000). Dawkins (1997) found that the use of alcohol and marijuana is more important in terms of their effects on some violent offenses than other drug use. Many of these studies used several questionnaire items to measure drug-using behavior and violent behavior. They applied standard statistical methods to find associations among the variables of interest: they transformed the original manifest items into scale or categorical variables; calculated standard statistics such as chi-square, Spearman rank correlation, Pearson’s correlation, Kruskal-Wallis, and odds ratios; and conducted multiple linear regression or logistic regression to verify the effect of drug-using behavior on some violent offenses. However, variables like drug-using behavior or violent behavior cannot be measured directly with a single item and should be considered as latent variables.
The latent variable may be identified through multiple manifest items that measure some hypothetical concepts. The latent class analysis (LCA) can be very helpful in this line of research. LCA identifies a small number of underlying subgroups of the population, using several manifest items. These subgroups are called latent classes. Manifest items measuring a latent variable are strongly correlated and LCA can identify individuals whose response patterns for the manifest items are similar and classify them into the same latent class.

In this paper, we propose a new type of LCA with multiple latent groups (LCA-MLG) to investigate the effect of drug-using behavior on violent behavior. In our model, the subgroups of drug-using behavior are identified by the joint latent class analysis (JLCA) using the framework of LCA with multiple groups. We adopt deterministic annealing EM (DAEM) as a parameter estimation strategy to overcome the local maxima problem. We then apply the proposed model to data from the Youth Risk Behavior Surveillance System 2015 (YRBSS 2015) to investigate the effect of the joint patterns of drug-using behavior on violent behavior among US high school male students (Centers for Disease Control and Prevention, 2015).

The remainder of this paper describes the proposed LCA-MLG model and the estimation methods for the model parameters in Sections 2 and 3, respectively. In Section 4, we evaluate the performance of DAEM over repeated sampling. We then apply the proposed model to the real dataset from the YRBSS 2015 in Section 5.

2. Model

2.1. Latent class analysis

The LCA is a finite mixture model for dividing a population into several subgroups based on individuals’ responses to the manifest items. LCA assumes that the population is composed of several unobservable subgroups (i.e., latent classes) which can be measured indirectly by multiple manifest items, implying that associations among the manifest items are fully explained by the latent class variable. Suppose there are $P$ categorical manifest items $Z_1, \ldots, Z_P$. The responses of the $i$th individual are collected in a $P$-dimensional vector $z_i = [z_{i1}, \ldots, z_{iP}]^T$, where $z_{ip}$ can take any value from $1, \ldots, r_p$ for $p = 1, \ldots, P$. Let the latent class variable $W$ have $D$ categories; then the observed-data likelihood of LCA can be specified as

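$$
\begin{aligned}
P(Z = z_i) &= \sum_{w=1}^{D} P(W = w, Z = z_i) \\
&= \sum_{w=1}^{D} P(W = w)\, P(Z = z_i \mid W = w) \\
&= \sum_{w=1}^{D} P(W = w) \prod_{p=1}^{P} P(Z_p = z_{ip} \mid W = w) \\
&= \sum_{w=1}^{D} \delta_w \prod_{p=1}^{P} \prod_{h=1}^{r_p} \phi_{ph|w}^{I(z_{ip} = h)},
\end{aligned}
\tag{2.1}
$$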

where $I(z_{ip} = h)$ is the indicator function which is 1 when $z_{ip} = h$ and 0 otherwise. The likelihood of LCA given in (2.1) is constructed under the local independence assumption, meaning that the manifest items are conditionally independent given latent class membership. Here, $\phi_{ph|w} = P(Z_p = h \mid W = w)$, referred to as the primary measurement parameter, describes the relationship between the latent class and the $p$th manifest item, and $\delta_w = P(W = w)$ represents the prevalence of latent class $w$. Because the parameters in (2.1) are probabilities, they are non-negative and satisfy the sum-to-one constraints (i.e., $\sum_{w=1}^{D} \delta_w = 1$ and $\sum_{h=1}^{r_p} \phi_{ph|w} = 1$ for $p = 1, \ldots, P$, $w = 1, \ldots, D$).
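As an illustration of (2.1), the following is a minimal Python sketch of the observed-data likelihood for a single response vector. The function name `lca_likelihood`, the 0-based coding of responses, and the two-class parameter values are assumptions made for this example only; they are not estimates from the paper.

```python
from math import prod


def lca_likelihood(z, delta, phi):
    """Observed-data likelihood P(Z = z) of one response vector under LCA.

    z     : list of P responses, coded 0..r_p-1 (0-based here, 1-based in the text)
    delta : delta[w] = P(W = w)
    phi   : phi[w][p][h] = P(Z_p = h | W = w)
    """
    # Sum over latent classes of prevalence times the product of item probabilities.
    return sum(
        delta[w] * prod(phi[w][p][z[p]] for p in range(len(z)))
        for w in range(len(delta))
    )


# Hypothetical two-class model with two binary items.
delta = [0.6, 0.4]
phi = [
    [[0.9, 0.1], [0.8, 0.2]],  # class 0
    [[0.2, 0.8], [0.3, 0.7]],  # class 1
]

patterns = [[0, 0], [0, 1], [1, 0], [1, 1]]
likelihoods = [lca_likelihood(z, delta, phi) for z in patterns]
```

Summing `likelihoods` over all four response patterns gives 1 (up to rounding), which is a quick check that the parameters respect the sum-to-one constraints.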

2.2. Joint latent class analysis (JLCA)

The JLCA is an extension of the LCA model that deals with multiple latent class variables. Suppose there are $J$ latent class variables $C = [C_1, \ldots, C_J]^T$, which can be identified by the $J$ sets of manifest items $Y_1, \ldots, Y_J$, respectively. Let the $j$th latent class variable $C_j$ have $K_j$ nominal categories for $j = 1, \ldots, J$. Here, $Y_j = [Y_{1j}, \ldots, Y_{M_j j}]^T$ is the vector of the $j$th set of manifest items measuring the latent class $C_j$. Further, let $y_{ij} = [y_{i1j}, \ldots, y_{iM_j j}]^T$ be the $i$th individual’s responses to the set of $M_j$ manifest items measuring $C_j$, where each item response $y_{i m_j j}$ can take any value from 1 to $r_{m_j}$ for $m_j = 1, \ldots, M_j$ and $j = 1, \ldots, J$. In JLCA, the association among the $J$ latent class variables can be explained by other latent subgroups (i.e., joint latent classes). The population can be divided into several subgroups that have similar joint patterns of the latent class variables $C = [C_1, \ldots, C_J]^T$.

Let the joint latent class variable U have S nominal categories describing the most common joint patterns of the latent classes. Then, the ith individual’s contribution to the likelihood function for JLCA can be specified as

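$$
\begin{aligned}
P(Y = y_i) &= \sum_{u=1}^{S} \sum_{c_1=1}^{K_1} \cdots \sum_{c_J=1}^{K_J} P(U = u, C = c, Y = y_i) \\
&= \sum_{u=1}^{S} \sum_{c_1=1}^{K_1} \cdots \sum_{c_J=1}^{K_J} P(U = u)\, P(C = c \mid U = u)\, P(Y = y_i \mid C = c) \\
&= \sum_{u=1}^{S} \sum_{c_1=1}^{K_1} \cdots \sum_{c_J=1}^{K_J} P(U = u) \prod_{j=1}^{J} \left\{ P(C_j = c_j \mid U = u) \prod_{m_j=1}^{M_j} P(Y_{m_j j} = y_{i m_j j} \mid C_j = c_j) \right\} \\
&= \sum_{u=1}^{S} \sum_{c_1=1}^{K_1} \cdots \sum_{c_J=1}^{K_J} \gamma_u \prod_{j=1}^{J} \left\{ \eta^{(j)}_{c_j|u} \prod_{m_j=1}^{M_j} \prod_{k=1}^{r_{m_j}} \rho_{m_j k j|c_j}^{I(y_{i m_j j} = k)} \right\},
\end{aligned}
\tag{2.2}
$$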

where $I(y_{i m_j j} = k)$ is the indicator function which is 1 when $y_{i m_j j} = k$ and 0 otherwise. The primary measurement parameter, $\rho_{m_j k j|c_j} = P(Y_{m_j j} = k \mid C_j = c_j)$, is the probability of responding $k$ to the $m_j$th item measuring the $j$th latent class variable, given that the $j$th latent class membership is $c_j$. The secondary measurement parameter, $\eta^{(j)}_{c_j|u} = P(C_j = c_j \mid U = u)$, represents the probability of the $j$th latent class variable having membership $c_j$ given that the joint latent class membership is $u$.

The prevalence parameter, $\gamma_u = P(U = u)$, represents the probability of belonging to the $u$th joint latent class. As in LCA, all parameters in (2.2) are probabilities, so they are non-negative and satisfy the sum-to-one constraints (i.e., $\sum_{k=1}^{r_{m_j}} \rho_{m_j k j|c_j} = 1$ for $m_j = 1, \ldots, M_j$, $c_j = 1, \ldots, K_j$, and $j = 1, \ldots, J$; $\sum_{c_j=1}^{K_j} \eta^{(j)}_{c_j|u} = 1$ for $j = 1, \ldots, J$ and $u = 1, \ldots, S$; and $\sum_{u=1}^{S} \gamma_u = 1$). The likelihood function of JLCA given in (2.2) is based on the following three assumptions: (a) the joint class is related to the manifest items only through the latent class variables; (b) the manifest items are conditionally independent given the latent class membership; and (c) the latent class memberships are mutually independent given the joint latent class $u$.

2.3. Latent class analysis with multiple latent group variables (LCA-MLG)

The LCA-MLG postulates that a latent class variable may be affected by a joint latent class variable identified by the JLCA model. We therefore treat the joint latent class as a latent group variable in the traditional LCA. The proposed LCA-MLG is illustrated in Figure 1. The right side of Figure 1 is a JLCA with joint latent class variable $U$ identified by the vector of latent class variables $C = [C_1, \ldots, C_J]^T$, where each latent variable $C_j$ is identified by the $j$th set of manifest items $Y_j = [Y_{1j}, \ldots, Y_{M_j j}]^T$. The left side of Figure 1 is an ordinary LCA with latent variable $W$, which is identified through the manifest items $Z = [Z_1, \ldots, Z_P]^T$. Consequently, the outcome latent class variable $W$ is affected by the joint latent class variable $U$, and the distribution of $W$ varies as the value of the latent group variable $U$ changes. In this manner, the association between the latent group variable and the outcome latent class variable can be investigated. Using the notation given in (2.1) and (2.2), the complete-data likelihood of the LCA-MLG for the $i$th observation is given by

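$$
\begin{aligned}
L_i^* &= P(W = w, Z = z_i, U = u, C = c, Y = y_i) \\
&= P(W = w, Z = z_i \mid U = u)\, P(U = u, C = c, Y = y_i) \\
&= P(W = w \mid U = u)\, P(Z = z_i \mid W = w) \times P(U = u)\, P(C = c \mid U = u)\, P(Y = y_i \mid C = c) \\
&= \delta_{w|u} \prod_{p=1}^{P} \prod_{h=1}^{r_p} \phi_{ph|w}^{I(z_{ip} = h)}\; \gamma_u \prod_{j=1}^{J} \left\{ \eta^{(j)}_{c_j|u} \prod_{m_j=1}^{M_j} \prod_{k=1}^{r_{m_j}} \rho_{m_j k j|c_j}^{I(y_{i m_j j} = k)} \right\}.
\end{aligned}
\tag{2.3}
$$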

The likelihood of the manifest items (i.e., the observed-data likelihood) can be derived by the marginal summation of (2.3) with respect to all considered latent variables:

$$
L_i = P(Z = z_i, Y = y_i) = \sum_{u=1}^{S} \sum_{c_1=1}^{K_1} \cdots \sum_{c_J=1}^{K_J} \sum_{w=1}^{D} L_i^*.
$$
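As a worked step (a remark added for clarity, not part of the original derivation): because each factor of the complete-data likelihood in (2.3) involves at most one of the indices $w, c_1, \ldots, c_J$ besides $u$, the multiple summation can be pushed inside the products. Under the model assumptions stated in Section 2.2, the marginal likelihood can equivalently be written as

$$
L_i = \sum_{u=1}^{S} \gamma_u \left\{ \sum_{w=1}^{D} \delta_{w|u} \prod_{p=1}^{P} \prod_{h=1}^{r_p} \phi_{ph|w}^{I(z_{ip} = h)} \right\} \prod_{j=1}^{J} \left\{ \sum_{c_j=1}^{K_j} \eta^{(j)}_{c_j|u} \prod_{m_j=1}^{M_j} \prod_{k=1}^{r_{m_j}} \rho_{m_j k j|c_j}^{I(y_{i m_j j} = k)} \right\}.
$$

This form needs on the order of $S\,(D + K_1 + \cdots + K_J)$ class-level terms rather than $S \cdot D \cdot K_1 \cdots K_J$, which may matter when $J$ is large.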

The prevalence of the outcome latent class variable may be affected by demographic characteristics or other individual information, and these characteristics can be considered as covariates in the proposed model (Figure 1). Suppose we have a vector of covariates $x_i = [x_{i1}, \ldots, x_{ip}]^T$ for the $i$th observation which may influence the prevalence of the outcome latent class, $P(W = w \mid U = u)$. Then the prevalence can be modeled by multinomial logistic regression, replacing $\delta_{w|u}$ in (2.3) with $\delta_{w|u}(x_i) = \exp(x_i^T \beta_{w|u}) / \sum_{l=1}^{D} \exp(x_i^T \beta_{l|u})$, where the vector of logistic regression coefficients $\beta_{w|u} = [\beta_{1w|u}, \ldots, \beta_{pw|u}]^T$ is interpreted as the log-odds ratio that an individual belongs to latent class $w$ versus a baseline latent class, given the latent group membership $u$.
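The multinomial logistic form of the prevalence can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `class_prevalence`, the convention that the baseline class has all-zero coefficients, and the numeric values of the coefficients and covariate are assumptions, not quantities from the paper.

```python
from math import exp, log


def class_prevalence(x, beta_u):
    """delta_{w|u}(x) for w = 0..D-1 via multinomial logistic regression.

    x      : covariate vector (include a leading 1.0 for an intercept)
    beta_u : beta_u[w] = coefficient vector for class w within group u;
             the baseline class is assumed to have all-zero coefficients
    """
    scores = [sum(xk * bk for xk, bk in zip(x, b)) for b in beta_u]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [exp(s - top) for s in scores]
    total = sum(weights)
    return [wt / total for wt in weights]


# Hypothetical example: intercept plus age, two outcome classes, class 0 as baseline.
x = [1.0, 17.0]
beta_u = [[0.0, 0.0], [0.5, -0.1]]
prev = class_prevalence(x, beta_u)

# The log-odds of class 1 versus the baseline equals the linear predictor x'beta.
log_odds = log(prev[1] / prev[0])
```

Here `log_odds` equals 0.5 − 0.1 × 17 = −1.2 up to rounding, which matches the interpretation of the coefficients as log-odds relative to the baseline class.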

3. Parameter estimation and model diagnosis

3.1. Deterministic annealing EM

The LCA with latent group variables is built on an unobservable latent structure; therefore, parameter estimation may be regarded as a missing-data problem. The expectation-maximization (EM) algorithm is the standard method for estimating the model parameters of LCA and JLCA. However, the EM algorithm is highly sensitive to its initial values. If inappropriate initial values are given, the final EM solution may deviate from the global maximum and settle at one of the local maxima. A common remedy is to run the standard EM algorithm from a large number of sets of starting values and choose the estimates with the highest log-likelihood. This may help to resolve the local maxima problem; however, it is expensive in time and computation, and there is no guarantee that the result with the highest log-likelihood among the candidates is actually the global maximum. To overcome the difficulty with local maxima in the proposed model, we adopt the DAEM method (Ueda and Nakano, 1998).

The DAEM was proposed to overcome the local maxima problem by using the principle of maximum entropy. In the DAEM process, a modified posterior is introduced through an additional factor ω, and an outer loop over ω is placed around the E-step and M-step of the conventional EM method. For a given ω, the E-step and M-step are repeated until the estimates converge; ω is then increased and the EM iterations are run again from the converged estimates. This is repeated until ω reaches 1, and the final estimates from the iteration with ω = 1 may be taken as the solution at the global maximum.

  • E-step: The DAEM maximizes the modified observed-data log-likelihood which can be defined as

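    $$
    F(\omega) = \sum_{i=1}^{n} F_i(\omega) = \sum_{i=1}^{n} \frac{1}{\omega} \log \sum_{u=1}^{S} \sum_{c_1=1}^{K_1} \cdots \sum_{c_J=1}^{K_J} \sum_{w=1}^{D} (L_i^*)^{\omega},
    \tag{3.1}
    $$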

    where Li* is the complete-data likelihood derived in (2.3). Note that F(ω) is identical to the observed-data log-likelihood when ω is equal to 1. To maximize (3.1), we adopt a density function q(U = u,C = c,W = w | yi, zi) to express (3.1) as an expectation form. Then by Jensen’s inequality, the lower bound of Fi(ω) is calculated as

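    $$
    \begin{aligned}
    F_i(\omega) &= \frac{1}{\omega} \log \sum_{u=1}^{S} \sum_{c_1=1}^{K_1} \cdots \sum_{c_J=1}^{K_J} \sum_{w=1}^{D} \frac{(L_i^*)^{\omega}}{q(U = u, C = c, W = w \mid y_i, z_i)}\, q(U = u, C = c, W = w \mid y_i, z_i) \\
    &\ge \frac{1}{\omega} \sum_{u=1}^{S} \sum_{c_1=1}^{K_1} \cdots \sum_{c_J=1}^{K_J} \sum_{w=1}^{D} \log \left\{ \frac{(L_i^*)^{\omega}}{q(U = u, C = c, W = w \mid y_i, z_i)} \right\} q(U = u, C = c, W = w \mid y_i, z_i) \\
    &= \frac{1}{\omega} \sum_{u=1}^{S} \sum_{c_1=1}^{K_1} \cdots \sum_{c_J=1}^{K_J} \sum_{w=1}^{D} q(U = u, C = c, W = w \mid y_i, z_i) \log (L_i^*)^{\omega} \\
    &\quad - \frac{1}{\omega} \sum_{u=1}^{S} \sum_{c_1=1}^{K_1} \cdots \sum_{c_J=1}^{K_J} \sum_{w=1}^{D} q(U = u, C = c, W = w \mid y_i, z_i) \log q(U = u, C = c, W = w \mid y_i, z_i).
    \end{aligned}
    $$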

    To determine the optimal choice of $q$, we take the functional derivative with respect to $q$ and set it to zero under the constraint $\sum_{u=1}^{S} \sum_{c_1=1}^{K_1} \cdots \sum_{c_J=1}^{K_J} \sum_{w=1}^{D} q(U = u, C = c, W = w \mid y_i, z_i) = 1$ (Chang and Chung, 2013). The optimal choice for $q$ is then:

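    $$
    q(U = u, C = c, W = w \mid y_i, z_i) = \frac{(L_i^*)^{\omega}}{\sum_{s=1}^{S} \sum_{c_1=1}^{K_1} \cdots \sum_{c_J=1}^{K_J} \sum_{d=1}^{D} (L_i^*)^{\omega}} = \theta_{i(u, c_1, \ldots, c_J, w)}^{(\omega)},
    $$

    where $L_i^*$ in the denominator is evaluated at the values of the summation indices.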

    The optimal choice of $q$ may be regarded as a modified posterior, and we can calculate the expectation of the modified complete-data log-likelihood, which can be derived from (3.1), as

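    $$
    \begin{aligned}
    E\left\{ \sum_{i=1}^{n} \frac{1}{\omega} \log (L_i^*)^{\omega} \right\}
    &= \sum_{i=1}^{n} \sum_{u=1}^{S} \sum_{w=1}^{D} \theta_{i(u,w)}^{(\omega)} \log(\delta_{w|u})
    + \sum_{i=1}^{n} \sum_{w=1}^{D} \theta_{i(w)}^{(\omega)} \sum_{p=1}^{P} \sum_{h=1}^{r_p} I(z_{ip} = h) \log(\phi_{ph|w}) \\
    &\quad + \sum_{i=1}^{n} \sum_{u=1}^{S} \theta_{i(u)}^{(\omega)} \log(\gamma_u)
    + \sum_{i=1}^{n} \sum_{j=1}^{J} \sum_{c_j=1}^{K_j} \sum_{u=1}^{S} \theta_{i(u,c_j)}^{(\omega)} \log\left(\eta^{(j)}_{c_j|u}\right) \\
    &\quad + \sum_{i=1}^{n} \sum_{j=1}^{J} \left\{ \sum_{c_j=1}^{K_j} \theta_{i(c_j)}^{(\omega)} \sum_{m_j=1}^{M_j} \sum_{k=1}^{r_{m_j}} I(y_{i m_j j} = k) \log(\rho_{m_j k j|c_j}) \right\},
    \end{aligned}
    \tag{3.2}
    $$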

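    where the modified posterior probabilities for the specific dimensions are defined as $\theta_{i(u,w)}^{(\omega)} = \sum_{c_1} \cdots \sum_{c_J} \theta_{i(u, c_1, \ldots, c_J, w)}^{(\omega)}$, $\theta_{i(w)}^{(\omega)} = \sum_{u} \theta_{i(u,w)}^{(\omega)}$, $\theta_{i(u)}^{(\omega)} = \sum_{w} \theta_{i(u,w)}^{(\omega)}$, $\theta_{i(u,c_j)}^{(\omega)} = \sum_{w} \sum_{c_k,\, k \ne j} \theta_{i(u, c_1, \ldots, c_J, w)}^{(\omega)}$, and $\theta_{i(c_j)}^{(\omega)} = \sum_{u=1}^{S} \theta_{i(u,c_j)}^{(\omega)}$.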

  • M-step: We may obtain the estimators that maximize the expectation given in (3.2) by using the Lagrange multiplier. The updated parameter estimates can be calculated as

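    $$
    \hat{\gamma}_u = \frac{\sum_{i=1}^{n} \theta_{i(u)}^{(\omega)}}{n}, \qquad
    \hat{\delta}_{w|u} = \frac{\sum_{i=1}^{n} \theta_{i(u,w)}^{(\omega)}}{\sum_{i=1}^{n} \theta_{i(u)}^{(\omega)}}, \qquad
    \hat{\eta}^{(j)}_{c_j|u} = \frac{\sum_{i=1}^{n} \theta_{i(u,c_j)}^{(\omega)}}{\sum_{i=1}^{n} \theta_{i(u)}^{(\omega)}},
    $$

    $$
    \hat{\rho}_{m_j k j|c_j} = \frac{\sum_{i=1}^{n} \theta_{i(c_j)}^{(\omega)} I(y_{i m_j j} = k)}{\sum_{i=1}^{n} \theta_{i(c_j)}^{(\omega)}}, \qquad
    \hat{\phi}_{ph|w} = \frac{\sum_{i=1}^{n} \theta_{i(w)}^{(\omega)} I(z_{ip} = h)}{\sum_{i=1}^{n} \theta_{i(w)}^{(\omega)}}.
    $$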

    Starting with ω fixed at a small value near 0, the E- and M-steps are iterated until the parameter estimates converge. Next, we increase ω and apply the same procedure, using the converged parameter estimates from the previous step as the initial values. DAEM iterates this procedure until ω = 1. In this manner, the ω-loop wraps the conventional E- and M-steps, and we may obtain the parameter estimates at the global maximum with little sensitivity to the initial values. If the model contains any covariates, the δ part of the M-step is replaced by the β parameters, the coefficients of the multinomial logistic regression. In this case, the standard Newton-Raphson method may be applied at the phase of ω = 1. The details of the Hessian matrix and the score function for the β elements are provided in Appendices A and B.
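To make the structure of the ω-loop concrete, the following Python sketch applies DAEM to an ordinary LCA with binary items. It is a deliberate simplification of the LCA-MLG (one latent variable, no joint classes, no covariates), and the function name `daem_lca`, the annealing schedule, the starting values, and the stopping rule are all assumptions made for illustration rather than the authors' implementation.

```python
import random
from math import log


def daem_lca(data, n_classes, omegas=(0.1, 0.3, 0.6, 1.0), n_iter=200, tol=1e-8, seed=0):
    """DAEM for an ordinary LCA with binary items (a simplification of LCA-MLG).

    data : list of response vectors with entries 0/1
    Returns (delta, phi, loglik) with phi[w][p] = P(Z_p = 1 | W = w).
    """
    rng = random.Random(seed)
    n, n_items = len(data), len(data[0])
    delta = [1.0 / n_classes] * n_classes
    phi = [[rng.uniform(0.3, 0.7) for _ in range(n_items)] for _ in range(n_classes)]

    def joint(z, w):
        # Complete-data likelihood P(W = w, Z = z) at the current estimates.
        lik = delta[w]
        for p in range(n_items):
            lik *= phi[w][p] if z[p] == 1 else 1.0 - phi[w][p]
        return lik

    for omega in omegas:                # outer annealing loop over omega
        for _ in range(n_iter):         # inner EM loop at fixed omega
            # E-step: modified posterior proportional to (L_i*)^omega.
            theta = []
            for z in data:
                t = [joint(z, w) ** omega for w in range(n_classes)]
                s = sum(t)
                theta.append([v / s for v in t])
            # M-step: posterior-weighted relative frequencies.
            weight = [sum(th[w] for th in theta) for w in range(n_classes)]
            new_delta = [weight[w] / n for w in range(n_classes)]
            new_phi = [
                [sum(th[w] * z[p] for th, z in zip(theta, data)) / weight[w]
                 for p in range(n_items)]
                for w in range(n_classes)
            ]
            change = max(abs(a - b) for a, b in zip(new_delta, delta))
            for row_new, row_old in zip(new_phi, phi):
                change = max(change, max(abs(a - b) for a, b in zip(row_new, row_old)))
            delta, phi = new_delta, new_phi
            if change < tol:
                break

    # Observed-data log-likelihood at the final (omega = 1) estimates.
    loglik = sum(log(sum(joint(z, w) for w in range(n_classes))) for z in data)
    return delta, phi, loglik
```

At small ω the modified posterior is close to uniform, so the classes are barely separated; as ω increases toward 1 the posterior sharpens and the last pass coincides with ordinary EM. The schedule must end at ω = 1 for the returned log-likelihood to correspond to the usual likelihood.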

3.2. Model diagnosis and selection

In LCA, it is important to assess model fit with a balanced judgement that considers objective measures as well as substantive knowledge, in order to understand the distinctive features and underlying structure of the data in a simple manner. The chi-square asymptotic assumption for the likelihood ratio test statistic (G2) generally does not hold since latent class models with different numbers of classes are not nested within each other (Collins and Lanza, 2013). Instead, the Akaike information criterion (AIC) and Bayesian information criterion (BIC) are popular criteria for assessing the relative model fit among candidate models with different numbers of classes. The model with the smaller AIC (or BIC) is preferred, but the interpretability of the latent structure should also be considered when choosing the appropriate number of classes.

It is also important to examine the absolute model fit by calculating the difference between the expected and observed frequencies. Jeong and Lee (2009) suggested a parametric bootstrap testing procedure to obtain the asymptotic distribution of test statistics proposed as a goodness-of-fit statistic for the cumulative logit model. Chung et al. (2011) used a parametric bootstrap p-value obtained from the empirical distribution of G2. This empirical distribution is generated by the parametric bootstrap method, and the bootstrap p-value is computed as the proportion of the empirical G2 values larger than the observed G2. Using the maximum-likelihood (ML) parameter estimates from the data, the empirical distribution of G2 can be constructed as follows: (a) fit the LCA with latent group variables to the data set and calculate the observed G2; (b) generate a bootstrap data set with the ML estimates; (c) fit the LCA with latent group variables to the bootstrap data set generated at step (b), and compute the G2. By independently repeating steps (b) and (c), the bootstrap samples of G2 are produced, and the right-tail probability of the observed G2 from step (a) may be regarded as a bootstrap p-value.
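Steps (a) to (c) can be sketched generically in Python as follows. This is a skeleton under stated assumptions: `fit` and `simulate` are hypothetical user-supplied callables standing in for model fitting and data generation, and the G2 formula is the usual likelihood-ratio statistic over response patterns.

```python
from math import log


def g2_statistic(observed, expected):
    """Likelihood-ratio statistic G2 = 2 * sum(obs * log(obs / exp)) over response patterns."""
    return 2.0 * sum(o * log(o / e) for o, e in zip(observed, expected) if o > 0)


def bootstrap_p_value(observed_g2, bootstrap_g2):
    """Proportion of bootstrap G2 values larger than the observed G2."""
    return sum(g > observed_g2 for g in bootstrap_g2) / len(bootstrap_g2)


def parametric_bootstrap(data, fit, simulate, n_boot=100):
    """fit(data) -> (params, g2); simulate(params, n) -> bootstrap data set of size n.

    Both callables are placeholders for a model-specific implementation.
    """
    params, observed_g2 = fit(data)                   # step (a)
    boot = []
    for _ in range(n_boot):
        boot_data = simulate(params, len(data))       # step (b)
        boot.append(fit(boot_data)[1])                # step (c)
    return bootstrap_p_value(observed_g2, boot)
```

A small p-value means the observed G2 is unusually large relative to data generated from the fitted model, which suggests poor absolute fit.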

Once the number of latent classes for each latent variable is determined, the number of group latent classes (i.e., joint latent classes) may be selected using similar criteria, such as a smaller AIC (or BIC) and the bootstrap p-value. The covariates can then be added to the model with the selected numbers of latent classes.

4. Simulation study

In this section, we perform two sets of simulation studies to check how well the DAEM method works. The first study confirms that the DAEM method is superior to the EM in finding ML solutions at the global maximum. The second study evaluates how accurately the DAEM estimates model parameters in the LCA with latent group variables and covariates. We construct confidence intervals based on asymptotic standard errors from the Hessian matrix given in Appendix B.

In the first study, we generated one target dataset with 500 observations and randomly generated 30 sets of initial values. With these starting values, we independently performed parameter estimation using (a) the conventional EM method and (b) the DAEM method with ω = [0.01, 0.1, 0.2, 0.4, 0.61, 0.64, 0.69, 0.71, 0.83, 0.91], and compared the log-likelihood values obtained from each of the 30 sets of initial values. The left plot in Figure 2 is the histogram of log-likelihood values obtained by the EM algorithm using the 30 sets of initial values. Three distinct values appear, indicating that the log-likelihood has two local maxima in addition to the global maximum. Among the 30 trials, only 11 starting values converged to the global maximum, and the other 19 trials were trapped in the local maxima. In DAEM, however, all 30 trials reached the global maximum, whose log-likelihood was −4162.916, implying that the DAEM is not affected by the choice of initial values when finding the global maximum.

The second study examined whether DAEM operates properly in providing ML estimates for the proposed LCA model. We generated 200 data sets with a sample size of 500 and calculated ML estimates using the DAEM. For each sample, the parameter estimates and the standard errors from the Hessian matrix were used to construct 95% confidence intervals, and we checked whether each interval covered the true value of the parameter. These procedures were independently repeated for the 200 generated data sets, and the coverage of the confidence intervals was subsequently calculated.
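The coverage calculation described above can be sketched as follows. The function names are illustrative, the interval is the usual Wald interval (estimate ± 1.96 standard errors), and the numbers in the example are made up rather than taken from the simulation.

```python
def wald_interval(estimate, std_error, z=1.959964):
    """Two-sided 95% Wald confidence interval."""
    return estimate - z * std_error, estimate + z * std_error


def coverage(estimates, std_errors, true_value):
    """Fraction of replications whose interval contains the true parameter value."""
    hits = 0
    for est, se in zip(estimates, std_errors):
        lower, upper = wald_interval(est, se)
        hits += lower <= true_value <= upper
    return hits / len(estimates)


# Hypothetical replications: the first interval covers 0.55, the second does not.
rate = coverage([0.50, 0.90], [0.05, 0.05], true_value=0.55)
```

With 200 replications, a coverage rate near 0.95 indicates that both the point estimates and the Hessian-based standard errors are behaving as expected.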

The structure of the generated data sets is as follows. There are two latent variables, each with two latent classes measured by four manifest items. These two latent variables form a group latent variable with two joint classes. There is also one outcome latent variable with two latent classes measured by four manifest items. For the measurement parameters, both strong measurement (Table 1) and mixed measurement (Table 2) were considered. The average estimates from the DAEM are close to the true values, and the coverage probabilities of the 95% confidence intervals are close to 0.95 under both strong and mixed measurement. This implies that the parameter estimation and model identification work properly.

5. Application to youth survey data

5.1. Data description

The Youth Risk Behavior Surveillance System 2015 (YRBSS 2015) is a biennial survey of health risk behavior and drug-using behavior among US adolescents. Among the 15,624 survey participants in the data, we focus on 4,957 male high school students 16 to 18 years of age.

In this paper, we use 18 self-report items to measure violent and drug-using behavior. Five items were used to measure violent behavior: (1) During the past 30 days, on how many days did you carry a weapon such as a gun, knife, or club? (2) During the past 30 days, on how many days did you not go to school because you felt you would be unsafe at school or on your way to or from school? (3) During the past 12 months, how many times has someone threatened or injured you with a weapon such as a gun, knife, or club on school property? (4) During the past 12 months, how many times were you in a physical fight? (5) During the past 12 months, how many times were you in a physical fight in which you were injured and had to be treated by a doctor or nurse? Cigarette smoking was measured by four manifest items: (1) Have you ever tried cigarette smoking, even one or two puffs? (2) How old were you when you smoked a whole cigarette for the first time? (3) During the past 30 days, on how many days did you smoke cigarettes? (4) During the past 30 days, on the days you smoked, how many cigarettes did you smoke per day? The four items on alcohol consumption are as follows: (1) During your life, on how many days have you had at least one drink of alcohol? (2) How old were you when you had your first drink of alcohol other than a few sips? (3) During the past 30 days, on how many days did you have at least one drink of alcohol? (4) During the past 30 days, on how many days did you have 5 or more drinks of alcohol in a row, that is, within a couple of hours? Finally, the five items on other illegal drug-using behavior were: (1) During your life, how many times have you used marijuana? (2) How old were you when you tried marijuana for the first time? (3) During the past 30 days, how many times did you use marijuana? (4) Have you ever tried one of these illegal drugs: cocaine, sniffed solvents, heroin, methamphetamines, or ecstasy? (5) During the past 12 months, has anyone offered, sold, or given you an illegal drug on school property?

Among the original 18 manifest items, the quantitative variables are converted into binary items: the variables on the number of days (or times) are coded as 1 if one day or more (or one time or more) and 0 otherwise, so that the responses indicate whether or not an individual has experienced the behavior. The items on the age of first use are coded as 1 if under 13 years old and 0 otherwise, indicating whether or not exposure was early. Table 3 shows the proportions of responding ‘yes’ to the manifest items and the missing rates.
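The recoding rules can be written as two small helper functions. This is only a sketch: the function names are hypothetical, representing a missing response by `None` is an assumption, and so is the treatment of respondents who never used the substance as 0 on the early-onset items (the text says only "0 otherwise").

```python
def code_frequency(count):
    """1 if the behavior occurred on one or more days (or times), else 0; None stays missing."""
    if count is None:
        return None
    return 1 if count >= 1 else 0


def code_early_onset(age_first_use, never_used=False):
    """1 if first use occurred before age 13, else 0.

    Never-users are coded 0 here (an assumption); a missing age stays None.
    """
    if never_used:
        return 0
    if age_first_use is None:
        return None
    return 1 if age_first_use < 13 else 0


# Hypothetical raw responses: days carrying a weapon, age at first whole cigarette.
coded = [code_frequency(0), code_frequency(4), code_early_onset(12), code_early_onset(15)]
```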

Using these 18 items along with age as a covariate, we inspect the effect of drug-using behavior on violent behavior by answering the following questions: (a) What kinds of latent classes may be found for each drug use and violent behavior? (b) What kinds of common joint patterns can be identified for cigarette, alcohol, and other illegal drug use behavior? (c) How does the prevalence of violent behavior change as the joint latent class membership of drug use varies?

5.2. Model selection

To construct the LCA-MLG model, we need to determine the number of latent classes for each latent variable. First, we perform four LCAs to select the number of classes for the respective latent variables (i.e., violent behavior, cigarette smoking, alcohol consumption, and other illegal drug use) based on each set of manifest items. In this step, covariates need not be included because of the marginalization property (Bandeen-Roche et al., 1997). We adopt AIC, BIC, and the bootstrap p-value with significance level α = 0.05 for the goodness-of-fit statistic: the model with a smaller AIC (or BIC) and a bootstrap p-value larger than 0.05 is preferred. We also assess the interpretability of the identified latent classes. For the bootstrap p-value, we generate 100 bootstrap samples of the G2 statistic and compute the proportion of bootstrap G2 values that are larger than the observed G2.

Table 4 shows the goodness-of-fit statistics with different numbers of classes for each latent variable. Note that only 2- and 3-class models can be fitted for Cigarette smoking and Alcohol consumption, because the 4-class model has 4 × 4 + (4 − 1) = 19 parameters to be estimated, whereas four binary items provide only 2^4 − 1 = 15 degrees of freedom. Table 4 shows that all drug-using behaviors (i.e., Cigarette smoking, Alcohol consumption, and Other illegal drug use) can be adequately summarized by the 3-class model. For Violent behavior, the 3-class model shows the best fit in terms of BIC; however, the 4-class model shows the best fit in terms of AIC. In the 4-class model, however, the meaning of the identified classes is not distinct: the ρ-parameter estimates of two of the classes are very similar, implying that these two classes should have a similar label. Both the 3-class and 4-class models should therefore be considered as candidates for constructing the LCA-MLG model.
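The parameter count in the preceding paragraph can be checked with a short sketch. The helper names are illustrative; the formulas simply restate the arithmetic in the text for an LCA with binary items.

```python
def lca_free_parameters(n_classes, n_binary_items):
    """Free parameters of an LCA with binary items: item probabilities plus class prevalences."""
    return n_classes * n_binary_items + (n_classes - 1)


def pattern_degrees_of_freedom(n_binary_items):
    """Number of independent cells in the table of response patterns."""
    return 2 ** n_binary_items - 1


# Which numbers of classes can be fitted with four binary items?
fittable = [
    k for k in (2, 3, 4)
    if lca_free_parameters(k, 4) <= pattern_degrees_of_freedom(4)
]
```

With four binary items the list contains only 2 and 3, since the 4-class model would need 19 parameters against 15 degrees of freedom. With five binary items, as for Violent behavior and Other illegal drug use, a 4-class model needs 23 parameters against 31 degrees of freedom and so passes this counting check.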

Given the number of latent classes for each drug-using behavior, we determine the number of joint latent classes in the LCA-MLG model. Table 5 lists a series of LCA-MLG models fitted with various numbers of joint latent classes for drug-using behavior, together with their goodness-of-fit measures. The 3-class and 4-class models were both considered since the number of latent classes for Violent behavior was inconclusive in the previous step. Table 5 shows that the LCA-MLG model with four outcome classes and three or four joint classes can be considered as the model with the best fit. However, we select the model with three outcome classes because the interpretational difference between two of the outcome classes in the four-class model is still obscure. Among the models with three outcome classes, the LCA-MLG with three joint classes shows the smallest BIC and the model with four joint classes shows the smallest AIC. We select the LCA-MLG with three outcome classes and three joint classes as the final model because both show an acceptable bootstrap p-value. During these procedures, parameter estimation was conducted using the DAEM method, and the Hessian matrix of the model was investigated to confirm that the negative Hessian is positive definite. Any boundary estimate smaller than 0.001 that made the Hessian singular was fixed at 0 (or at 1 if it was larger than 0.999). Finally, age was added as a covariate to the selected LCA-MLG, as illustrated in Figure 3.

5.3. Parameter estimates for multiple latent group variables

The LCA-MLG model considered drug-using behavior as a group variable that may affect the outcome latent variable of Violent behavior. There were three latent variables that measured drug-using behavior: Cigarette smoking, Alcohol consumption, and Other illegal drug use.

Under the selected model structure, the primary measurement parameter estimates for drug-using behavior (i.e., ρ-parameters) and the prevalences of the latent classes are given in Table 6. For Cigarette smoking, class 1 indicates the ‘non-smoker’ group, with very small values for all ρ-parameters. Class 2 shows a high probability for Lifetime Smoking only, so it is labeled as the ‘lifetime smoker’ group. Class 3 shows high probabilities for Lifetime Smoking, Early CIG Onset, and Recent Smoking, implying that this group can be named ‘early onset current smoker.’ For Alcohol consumption, class 1 shows small values for all items, so it can be named the ‘non-drinker’ group. Class 2 shows a high tendency of responding ‘yes’ to the Lifetime Drinking item only, so it may be called the ‘lifetime drinker’ group. High probabilities for all items except Early ALC Onset in class 3 imply that this group can be represented as the ‘binge drinker’ group. In a similar manner, the three classes of Other illegal drug use can be named ‘non-user,’ ‘current marijuana user,’ and ‘early and multiple drug user,’ respectively. The ρ-parameters in the first class are all close to zero, and the second latent class has high probabilities for Lifetime MJ Use and Recent MJ Use. The third latent class has high ρ-parameters for Lifetime MJ Use, Recent MJ Use, Lifetime Illegal Drug Use, and Drugs Offered at School, showing early onset of marijuana use as well as lifetime and in-school exposure to illegal drugs.

The class prevalence for each drug-using behavior may be estimated as

$$
\hat{P}(C_j = c_j) = \sum_{u=1}^{S} \hat{P}(C_j = c_j \mid U = u)\, \hat{P}(U = u) = \sum_{u=1}^{S} \hat{\eta}^{(j)}_{c_j|u}\, \hat{\gamma}_u,
$$

for $c_j = 1, \ldots, K_j$ and $j = 1, \ldots, J$. For example, Table 6 shows that the largest class in Alcohol consumption is non-drinker with prevalence 0.378. The three latent variables (i.e., Cigarette smoking, Alcohol consumption, and Other illegal drug use), identified by their respective manifest items, form a joint latent variable which then becomes a group variable.
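The marginalization over joint classes used for the class prevalence is a weighted average of the η-parameters. A minimal sketch follows; the function name and the numeric values are hypothetical and are not the estimates reported in Tables 6 and 7.

```python
def marginal_class_prevalence(eta_j, gamma):
    """P(C_j = c) = sum over u of eta_j[u][c] * gamma[u].

    eta_j : eta_j[u][c] = P(C_j = c | U = u)
    gamma : gamma[u] = P(U = u)
    """
    n_classes = len(eta_j[0])
    return [
        sum(eta_j[u][c] * gamma[u] for u in range(len(gamma)))
        for c in range(n_classes)
    ]


# Hypothetical values for three joint classes and a three-class latent variable.
gamma = [0.5, 0.3, 0.2]
eta_j = [
    [0.80, 0.15, 0.05],  # joint class 0
    [0.20, 0.60, 0.20],  # joint class 1
    [0.10, 0.30, 0.60],  # joint class 2
]
prevalence = marginal_class_prevalence(eta_j, gamma)
```

For these made-up inputs the marginal prevalences are 0.48, 0.315, and 0.205, which sum to 1 as required.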

Table 7 shows the secondary measurement parameters (i.e., η-parameters) for measuring common patterns of the three latent variables associated with drug-using behavior. As introduced in Section 2, an η-parameter represents the conditional probability of being in a specific class of each latent variable given a joint class membership. In the first joint class, individuals tend to be ‘non-smoker’ for Cigarette smoking, ‘non-drinker’ for Alcohol consumption, and ‘non-user’ for Other illegal drug use. This implies that the first joint class represents the subgroup that has no tendency to use any drug and can therefore be named ‘non drug user.’ In the second joint class, individuals show probabilities of 0.598 for ‘lifetime smoker,’ 0.597 for ‘lifetime drinker,’ and 0.72 for ‘current marijuana user.’

As a result, the second joint latent class may be interpreted as ‘marijuana user with cigarette and alcohol onset’ group. For the third joint latent class, individuals show probabilities of 0.580 for ‘early onset current smoker,’ 0.907 for ‘binge drinker,’ and 0.953 for ‘early and multiple drug user.’ Therefore, the third joint latent class can be labeled as ‘multiple drug user’ group. The last row in Table 7 indicates γ estimates, representing the prevalence of the joint latent classes. The most common joint class is ‘not drug user’ group with a prevalence rate of 0.443, while ‘multiple drug user’ has the smallest prevalence rate 0.145.

5.4. Parameter estimates for outcome latent variable

Under the selected LCA-MLG model, the outcome latent variable is Violent behavior. The primary measurement parameter estimates for the outcome latent variable (i.e., φ-parameters) and the prevalences of the identified latent classes are given in Table 8. Class 1 has small φ-estimates for all five binary items, indicating a ‘not violent’ group. Class 2 has high probabilities for the Carry Weapon and Physical Fight items, so it may be named the ‘weapon carry and fight’ group. Class 3, which has high probabilities for Carry Weapon, Being Threatened, and Physical Fight, may be labeled the ‘seriously violent’ group. The estimated prevalences of the latent classes for Violent behavior are obtained by

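$$\hat{P}(W = w) = \frac{1}{n}\sum_{i=1}^{n}\sum_{u=1}^{S} \hat{P}(W = w \mid U = u, \mathbf{x}_i)\,\hat{P}(U = u) = \frac{1}{n}\sum_{i=1}^{n}\sum_{u=1}^{S} \hat{\delta}_{w \mid u}(\mathbf{x}_i)\,\hat{\gamma}_u,$$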

for w = 1, . . . , D. Table 8 shows that the ‘not violent’ group is the most prevalent, with a prevalence of 0.762. The second largest class is the ‘weapon carry and fight’ group, with a prevalence of 0.185, and the ‘seriously violent’ group has a prevalence of 0.053.
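Because γ does not depend on the individual, the double sum above can be rearranged so that the sample average is taken first within each joint class and the result is then weighted by γ. The sketch below illustrates this with the covariate-averaged conditional prevalences reported in Table 10 and the γ estimates in Table 7; it is a plain-Python check under those published (rounded) inputs, and the names `cond_prev` and `marginal_outcome_prevalence` are illustrative.

```python
# Joint-class prevalences (gamma), copied from Table 7.
gamma = [0.443, 0.412, 0.145]

# Covariate-averaged P(W = w | U = u), copied from Table 10.
# Rows: joint classes (non, current, multiple drug user).
# Columns: not violent, weapon carry and fight, seriously violent.
cond_prev = [
    [0.943, 0.057, 0.000],
    [0.742, 0.231, 0.027],
    [0.267, 0.450, 0.283],
]


def marginal_outcome_prevalence(cond_prev, gamma):
    """Return P(W = w) = sum_u gamma[u] * P(W = w | U = u) for every class w."""
    n_classes = len(cond_prev[0])
    return [
        sum(gamma[u] * cond_prev[u][w] for u in range(len(gamma)))
        for w in range(n_classes)
    ]


violent_prev = marginal_outcome_prevalence(cond_prev, gamma)
print([round(p, 3) for p in violent_prev])
```

The results are close to the class prevalences in Table 8 (0.762, 0.185, 0.053); differences of about 0.001 are to be expected because the inputs are rounded to three decimals.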

We consider age as a covariate to investigate its influence on Violent behavior while accounting for the effect of drug-using behavior. Table 9 reports the estimated odds ratios of age for the identified classes of Violent behavior, with their 95% confidence intervals. As introduced in Section 3, the asymptotic standard errors of the estimates are obtained as the square roots of the diagonal elements of the inverse of the negative Hessian matrix evaluated at the ML estimates. The effect of age on the prevalence of Violent behavior is significant only in the ‘current drug user’ group, and there only for the ‘weapon carry and fight’ class (odds ratio 0.690, 95% confidence interval [0.558, 0.852]); every other interval in Table 9 contains one.
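The step from a Hessian to an odds-ratio interval can be sketched as follows. This is an illustration only: the 2×2 negative Hessian is a hypothetical example, not a value from the fitted model, and the interval is assumed to be a Wald interval on the log-odds scale with z = 1.96, which the text does not state explicitly. The last part backs out the standard error implied by one published entry of Table 9 and rebuilds the interval from it.

```python
import math


def invert_2x2(m):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def standard_errors(neg_hessian):
    """Square roots of the diagonal of the inverse negative Hessian."""
    cov = invert_2x2(neg_hessian)
    return [math.sqrt(cov[k][k]) for k in range(2)]


def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio exp(beta) with an assumed Wald interval on the log scale."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical negative Hessian for (intercept, age slope); illustration only.
neg_hessian = [[120.0, 30.0], [30.0, 95.0]]
se_intercept, se_age = standard_errors(neg_hessian)

# Standard error implied by the Table 9 entry 0.690 [0.558, 0.852],
# assuming the interval is symmetric on the log scale.
log_or = math.log(0.690)
implied_se = (math.log(0.852) - math.log(0.558)) / (2 * 1.96)
or_hat, lower, upper = odds_ratio_ci(log_or, implied_se)
print(round(implied_se, 3), round(lower, 3), round(upper, 3))
```

Note that inverting the full matrix matters: taking the reciprocal of a single diagonal entry of the negative Hessian would ignore the correlation between parameters and understate the standard error.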

The estimated prevalences of Violent behavior given latent membership in the group variable (i.e., drug-using behavior) are given in Table 10. The values in Table 10 are obtained by

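$$\hat{P}(W = w \mid U = u) = \frac{1}{n}\sum_{i=1}^{n} \hat{\delta}_{w \mid u}(\mathbf{x}_i) = \frac{1}{n}\sum_{i=1}^{n} \frac{\exp(\mathbf{x}_i^{\top}\hat{\boldsymbol{\beta}}_{w \mid u})}{\sum_{d=1}^{D}\exp(\mathbf{x}_i^{\top}\hat{\boldsymbol{\beta}}_{d \mid u})}.$$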
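The averaging in this expression can be sketched in a few lines of Python. The coefficient values and ages below are hypothetical and chosen only for illustration; they are not estimates from the paper. The sketch also assumes that the baseline class (‘not violent’) has all coefficients fixed at zero, a common identifiability convention that is consistent with the baseline noted in Table 9.

```python
import math


def class_probabilities(x, betas):
    """Multinomial-logit probabilities delta_{w|u}(x) for one individual.

    betas[w] is the coefficient vector for outcome class w within a fixed
    joint class u; the baseline class is assumed to have all zeros.
    """
    scores = [sum(b * xi for b, xi in zip(beta, x)) for beta in betas]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def average_prevalence(xs, betas):
    """Average delta_{w|u}(x_i) over all individuals i."""
    sums = [0.0] * len(betas)
    for x in xs:
        for w, p in enumerate(class_probabilities(x, betas)):
            sums[w] += p
    return [s / len(xs) for s in sums]


# Hypothetical coefficients (intercept, age slope) for one joint class:
# not violent (baseline), weapon carry and fight, seriously violent.
betas_u = [[0.0, 0.0], [-1.2, -0.37], [-3.3, -0.20]]

# Hypothetical covariate vectors: intercept plus age centered at 16.
xs = [[1.0, age - 16.0] for age in (14, 15, 16, 17, 18)]

prev = average_prevalence(xs, betas_u)
print([round(p, 3) for p in prev])
```

Each individual's probabilities sum to one, so the averaged prevalences do as well, which is why every row of Table 10 sums to one.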

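Table 10 shows that the prevalence of Violent behavior shifts from ‘not violent’ toward ‘seriously violent’ as drug-using behavior moves from ‘non drug user’ to ‘multiple drug user.’ For example, the estimated prevalence of ‘seriously violent’ rises from 0.000 (constrained) to 0.027 and then to 0.283 across the three joint classes. We may conclude that the risk of violent behavior increases as drug-using behavior becomes more serious.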

6. Conclusion

This paper proposes a new latent variable model to examine the relationship between violent behavior and multiple drug-using behaviors among male high-school students. Conventional LCA deals with a single latent variable, whereas the newly proposed LCA-MLG with covariates can investigate the joint effects of several latent variables on the prevalence of an outcome latent variable. The EM algorithm is widely adopted for parameter estimation with incomplete data, but it can converge to local maxima of the likelihood. We adopted the DAEM algorithm (Ueda and Nakano, 1998) to overcome this problem and to provide precise ML estimates for a latent variable model whose latent structure is quite complex. In addition, the Hessian matrix of the model was calculated to provide asymptotic standard errors of the estimates.

The analysis of YRBSS 2015 identifies three representative subgroups that show similar patterns in drug-using behavior involving cigarettes, alcohol, and other illegal drugs. These common patterns form a joint latent variable whose classes can be referred to as the ‘non drug user,’ ‘current drug user,’ and ‘multiple drug user’ groups, depending on the extent of experience with or behavior toward various drugs. Similarly, three common subgroups were discovered for the violent behavior of high-school students, as measured by five binary items. The LCA-MLG model enables us to examine how the prevalence of Violent behavior is affected by latent group membership and a covariate. We found that the probability of belonging to ‘weapon carry and fight’ or ‘seriously violent’ rather than ‘not violent’ increases as drug-using behavior becomes more serious. For individuals in the ‘current drug user’ group, the probabilities of belonging to ‘weapon carry and fight’ or ‘seriously violent’ rather than ‘not violent’ decrease as age increases.

The LCA-MLG structure is designed to explain the association between several latent group variables and one latent outcome variable. However, the associations captured by the LCA-MLG model are not causal relationships, so causal inference among several latent variables may be a valuable topic for further research. Finally, we have written a DAEM routine for LCA-MLG in the R language (version 3.3.1), which is available on request.

Acknowledgement

This work was supported by a Korea University Grant (K1509141 to Hwan Chung) and Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2015R1D1A1A01056846 to Hwan Chung).

Figures
Fig. 1. A diagram of latent class analysis with multiple latent group variables.
Fig. 2. A histogram of the log-likelihoods by EM and DAEM. EM = expectation-maximization; DAEM = deterministic annealing expectation-maximization.
Fig. 3. A structure of the selected latent class analysis with the multiple latent groups model.
TABLES

Table 1

Average EST, MSE, and CP of 95% confidence intervals for parameter estimates (strong measurement)

Parameter  True   EST     MSE     CP    Parameter  True   EST     MSE     CP
ρ111|1     0.10   0.101   0.0003  0.93  γ1         0.50   0.495   0.0020  0.93
ρ211|1     0.10   0.098   0.0003  0.95  η11(1)     0.90   0.903   0.0016  0.96
ρ311|1     0.10   0.101   0.0003  0.95  η12(1)     0.10   0.101   0.0018  0.94
ρ411|1     0.10   0.100   0.0003  0.96  η11(2)     0.10   0.098   0.0017  0.93
ρ112|1     0.90   0.900   0.0003  0.93  η12(2)     0.90   0.901   0.0018  0.93
ρ212|1     0.90   0.901   0.0004  0.93  β01|1      −1.00  −1.055  0.0823  0.96
ρ312|1     0.90   0.900   0.0004  0.94  β11|1      1.00   1.038   0.0511  0.93
ρ412|1     0.90   0.900   0.0004  0.95  β01|2      1.00   1.023   0.0672  0.98
ρ111|2     0.90   0.902   0.0004  0.95  β11|2      −1.00  −1.032  0.0488  0.96
ρ211|2     0.90   0.900   0.0003  0.93  φ11|1      0.10   0.099   0.0004  0.97
ρ311|2     0.90   0.900   0.0003  0.93  φ21|1      0.10   0.101   0.0004  0.95
ρ411|2     0.90   0.899   0.0004  0.94  φ31|1      0.10   0.099   0.0004  0.93
ρ112|2     0.10   0.100   0.0003  0.97  φ41|1      0.10   0.101   0.0004  0.93
ρ212|2     0.10   0.098   0.0003  0.94  φ11|2      0.90   0.898   0.0003  0.91
ρ312|2     0.10   0.099   0.0004  0.95  φ21|2      0.90   0.899   0.0004  0.95
ρ412|2     0.10   0.101   0.0003  0.95  φ31|2      0.90   0.899   0.0004  0.94
                                        φ41|2      0.90   0.902   0.0003  0.94

EST = estimates; MSE = mean square error; CP = coverage probability.


Table 2

Average EST, MSE, and CP of 95% confidence intervals for parameter estimates (mixed measurement)

Parameter  True   EST     MSE     CP    Parameter  True   EST     MSE     CP
ρ111|1     0.10   0.097   0.0011  0.91  γ1         0.50   0.513   0.0133  0.97
ρ211|1     0.10   0.098   0.0011  0.95  η11(1)     0.70   0.700   0.0074  0.96
ρ311|1     0.30   0.299   0.0009  0.98  η12(1)     0.20   0.187   0.0066  0.97
ρ411|1     0.30   0.298   0.0011  0.95  η11(2)     0.30   0.298   0.0076  0.98
ρ112|1     0.90   0.901   0.0008  0.96  η12(2)     0.80   0.813   0.0074  0.96
ρ212|1     0.90   0.902   0.0006  0.97  β01|1      −1.00  −1.242  0.7771  0.98
ρ312|1     0.70   0.700   0.0008  0.98  β11|1      1.00   1.212   0.7228  0.98
ρ412|1     0.70   0.693   0.0008  0.95  β01|2      1.00   1.331   0.6397  0.98
ρ111|2     0.90   0.899   0.0005  0.98  β11|2      −1.00  −1.349  0.6534  0.99
ρ211|2     0.90   0.898   0.0006  0.97  φ11|1      0.10   0.098   0.0015  0.92
ρ311|2     0.70   0.709   0.0006  0.98  φ21|1      0.10   0.087   0.0016  0.95
ρ411|2     0.70   0.702   0.0009  0.93  φ31|1      0.30   0.310   0.0007  0.97
ρ112|2     0.10   0.103   0.0007  0.95  φ41|1      0.30   0.302   0.0009  0.95
ρ212|2     0.10   0.096   0.0008  0.98  φ11|2      0.70   0.691   0.0017  0.93
ρ312|2     0.30   0.299   0.0009  0.98  φ21|2      0.70   0.701   0.0007  0.93
ρ412|2     0.30   0.300   0.0015  0.96  φ31|2      0.10   0.103   0.0009  0.95
                                        φ41|2      0.10   0.102   0.0017  0.96

EST = estimates; MSE = mean square error; CP = coverage probability.


Table 3

Percentages of responding ‘yes’ to the manifest items for each latent variable and their missing rates

Latent variable         Manifest item (questionnaire on “Have you...?”)                             Yes (%)  Missing (%)
Violent behavior        Carried a weapon during recent 30 days (Carry Weapon)                       24.26    8.21
                        Absent to school due to feeling unsafe recent 12 months (Feeling Unsafe)    5.14     0.32
                        Threatened by weapon on school recent 12 months (Being Threatened)          7.10     3.87
                        Involved in a physical fight recent 12 months (Physical Fight)              22.47    13.71
                        Seriously injured in a physical fight recent 12 months (Seriously Injured)  3.31     12.16

Cigarette smoking       Ever tried cigarette smoking (Lifetime Smoking)                             36.35    9.96
                        Smoked before age 13 years for the first time (Early CIG Onset)             8.73     5.85
                        Smoked cigarettes during the recent 30 days (Recent Smoking)                13.33    5.18
                        Smoked more than 10 cigarettes per day (Heavy Smoking)                      1.45     5.38

Alcohol consumption     Ever drunken alcohol (Lifetime Drinking)                                    64.95    3.18
                        Drunken alcohol before age 13 years for the first time (Early ALC Onset)    19.42    2.78
                        Drunken alcohol during the recent 30 days (Recent Drinking)                 33.10    16.76
                        Drunken five or more drinks of alcohol in a row (Binge Drinking)            21.44    5.26

Other illegal drug use  Ever tried MJ (Lifetime MJ Use)                                             45.87    4.17
                        Tried MJ before age 13 years for the first time (Early MJ Onset)            10.65    2.54
                        Used MJ during the recent 30 days (Recent MJ Use)                           26.62    10.53
                        Ever tried other illegal drugs (Lifetime Illegal Drug Use)                  15.29    5.41
                        Ever sold or offered illegal drug on school property (Sell Drug on School)  25.47    3.77

MJ = marijuana.


Table 4

Goodness-of-fit measures for a series of latent class analysis models with the different number of classes for four different class variables

Latent variable         Number of classes  AIC      BIC      Bootstrap p-value
Violent behavior        2                  14779.8  14851.4  0.00
                        3                  14687.0  14797.6  0.00
                        4                  14652.0  14801.7  0.34
                        5                  14663.7  14852.4  0.49

Cigarette smoking       2                  11195.1  11253.6  0.00
                        3                  11058.8  11150.0  0.34

Alcohol consumption     2                  16826.8  16885.4  0.00
                        3                  16607.9  16699.0  0.41

Other illegal drug use  2                  20381.3  20452.9  0.00
                        3                  20275.3  20386.0  0.28
                        4                  20279.7  20429.4  0.45
                        5                  20291.2  20480.0  0.68

AIC = Akaike information criterion; BIC = Bayesian information criterion.


Table 5

Goodness-of-fit measures for a series of LCA-MLG models with the different number of joint classes of drug-using behavior (i.e., Cigarette smoking, Alcohol consumption, and Other illegal drug use) and outcome latent classes of Violent behavior

Classes for Violent behavior  Joint classes for drug-using behavior  AIC       BIC       Bootstrap p-value
3                             2                                      78725.8   79187.9   0.01
                              3                                      78590.1   79110.8   0.06
                              4                                      78555.8   79135.0   0.24
                              5                                      78547.7   79185.5   0.22
                              6                                      78611.0   79307.4   0.33

4                             2                                      78661.5   79169.2   0.04
                              3                                      77987.4   78560.1   0.16
                              4                                      77911.2   78614.1   0.16
                              5                                      77979.7   78617.5   0.32
                              6                                      163880.7  164648.7  0.34

LCA-MLG = latent class analysis with multiple latent groups; AIC = Akaike information criterion; BIC = Bayesian information criterion.


Table 6

The estimated probabilities of responding ‘yes’ to the manifest items and class prevalences for each of the latent variables

Latent class for Cigarette smoking

Manifest item              Non-smoker  Lifetime smoker  Early onset current smoker
Lifetime Smoking           0.047       1.000*           1.000*
Early CIG Onset            0.000*      0.158            0.535
Recent Smoking             0.000*      0.219            1.000*
Heavy Smoking              0.000*      0.000*           0.200

Class prevalence           0.614       0.298            0.088

Latent class for Alcohol consumption

Manifest item              Non-drinker  Lifetime drinker  Binge drinker
Lifetime Drinking          0.160        1.000*            1.000*
Early ALC Onset            0.036        0.233             0.382
Recent Drinking            0.000*       0.306             1.000*
Binge Drinking             0.000*       0.000*            0.886

Class prevalence           0.378        0.352             0.270

Latent class for Other illegal drug use

Manifest item              Non-user  Current marijuana user  Early and multiple drug user
Lifetime MJ Use            0.038     1.000*                  1.000*
Early MJ Onset             0.000*    0.124                   0.497
Recent MJ Use              0.000*    0.497                   0.835
Lifetime Illegal Drug Use  0.032     0.199                   0.803
Drugs Offered at School    0.165     0.306                   0.577

Class prevalence           0.539     0.314                   0.147

MJ = marijuana.

*The probabilities are constrained to be zero or one.


Table 7

The estimated probabilities of belonging to a latent class for a given joint class membership and the prevalence of joint classes

Joint latent class for drug-using behavior

Latent variable         Latent class                  Non drug user  Current drug user  Multiple drug user
Cigarette smoking       Non-smoker                    0.988          0.401              0.079
                        Lifetime smoker               0.005          0.598              0.341
                        Early onset current smoker    0.007          0.001              0.580

Alcohol consumption     Non-drinker                   0.760          0.089              0.034
                        Lifetime drinker              0.220          0.597              0.059
                        Binge drinker                 0.020          0.314              0.907

Other illegal drug use  Non-user                      0.964          0.259              0.038
                        Current marijuana user        0.036          0.720              0.009
                        Early and multiple drug user  0.000*         0.021              0.953

Joint class prevalence                                0.443          0.412              0.145

*The probabilities are constrained to be zero or one.


Table 8

The estimated probabilities of responding ‘yes’ to the manifest items and class prevalences for each of latent variables

Latent class for Violent behavior

Manifest item      Not violent  Weapon carry and fight  Seriously violent
Carry Weapon       0.177        0.501                   0.797
Feeling Unsafe     0.018        0.066                   0.494
Being Threatened   0.021        0.113                   0.724
Physical Fight     0.057        1.000*                  0.827
Seriously Injured  0.000*       0.117                   0.364

Class prevalence   0.762        0.185                   0.053

*The probabilities are constrained to be zero or one.


Table 9

The estimated odds ratios of age for Violent behavior for a given joint class membership and 95% confidence intervals (‘not-violent’ group is the baseline)

Latent class for Violent behavior

Joint latent class  Weapon carry and fight  Seriously violent
Non drug user       0.723 [0.472, 1.106]    NA*
Current drug user   0.690 [0.558, 0.852]    0.815 [0.325, 2.041]
Multiple drug user  0.892 [0.640, 1.241]    0.857 [0.526, 1.396]

*The probability of belonging to ‘seriously violent’ for ‘non drug user’ group is constrained to be zero.


Table 10

The estimated prevalence of Violent behavior given a latent membership of group variable (i.e., drug-using behavior)

Latent class for Violent behavior

Group variable      Not violent  Weapon carry and fight  Seriously violent
Non drug user       0.943        0.057                   0.000*
Current drug user   0.742        0.231                   0.027
Multiple drug user  0.267        0.450                   0.283

*The probability of belonging to ‘seriously violent’ for ‘non drug user’ group is constrained to be zero.


References
  1. Bandeen-Roche, K, Miglioretti, DL, Zeger, SL, and Rathouz, PJ (1997). Latent variable regression for multiple discrete outcomes. Journal of the American Statistical Association. 92, 1375-1386.
  2. Centers for Disease Control and Prevention (2015). Youth Risk Behavior Surveillance System. Available at http://www.cdc.gov/YRBSS
  3. Chang, HC, and Chung, H (2013). Dealing with multiple local modalities in latent class profile analysis. Computational Statistics and Data Analysis. 68, 296-310.
  4. Chung, H, Anthony, JC, and Schafer, JL (2011). Latent class profile analysis: an application to stage sequential processes in early onset drinking behaviours. Journal of the Royal Statistical Society Series A (Statistics in Society). 174, 689-712.
  5. Collins, LM, and Lanza, ST (2013). Latent Class and Latent Transition Analysis: With Applications in the Social, Behavioral, and Health Sciences. Hoboken, NJ: John Wiley & Sons.
  6. Dawkins, MP (1997). Drug use and violent crime among adolescents. Adolescence. 32, 395-405.
  7. DuRant, RH, Altman, D, Wolfson, M, Barkin, S, Kreiter, S, and Krowchuk, D (2000). Exposure to violence and victimization, depression, substance use, and the use of violence by young adolescents. The Journal of Pediatrics. 137, 707-713.
  8. Jeong, KM, and Lee, HY (2009). Goodness-of-fit tests for the ordinal response models with misspecified links. Communications for Statistical Applications and Methods. 16, 697-705.
  9. Lowry, R, Cohen, LR, Modzeleski, W, Kann, L, Collins, JL, and Kolbe, LJ (1999). School violence, substance use, and availability of illegal drugs on school property among US high school students. Journal of School Health. 69, 347-355.
  10. Lundholm, L (2013). Substance use and violence: influence of alcohol, illicit drugs and anabolic androgenic steroids on violent crime and self-directed violence. Doctoral dissertation. Acta Universitatis Upsaliensis. Uppsala.
  11. Miller, CL, Kerr, T, Strathdee, SA, Li, K, and Wood, E (2007). Factors associated with premature mortality among young injection drug users in Vancouver. Harm Reduction Journal. 4, 1-7.
  12. Ueda, N, and Nakano, R (1998). Deterministic annealing EM algorithm. Neural Networks. 11, 271-282.
  13. Van Horn, ML, Fagan, AA, Hawkins, JD, and Oesterle, S (2014). Effects of the communities that care system on cross-sectional profiles of adolescent substance use and delinquency. American Journal of Preventive Medicine. 47, 188-197.
  14. White, HR, Loeber, R, Stouthamer-Loeber, M, and Farrington, DP (1999). Developmental associations between substance use and violence. Development and Psychopathology. 11, 785-803.