Generalized canonical correlation analysis (GCCA) extends canonical correlation analysis (CCA) to the case of more than two sets of variables, and there have been many studies on how two-set canonical solutions can be generalized. In this paper, we derive stationary equations that lead to the higher-order solutions of several GCCA methods and suggest an iterative procedure for obtaining the canonical coefficients. In addition, we present methods for graphical display, illustrated with numerical examples, which are useful for interpreting the GCCA results.
Generalized canonical correlation analysis (GCCA) involves comparing
Suppose we have
Criteria, the functions to be optimized, for obtaining the canonical coefficient vectors
Some of the above criteria have also been suggested in other studies (Horst, 1961a, 1961b, 1965; Steel, 1951), and Gifi (1990) mentioned the additional methods MINSUM (minimize the sum of the correlations) and PRINCALS (maximize the sum of the
The study of relations among the sets can be continued beyond the first order (stage) by considering higher order (stage) canonical variates
In this paper, we derive certain stationary equations that lead to the higher-order solutions of several GCCA methods and suggest an iterative procedure to obtain the canonical coefficients. In addition, we present methods for graphical display, together with some numerical examples; these are useful for interpreting GCCA results.
In order to construct a valid optimization problem, the criterion to be optimized is usually subject to a certain constraint. Let
Under this constraint,
Under the constraint (
where f(·) stands for the criterion under consideration, Kang and Kim (2006) derived the following unifying form of stationary equations for each of the six different GCCA solutions,
where
respectively, and
In general, imposing different constraints on the canonical variates can yield different solutions even for the same criterion. Another useful constraint (the constant-sum-variances constraint), an alternative to the unit-variances constraint, is
where
There are situations in which the solutions under the constant-sum-variances constraint can be very unfair (Van de Geer, 1984): the canonical variates under this constraint do not, in general, have identical variances, so the criteria can depend heavily on canonical variates with relatively large or small variances. Nevertheless, the constant-sum-variances constraint is sometimes preferred because it allows relatively easy computations. Moreover, for some criteria the stationary equations under the constant-sum-variances constraint have an explicit form (Kang and Kim, 2006), so considering this constraint helps clarify the characteristics of those criteria.
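As a minimal numerical illustration (not from the paper; the data and weight vectors below are random stand-ins), each constraint can be satisfied by rescaling a given collection of weight vectors: the unit-variances constraint rescales each set separately, while the constant-sum-variances constraint needs only one joint rescaling, which is why it tends to be computationally convenient.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data: m = 3 sets of standardized variables (hypothetical).
X = [rng.standard_normal((100, 3)) for _ in range(3)]
X = [(x - x.mean(axis=0)) / x.std(axis=0) for x in X]
a = [rng.standard_normal(3) for _ in range(3)]   # arbitrary weight vectors

# Unit-variances constraint: rescale each a_k so that var(X_k a_k) = 1.
a_unit = [ak / np.sqrt(np.var(xk @ ak)) for xk, ak in zip(X, a)]

# Constant-sum-variances constraint: a single joint rescaling so that the
# variances of the m canonical variates sum to m; the individual variances
# generally differ, which is the source of the "unfairness" noted above.
total = sum(np.var(xk @ ak) for xk, ak in zip(X, a))
c = np.sqrt(len(X) / total)
a_sum = [c * ak for ak in a]
```

The joint rescaling is only one way to satisfy the constant-sum-variances constraint; the optimizing solution additionally adjusts the directions of the weight vectors.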
The constant-sum-variances constraint requires one Lagrange multiplier
where
Further discussion of GCCA solutions through the stationary
In this section, we derive certain stationary equations which lead to the higher-order GCCA solutions.
For higher-order GCCA solutions, we need to place further restrictions on the additional canonical variates in order to achieve mutual orthogonality and compute them successively.
Suppose
where the
In the context of
which means that the canonical variates obtained at successive orders are required to be uncorrelated with each other within the same set. In this case,
which is useful when combined with the constant-sum-variances constraint. With this restriction, solutions of every order for MAXVAR and MINVAR can be obtained simultaneously by performing a single eigen-analysis on
The two restrictions mentioned above yield the same canonical variates when
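The matrix being eigen-analyzed is elided in this excerpt. The following is a hedged sketch, assuming a Kettenring-type formulation in which the weight vectors of every order solve the generalized eigenproblem R a = λ D a, with R the full correlation matrix and D its within-set block diagonal (this specific formulation is an assumption, not a formula quoted from the paper):

```python
import numpy as np

# Stand-in 9x9 correlation matrix for three sets of three variables.
rng = np.random.default_rng(1)
Z = rng.standard_normal((200, 9))
R = np.corrcoef(Z, rowvar=False)

sizes = [3, 3, 3]
idx = np.cumsum([0] + sizes)
D = np.zeros_like(R)
for k in range(3):                     # block diagonal of within-set blocks
    s = slice(idx[k], idx[k + 1])
    D[s, s] = R[s, s]

# Whiten by D^{-1/2}, then a SINGLE standard eigen-analysis yields the
# weight vectors of all orders at once (columns of A, ascending eigenvalues).
w, V = np.linalg.eigh(D)
D_isqrt = V @ np.diag(w ** -0.5) @ V.T
M = D_isqrt @ R @ D_isqrt
M = (M + M.T) / 2                      # guard against round-off asymmetry
lam, B = np.linalg.eigh(M)
A = D_isqrt @ B

a_minvar = A[:, 0]    # smallest eigenvalue -> MINVAR-type solution
a_maxvar = A[:, -1]   # largest eigenvalue  -> MAXVAR-type solution
```

Each column a of A satisfies R a = λ D a, and the columns are D-orthogonal, which corresponds to the within-set uncorrelatedness restriction combined with the constant-sum-variances constraint (up to a scaling of the variates).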
We now derive stationary equations for solutions with any form of (
where
The
Step 1. Obtain the initial values
Step 2. At the
calculate
all weights of
Step 3. Repeat Step 2 until the convergence condition is satisfied.
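The update formulas in Step 2 are elided in this excerpt. The following is a sketch assuming a Horst-type alternating scheme for the SUMCOR criterion under the unit-variances constraint, run on the 9×9 correlation matrix of the numerical example (reproduced from the correlation-matrix table below): each pass solves the within-set system R_kk a_k ∝ Σ_{l≠k} R_kl a_l and rescales a_k to unit variance, which monotonically increases the sum of the pairwise canonical correlations.

```python
import numpy as np

# 9x9 correlation matrix of the numerical example (three sets of three).
R = np.array([
    [1.00, 0.25, 0.27, 0.44, 0.18, 0.19, 0.43, 0.37, 0.28],
    [0.25, 1.00, 0.40, 0.14, 0.65, 0.26, 0.19, 0.53, 0.36],
    [0.27, 0.40, 1.00, 0.18, 0.41, 0.61, 0.23, 0.47, 0.61],
    [0.44, 0.14, 0.18, 1.00, 0.09, 0.15, 0.85, 0.25, 0.19],
    [0.18, 0.65, 0.41, 0.09, 1.00, 0.30, 0.10, 0.54, 0.39],
    [0.19, 0.26, 0.61, 0.15, 0.30, 1.00, 0.18, 0.44, 0.50],
    [0.43, 0.19, 0.23, 0.85, 0.10, 0.18, 1.00, 0.29, 0.25],
    [0.37, 0.53, 0.47, 0.25, 0.54, 0.44, 0.29, 1.00, 0.43],
    [0.28, 0.36, 0.61, 0.19, 0.39, 0.50, 0.25, 0.43, 1.00],
])
sizes = [3, 3, 3]
idx = np.cumsum([0] + sizes)
m = len(sizes)
blk = lambda k, l: R[idx[k]:idx[k + 1], idx[l]:idx[l + 1]]

# Step 1: initial unit-variance weight vectors.
a = [np.ones(p) for p in sizes]
a = [ak / np.sqrt(ak @ blk(k, k) @ ak) for k, ak in enumerate(a)]

# Steps 2-3: cycle through the sets until the criterion stabilizes.
prev = -np.inf
for _ in range(500):
    for k in range(m):
        s = sum(blk(k, l) @ a[l] for l in range(m) if l != k)
        ak = np.linalg.solve(blk(k, k), s)        # stationary direction
        a[k] = ak / np.sqrt(ak @ blk(k, k) @ ak)  # unit-variance rescale
    crit = sum(a[k] @ blk(k, l) @ a[l]
               for k in range(m) for l in range(k + 1, m))
    if crit - prev < 1e-12:
        break
    prev = crit
```

If this sketch matches the procedure intended in the paper, the resulting first-order weights should resemble the SUMCOR column of Table 2; the other criteria would require their own update rules in place of the SUMCOR step.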
The weight vectors
where
In this section, a numerical example is constructed to illustrate the contrasts among the six GCCA methods. This example has three sets of variables (
Table 2 displays the first- and second-order solutions of the six GCCA methods, which are obtained by the above iterative procedure: the canonical weight vectors (
Table 2 indicates that MINVAR yields markedly different canonical weights and canonical correlation coefficients. The canonical weights of SUMCOR, MAXVAR, and SSQCOR are nearly indistinguishable and provide almost the same level of canonical correlations; the GENVAR results also appear similar to those three. However, MAXECC gives intermediate results that are noticeably different from those of SUMCOR, MAXVAR, and SSQCOR.
In this section, we present methods for graphical display that are useful for interpreting the GCCA results.
Let
and
where
Park and Huh (1996) suggest an
Table 3 presents the average correlation coefficients of the
The canonical loadings, the correlations between the original variables and their canonical variables, can be obtained by
and thus the
which satisfies
The canonical loading plot is obtained by plotting
Table 4 shows the first and the second-order canonical loadings of six GCCA methods; in addition, Table 5 presents the explained variances by the
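The displayed loading formula is elided in this excerpt; for standardized variables, however, the correlation between a variable and a unit-variance canonical variate reduces to the corresponding entry of R_kk a_k. Assuming that standard identity, the set I, first-order SUMCOR entries of Tables 4 and 5 can be reproduced from the Table 2 weights:

```python
import numpy as np

# Within-set correlation block of set I (from the example's correlation
# matrix) and the first-order SUMCOR weights for set I (from Table 2).
R_kk = np.array([[1.00, 0.25, 0.27],
                 [0.25, 1.00, 0.40],
                 [0.27, 0.40, 1.00]])
a_k = np.array([0.318, 0.426, 0.590])

# corr(x_kj, z_k) = (R_kk a_k)_j / sd(z_k) for standardized variables.
sd_z = np.sqrt(a_k @ R_kk @ a_k)       # ~1 under the unit-variances constraint
loadings = (R_kk @ a_k) / sd_z         # ~[0.584, 0.741, 0.846], cf. Table 4
ev = np.mean(loadings ** 2)            # ~0.535, cf. Table 5 (set I, SUMCOR)
```

The explained-variance entry (EV) is the average squared loading within the set, so the EVI percentages in Table 5 accumulate to 100% over the three orders.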
Correlation matrix
Sets of variables | I | | | II | | | III | |
---|---|---|---|---|---|---|---|---|---
I | 1.00 | 0.25 | 0.27 | 0.44 | 0.18 | 0.19 | 0.43 | 0.37 | 0.28
 | 0.25 | 1.00 | 0.40 | 0.14 | 0.65 | 0.26 | 0.19 | 0.53 | 0.36
 | 0.27 | 0.40 | 1.00 | 0.18 | 0.41 | 0.61 | 0.23 | 0.47 | 0.61
II | 0.44 | 0.14 | 0.18 | 1.00 | 0.09 | 0.15 | 0.85 | 0.25 | 0.19
 | 0.18 | 0.65 | 0.41 | 0.09 | 1.00 | 0.30 | 0.10 | 0.54 | 0.39
 | 0.19 | 0.26 | 0.61 | 0.15 | 0.30 | 1.00 | 0.18 | 0.44 | 0.50
III | 0.43 | 0.19 | 0.23 | 0.85 | 0.10 | 0.18 | 1.00 | 0.29 | 0.25
 | 0.37 | 0.53 | 0.47 | 0.25 | 0.54 | 0.44 | 0.29 | 1.00 | 0.43
 | 0.28 | 0.36 | 0.61 | 0.19 | 0.39 | 0.50 | 0.25 | 0.43 | 1.00
The first and the second-order solutions of GCCA
Sets | SUMCOR | | MAXVAR | | SSQCOR | | MAXECC | | GENVAR | | MINVAR |
---|---|---|---|---|---|---|---|---|---|---|---|---
 | 1st | 2nd | 1st | 2nd | 1st | 2nd | 1st | 2nd | 1st | 2nd | 1st | 2nd
I | 0.318 | 1.002 | 0.319 | 1.001 | 0.323 | 0.999 | 0.745 | 0.731 | 0.391 | 0.967 | −0.026 | 0.995
 | 0.426 | −0.360 | 0.425 | −0.368 | 0.424 | −0.388 | 0.221 | −0.638 | 0.393 | −0.486 | 0.728 | −0.416
 | 0.590 | −0.376 | 0.589 | −0.370 | 0.588 | −0.355 | 0.341 | −0.485 | 0.565 | −0.328 | 0.464 | 0.131
II | 0.414 | 0.920 | 0.417 | 0.919 | 0.422 | 0.916 | 0.948 | 0.341 | 0.536 | 0.853 | 0.990 | 0.178
 | 0.558 | −0.384 | 0.557 | −0.385 | 0.554 | −0.392 | 0.111 | −0.696 | 0.494 | −0.468 | −0.024 | −0.675
 | 0.497 | −0.289 | 0.497 | −0.290 | 0.495 | −0.290 | 0.143 | −0.526 | 0.460 | −0.342 | 0.067 | −0.576
III | 0.299 | 1.008 | 0.301 | 1.009 | 0.306 | 1.009 | 0.917 | 0.524 | 0.415 | 0.971 | 0.948 | 0.464
 | 0.557 | −0.325 | 0.556 | −0.343 | 0.555 | −0.364 | 0.153 | −0.714 | 0.515 | −0.457 | 0.105 | −0.693
 | 0.462 | −0.395 | 0.461 | −0.380 | 0.459 | −0.362 | 0.081 | −0.505 | 0.413 | −0.375 | 0.052 | −0.543
Corr. (I, II) | 0.712 | 0.415 | 0.712 | 0.416 | 0.711 | 0.417 | 0.503 | 0.555 | 0.691 | 0.433 | 0.188 | 0.005
Corr. (I, III) | 0.729 | 0.308 | 0.729 | 0.307 | 0.729 | 0.306 | 0.529 | 0.416 | 0.719 | 0.305 | 0.312 | −0.082
Corr. (II, III) | 0.748 | 0.778 | 0.748 | 0.779 | 0.749 | 0.778 | 0.851 | 0.675 | 0.771 | 0.757 | 0.846 | 0.669
Largest eigenvalue | 2.459 | 2.033 | 2.459 | 2.033 | 2.459 | 2.032 | 2.270 | 2.104 | 2.454 | 2.024 | 1.975 | 1.673
Smallest eigenvalue | 0.250 | 0.212 | 0.249 | 0.212 | 0.248 | 0.212 | 0.148 | 0.298 | 0.226 | 0.229 | 0.144 | 0.326
GCCA = generalized canonical correlation analysis;
Average correlations of canonical variates and GAI
Order | SUMCOR | | MAXECC | | MINVAR |
---|---|---|---|---|---|---
 | Avg. corr. | GAI | Avg. corr. | GAI | Avg. corr. | GAI
1 | 0.730 | 61.1% | 0.628 | 50.1% | 0.449 | 62.0% |
2 | 0.500 | 89.8% | 0.549 | 88.4% | 0.197 | 74.0% |
3 | 0.299 | 100% | 0.302 | 100% | 0.291 | 100% |
GAI = goodness-of-approximation indices;
The first and the second-order canonical loadings
Sets | SUMCOR | | MAXVAR | | SSQCOR | | MAXECC | | GENVAR | | MINVAR |
---|---|---|---|---|---|---|---|---|---|---|---|---
 | 1st | 2nd | 1st | 2nd | 1st | 2nd | 1st | 2nd | 1st | 2nd | 1st | 2nd
I | 0.584 | 0.811 | 0.585 | 0.810 | 0.587 | 0.806 | 0.892 | 0.441 | 0.642 | 0.756 | 0.281 | 0.926
 | 0.741 | −0.260 | 0.741 | −0.266 | 0.740 | −0.280 | 0.543 | −0.649 | 0.717 | −0.376 | 0.907 | −0.115
 | 0.846 | −0.249 | 0.846 | −0.247 | 0.845 | −0.241 | 0.631 | −0.543 | 0.827 | −0.262 | 0.748 | 0.233
II | 0.539 | 0.842 | 0.541 | 0.840 | 0.546 | 0.837 | 0.979 | 0.199 | 0.649 | 0.760 | 0.998 | 0.031
 | 0.745 | −0.388 | 0.743 | −0.390 | 0.741 | −0.396 | 0.239 | −0.823 | 0.680 | −0.494 | 0.085 | −0.832
 | 0.727 | −0.266 | 0.726 | −0.268 | 0.725 | −0.270 | 0.318 | −0.684 | 0.688 | −0.355 | 0.209 | −0.752
III | 0.576 | 0.815 | 0.578 | 0.815 | 0.582 | 0.813 | 0.982 | 0.190 | 0.668 | 0.744 | 0.992 | 0.127
 | 0.842 | −0.203 | 0.842 | −0.213 | 0.841 | −0.227 | 0.454 | −0.779 | 0.813 | −0.337 | 0.402 | −0.791
 | 0.776 | −0.283 | 0.776 | −0.275 | 0.774 | −0.267 | 0.376 | −0.681 | 0.738 | −0.329 | 0.334 | −0.724
Explained-variance indices
Sets | Order | SUMCOR | | MAXECC | | MINVAR |
---|---|---|---|---|---|---|---
 | | EV | EVI | EV | EVI | EV | EVI
I | 1 | 0.535 | 53.5% | 0.496 | 49.6% | 0.487 | 48.7% |
2 | 0.262 | 79.8% | 0.304 | 80.0% | 0.308 | 79.6% | |
3 | 0.202 | 100% | 0.200 | 100% | 0.204 | 100% | |
II | 1 | 0.458 | 45.8% | 0.372 | 37.2% | 0.349 | 34.9% |
2 | 0.310 | 76.8% | 0.395 | 76.7% | 0.419 | 76.8% | |
3 | 0.232 | 100% | 0.233 | 100% | 0.232 | 100% | |
III | 1 | 0.548 | 54.8% | 0.437 | 43.7% | 0.419 | 41.9% |
2 | 0.262 | 81.0% | 0.369 | 80.6% | 0.389 | 80.8% | |
3 | 0.190 | 100% | 0.194 | 100% | 0.192 | 100% |