Higher-order solutions for generalized canonical correlation analysis

Hyuncheol Kang

Division of Big Data and Management Engineering, Hoseo University, Korea
Correspondence to: Division of Big Data and Management Engineering, Hoseo University, 20, Hoseo-ro 79beon-gil, Baebang-eup, Asan-si, Chungcheongnam-do 31499, Korea. E-mail: hychkang@hoseo.edu
Received February 27, 2019; Revised March 27, 2019; Accepted March 27, 2019.
Abstract

Generalized canonical correlation analysis (GCCA) extends canonical correlation analysis (CCA) to the case of more than two sets of variables, and there have been many studies on how two-set canonical solutions can be generalized. In this paper, we derive stationary equations that lead to the higher-order solutions of several GCCA methods and suggest an iterative procedure to obtain the canonical coefficients. In addition, using some numerical examples, we present methods for graphical display that are useful for interpreting the GCCA results obtained.

Keywords : generalized canonical correlation analysis, higher-order solutions, canonical weights, goodness of approximation indices, canonical loadings, explained variance indices
1. Introduction

Generalized canonical correlation analysis (GCCA), which compares m sets of variables after removing linear dependencies within each set, extends the canonical correlation analysis (CCA) of Hotelling (1936) to cases of more than two sets of variables. There have been many studies on how two-set canonical solutions can be generalized (Horst, 1961a, 1961b, 1965; Carroll, 1968; Kettenring, 1971; Van de Geer, 1984; Ten Berge, 1988; Coppi and Bolasco, 1989; Gifi, 1990; Park and Huh, 1996), with discussions on the criteria used and constraints imposed.

Suppose we have m data matrices Xi (i = 1, . . . ,m), each from a sample of size n on pi variables. It is assumed implicitly that all the variables are standardized to have zero means and unit variances. The problem is to find linear composites zi = Xiai for each set in such a way that the matrix Z = (z1 · · · zi · · · zm) = (X1a1 · · ·Xiai · · ·Xmam) optimizes a particular function of its covariance matrix, Φ = {ϕi j}, i, j = 1, . . . ,m.

Criteria, the functions to be optimized for obtaining the canonical coefficient vectors $a_i$ (i = 1, . . . , m), are characterized by the chosen function of Φ: each objective function corresponding to a criterion is defined in terms of Φ. Let $l_1 \geq l_2 \geq \cdots \geq l_m$ be the ordered eigenvalues of Φ. Kettenring (1971) considered the following five criteria for selecting Z: (i) SUMCOR (maximize $\sum_{i,j}^{m} \phi_{ij}$), (ii) MAXVAR (maximize $l_1$), (iii) SSQCOR (maximize $\sum_{i,j}^{m} \phi_{ij}^2$), (iv) MINVAR (minimize $l_m$), (v) GENVAR (minimize det(Φ)). Kang and Kim (2006) added a sixth criterion for the GCCA solution: (vi) MAXECC (maximize $(l_1 - l_m)/(l_1 + l_m)$).
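For concreteness, all six objective functions can be evaluated directly from Φ. The sketch below is illustrative only (the helper name `gcca_criteria` is not from the paper); it returns the six criterion values for a given correlation matrix of canonical variates.

```python
import numpy as np

def gcca_criteria(Phi):
    """Return the six GCCA criterion values for a correlation matrix Phi."""
    Phi = np.asarray(Phi, dtype=float)
    l = np.sort(np.linalg.eigvalsh(Phi))[::-1]   # l1 >= l2 >= ... >= lm
    l1, lm = l[0], l[-1]
    return {
        "SUMCOR": Phi.sum(),               # maximize sum of phi_ij
        "MAXVAR": l1,                      # maximize largest eigenvalue
        "SSQCOR": (Phi ** 2).sum(),        # maximize sum of squared phi_ij
        "MINVAR": lm,                      # minimize smallest eigenvalue
        "GENVAR": np.linalg.det(Phi),      # minimize det(Phi)
        "MAXECC": (l1 - lm) / (l1 + lm),   # maximize eccentricity
    }

# example: three canonical variates with pairwise correlation 0.5
Phi = np.array([[1.0, 0.5, 0.5],
                [0.5, 1.0, 0.5],
                [0.5, 0.5, 1.0]])
print(gcca_criteria(Phi))
```

For this Φ the eigenvalues are 2.0, 0.5, 0.5, so, e.g., MAXVAR = 2.0 and MAXECC = 1.5/2.5 = 0.6.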

Some of the above have also been suggested by other studies (Horst, 1961a, 1961b, 1965; Steel, 1951), and Gifi (1990) mentioned two further methods: MINSUM (minimize the sum of the correlations) and PRINCALS (maximize the sum of the q largest eigenvalues of Φ or, equivalently, minimize the sum of the m − q smallest ones). Gifi also suggested a nonlinear version of GCCA, called OVERALS, based on the MAXVAR criterion, which incorporates nonlinear transformations of the data.

The study of relations among the sets can be continued beyond the first order (stage) by considering higher order (stage) canonical variates Z(2),Z(3), . . . to supplement the optimal Z(1). The same criterion function is used at each order, but restrictions are added to assure that canonical variables for a particular order differ from the previous orders.

In this paper, we derive stationary equations that lead to the higher-order solutions of several GCCA methods and suggest an iterative procedure to obtain the canonical coefficients. In addition, we present methods for graphical display, illustrated with some numerical examples, which are useful for interpreting GCCA results.

2. Unifying stationary equations for GCCA

In order to construct a valid optimization problem, the criterion to be optimized is usually subject to certain constraints. Let R denote the correlation matrix of all the original variables, and $R_{ij}$, of size $p_i \times p_j$, the submatrix of R containing the correlations between the ith and jth sets of variables. A typical constraint (the unit-variances constraint) is

$a_i' R_{ii} a_i = 1, \quad i = 1, \ldots, m.$  (2.1)

Under this constraint, Φ can be interpreted as a correlation matrix, which is an advantage of imposing it.

Under the constraint (2.1), by differentiating the following objective function g(·) with respect to $a_i$,

$g(a_1, \ldots, a_m) = f(a_1, \ldots, a_m) - \sum_{i=1}^{m} \mu_i \left(a_i' R_{ii} a_i - 1\right),$  (2.2)

where f(·) stands for the criterion under consideration, Kang and Kim (2006) derived the following form of unifying stationary equations for each of the six GCCA solutions,

$\sum_{j=1}^{m} w_{ij} R_{ij} a_j = \mu_i^{*} R_{ii} a_i, \quad i = 1, \ldots, m,$  (2.3)

where

$w_{ij} = \begin{cases} \dfrac{1}{m}, & \text{for SUMCOR}, \\ e_{i1} e_{j1}, & \text{for MAXVAR}, \\ e_{im} e_{jm}, & \text{for MINVAR}, \\ \sum_{k=1}^{m} e_{ik} e_{jk}\, l_k, & \text{for SSQCOR}, \\ \sum_{k=1}^{m} e_{ik} e_{jk} / l_k, & \text{for GENVAR}, \\ e_{i1} e_{j1}\, l_m - e_{im} e_{jm}\, l_1, & \text{for MAXECC}, \end{cases}$  (2.4)

respectively, where $e_{ij}$ is the ith element of $e_j$, the unit-normed eigenvector associated with the jth eigenvalue $l_j$ of Φ. The stationary equations (2.3) imply that the GCCA methods differ only in the weights (2.4) applied to obtain the desired solutions; each GCCA method can therefore be characterized in terms of $w_{ij}$.
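Since (2.4) characterizes each method by its weights, the whole weight matrix can be computed from a single eigen-decomposition of Φ. A minimal sketch follows; the function name and interface are illustrative, not from the paper. Note two built-in sanity checks: the SSQCOR weights reproduce Φ itself (since $\sum_k e_{ik} e_{jk} l_k = \phi_{ij}$), and the GENVAR weights reproduce $\Phi^{-1}$.

```python
import numpy as np

def gcca_weights(Phi, method):
    """Weight matrix W = {w_ij} of (2.4) for a given criterion."""
    Phi = np.asarray(Phi, dtype=float)
    m = Phi.shape[0]
    l, e = np.linalg.eigh(Phi)            # ascending eigenvalues
    l, e = l[::-1], e[:, ::-1]            # reorder so l1 >= ... >= lm
    if method == "SUMCOR":
        return np.full((m, m), 1.0 / m)
    if method == "MAXVAR":
        return np.outer(e[:, 0], e[:, 0])
    if method == "MINVAR":
        return np.outer(e[:, -1], e[:, -1])
    if method == "SSQCOR":
        return (e * l) @ e.T              # sum_k e_ik e_jk l_k (equals Phi)
    if method == "GENVAR":
        return (e / l) @ e.T              # sum_k e_ik e_jk / l_k (equals inv(Phi))
    if method == "MAXECC":
        return (l[-1] * np.outer(e[:, 0], e[:, 0])
                - l[0] * np.outer(e[:, -1], e[:, -1]))
    raise ValueError(f"unknown method: {method}")

# sanity check: the SSQCOR weights reproduce Phi itself
Phi = np.array([[1.0, 0.5, 0.3],
                [0.5, 1.0, 0.2],
                [0.3, 0.2, 1.0]])
print(np.allclose(gcca_weights(Phi, "SSQCOR"), Phi))   # True
```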

In general, imposing different constraints on the canonical variates can yield different solutions even for the same criterion. Another useful constraint (the constant-sum-variances constraint), an alternative to the unit-variances constraint, is

$\sum_{i=1}^{m} a_i' R_{ii} a_i = a' D a = m,$  (2.5)

where $a′=(a1′⋯am′)$ and D is a block-diagonal matrix with Rii as its ith block.

There are situations in which solutions under the constant-sum-variances constraint can be quite unfair (Van de Geer, 1984): the canonical variates under this constraint do not, in general, have identical variances, so the criteria can depend heavily on canonical variates with relatively large or small variances. Nevertheless, the constant-sum-variances constraint is sometimes preferred because it makes computation relatively easy. Moreover, for some criteria the stationary equations under this constraint have an explicit form (Kang and Kim, 2006), so considering it helps clarify the characteristics of the criteria.

The constant-sum-variances constraint requires only one Lagrange multiplier μ; the equations to solve for the solutions can therefore be written in the following matrix form,

$R D_a W = \mu D D_a,$  (2.6)

where Da denotes a block diagonal matrix with ai as its ith block and W is a matrix with wi j of (2.4) as its (i, j)th element.

Further discussion of the GCCA solutions obtained through the stationary equations (2.3) and (2.6), with numerical illustrations, can be found in Kang and Kim (2006).

3. Higher-order solutions

In this section, we derive stationary equations that lead to the higher-order GCCA solutions.

### 3.1. Higher-order solutions and computing procedure

For higher-order GCCA solutions, we need to place further restrictions on the additional canonical variates in order to achieve mutual orthogonality and compute them successively.

Suppose $Z^{(k)} = (z_1^{(k)} \cdots z_i^{(k)} \cdots z_m^{(k)}) = (X_1 a_1^{(k)} \cdots X_i a_i^{(k)} \cdots X_m a_m^{(k)})$ denotes the set of kth order canonical variates, for k ≤ q with q = min(p1, p2, . . . , pm). Then the problem, under the unit-variances constraint, reduces to optimizing the following objective function g(·):

$g\left(a^{(k)}\right) = f\left(a^{(k)}\right) - \sum_{i=1}^{m} \mu_i \left(a_i^{(k)\prime} R_{ii} a_i^{(k)} - 1\right) - 2 \sum_{i=1}^{m} a_i^{(k)\prime} C_i^{(k)} \gamma_i,$  (3.1)

where $\mu_i$ and $\gamma_i$ are Lagrange multipliers, $f(a^{(k)})$ is the function chosen according to the criterion, and $C_i^{(k)}$ is a matrix encoding the following orthogonality restrictions,

$a_i^{(k)\prime} C_i^{(k)} = 0, \quad i = 1, \ldots, m.$  (3.2)

In the context of m-set analysis, the following two restrictions are usually used (Gifi, 1990). A reasonable restriction on $Z^{(k)}$, the strong orthogonality restriction, is

$\operatorname{corr}\left(z_i^{(k)}, z_i^{(l)}\right) = a_i^{(k)\prime} R_{ii} a_i^{(l)} = 0, \quad l = 1, \ldots, k-1; \; i = 1, \ldots, m,$  (3.3)

which requires the canonical variates obtained at successive orders to be mutually uncorrelated within the same set. In this case, $C_i^{(k)} = R_{ii}\left(a_i^{(1)} \cdots a_i^{(k-1)}\right)$. Another restriction, the weak orthogonality restriction, is

$\sum_{i=1}^{m} \operatorname{corr}\left(z_i^{(k)}, z_i^{(l)}\right) = a^{(k)\prime} D a^{(l)} = 0, \quad l = 1, \ldots, k-1,$  (3.4)

which is useful when combined with the constant-sum-variances constraint. Under this restriction, the solutions of every order for MAXVAR and MINVAR can be obtained simultaneously by performing a single eigen-analysis of D−1/2RD−1/2, a considerable advantage both computationally and theoretically (Kang and Kim, 2006).
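This single eigen-analysis can be sketched as follows; the helper names `block_inv_sqrt` and `maxvar_minvar_all_orders` are illustrative, not from the paper, and the within-set blocks $R_{ii}$ are assumed positive definite.

```python
import numpy as np

def block_inv_sqrt(R, sizes):
    """D^{-1/2}, where D is block-diagonal with the within-set blocks R_ii."""
    p = sum(sizes)
    out = np.zeros((p, p))
    start = 0
    for s in sizes:
        w, v = np.linalg.eigh(R[start:start + s, start:start + s])
        out[start:start + s, start:start + s] = v @ np.diag(w ** -0.5) @ v.T
        start += s
    return out

def maxvar_minvar_all_orders(R, sizes):
    """All-order MAXVAR/MINVAR coefficients from one eigen-analysis of
    D^{-1/2} R D^{-1/2}; eigenvalues are returned in decreasing order."""
    Dis = block_inv_sqrt(R, sizes)
    l, v = np.linalg.eigh(Dis @ R @ Dis)
    # columns of A are the stacked coefficient vectors; each satisfies
    # a'Da = 1 (rescale by sqrt(m) for the a'Da = m scaling of the
    # constant-sum-variances constraint)
    A = Dis @ v
    return l[::-1], A[:, ::-1]

# two sets of one variable each with correlation r: eigenvalues are 1 +/- r
l, A = maxvar_minvar_all_orders(np.array([[1.0, 0.5], [0.5, 1.0]]), [1, 1])
print(np.round(l, 3))   # [1.5 0.5]
```

The largest eigenvalue and its column of A give the first-order MAXVAR solution; the smallest give MINVAR, and the weak orthogonality of successive orders holds automatically because eigenvectors are orthogonal.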

The two restrictions mentioned above yield the same canonical variates when m = 2 (Anderson, 1984; Kettenring, 1971).

We now derive stationary equations for solutions under any restriction of the form (3.2), together with the unit-variances constraint. Taking the derivative of g(·) in (3.1) with respect to $a_i$, setting it to zero, and applying the appendix of Kang and Kim (2006), some algebra yields the following system of stationary equations,

$\sum_{j=1}^{m} w_{ij} E_i R_{ij} a_j^{(k)} = \mu_i R_{ii} a_i^{(k)}, \quad i = 1, \ldots, m,$  (3.5)

where $E_i = I - C_i^{(k)} \left(C_i^{(k)\prime} R_{ii}^{-1} C_i^{(k)}\right)^{-} C_i^{(k)\prime} R_{ii}^{-1}$, with $(\cdot)^{-}$ denoting a generalized inverse, and the weights $w_{ij}$ are the same as those given in (2.4).

The kth order solutions for the GCCA methods can be obtained by a Gauss-Seidel type of iterative procedure, described by the following steps.

• Step 1. Obtain the initial values $a_i^0$ $(i = 1, \ldots, m)$.

• Step 2. At the tth iteration (t = 1, 2, . . .), evaluate the stationary equations in (3.5) using $a_1^t, \ldots, a_{i-1}^t, a_i^{t-1}, \ldots, a_m^{t-1}$ and compute the updated values $a_i^t$ in order of the subscript i = 1, . . . , m. In this step, $a_i^t$ is obtained from a vector proportional to $\sum_{j=1}^{m} w_{ij} R_{ii}^{-1} E_i R_{ij} a_j$ as follows.

• calculate Φ, the correlation matrix of the canonical variates

• update all the weights $w_{ij}$ ( j = 1, . . . , m) from Φ

• $a_i \leftarrow \sum_{j=1}^{m} w_{ij} E_i R_{ij} a_j$

• $a_i \leftarrow a_i / \left(a_i' a_i\right)^{1/2}$

• $a_i \leftarrow R_{ii}^{-1/2} a_i$

• Step 3. Repeat Step 2 until the convergence condition is satisfied.
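The steps above can be sketched as follows for the SUMCOR criterion, for which $w_{ij} = 1/m$ stays fixed so Φ need not be re-evaluated at each sweep; for the other criteria one would recompute the weights from Φ inside the loop, per (2.4). This sketch solves the stationary equations (3.5) directly with $R_{ii}^{-1}$ and normalizes straight to the unit-variances constraint at each update, and `gcca_sumcor` is a hypothetical helper, not the paper's software.

```python
import numpy as np

def gcca_sumcor(R, sizes, prev=None, max_iter=1000, tol=1e-12):
    """Gauss-Seidel iteration for the kth-order SUMCOR solution of (3.5).

    `prev` gives, per set, the lower-order weight vectors a_i(1..k-1); the
    strong orthogonality restriction C_i = R_ii (a_i(1) ... a_i(k-1)) is used.
    """
    m = len(sizes)
    offs = np.concatenate(([0], np.cumsum(sizes)))
    blk = lambda i, j: R[offs[i]:offs[i + 1], offs[j]:offs[j + 1]]

    # E_i = I - C_i (C_i' R_ii^{-1} C_i)^- C_i' R_ii^{-1}
    E = []
    for i in range(m):
        if prev and len(prev[i]) > 0:
            Rii, Rinv = blk(i, i), np.linalg.inv(blk(i, i))
            Ci = Rii @ np.column_stack(prev[i])
            E.append(np.eye(sizes[i])
                     - Ci @ np.linalg.pinv(Ci.T @ Rinv @ Ci) @ Ci.T @ Rinv)
        else:
            E.append(np.eye(sizes[i]))

    # Step 1: initial values, scaled to a_i' R_ii a_i = 1
    a = [np.ones(s) for s in sizes]
    a = [ai / np.sqrt(ai @ blk(i, i) @ ai) for i, ai in enumerate(a)]

    # Steps 2-3: sweep over i, reusing the already-updated a_1, ..., a_{i-1}
    for _ in range(max_iter):
        a_old = [ai.copy() for ai in a]
        for i in range(m):
            u = sum(blk(i, j) @ a[j] for j in range(m)) / m   # w_ij = 1/m
            u = np.linalg.solve(blk(i, i), E[i] @ u)
            a[i] = u / np.sqrt(u @ blk(i, i) @ u)             # unit variance
        if max(np.linalg.norm(a[i] - a_old[i]) for i in range(m)) < tol:
            break
    return a

# demo: three sets of two variables from a synthetic correlation matrix
rng = np.random.default_rng(0)
R = np.corrcoef(rng.standard_normal((200, 6)), rowvar=False)
a1 = gcca_sumcor(R, [2, 2, 2])                          # first order
a2 = gcca_sumcor(R, [2, 2, 2], prev=[[v] for v in a1])  # second order
```

By construction every update satisfies the unit-variances constraint exactly, and for k = 2 the projector $E_i$ guarantees $a_i^{(2)\prime} R_{ii} a_i^{(1)} = 0$.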

The weight vectors $a_i$ from the above procedure can be called standardized canonical weights (coefficients), because they are obtained from R, the correlation matrix of the original variables. If the data matrices $X_i$ are not standardized, the raw canonical weights (coefficients) can be obtained by

$b_i = V_i^{-1/2} a_i, \quad i = 1, \ldots, m,$  (3.6)

where Vi is a diagonal matrix with the variances of original variables as its diagonal elements.

### 3.2. Numerical illustration

In this section, a numerical example is constructed to illustrate the contrasts among the six GCCA methods. The example has three sets of variables (m = 3) with p1 = p2 = p3 = 3; the correlation matrix is presented in Table 1. For the GCCA methods, the similarities and dissimilarities of the results can be expected to depend on the extreme eigenvalues, especially the smallest eigenvalue, of D−1/2RD−1/2. For this example, the smallest eigenvalue of D−1/2RD−1/2 is 0.144, which is rather small, so one can expect the methods to give quite different results.
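The quoted smallest eigenvalue can be recomputed from the Table 1 correlation matrix; the sketch below is a verification aid, not the paper's code.

```python
import numpy as np

# correlation matrix of Table 1 (three sets of three variables)
R = np.array([
    [1.00, 0.25, 0.27, 0.44, 0.18, 0.19, 0.43, 0.37, 0.28],
    [0.25, 1.00, 0.40, 0.14, 0.65, 0.26, 0.19, 0.53, 0.36],
    [0.27, 0.40, 1.00, 0.18, 0.41, 0.61, 0.23, 0.47, 0.61],
    [0.44, 0.14, 0.18, 1.00, 0.09, 0.15, 0.85, 0.25, 0.19],
    [0.18, 0.65, 0.41, 0.09, 1.00, 0.30, 0.10, 0.54, 0.39],
    [0.19, 0.26, 0.61, 0.15, 0.30, 1.00, 0.18, 0.44, 0.50],
    [0.43, 0.19, 0.23, 0.85, 0.10, 0.18, 1.00, 0.29, 0.25],
    [0.37, 0.53, 0.47, 0.25, 0.54, 0.44, 0.29, 1.00, 0.43],
    [0.28, 0.36, 0.61, 0.19, 0.39, 0.50, 0.25, 0.43, 1.00],
])

# D^{-1/2}: block-diagonal inverse square roots of the within-set blocks R_ii
D_inv_sqrt = np.zeros_like(R)
for s in (slice(0, 3), slice(3, 6), slice(6, 9)):
    w, v = np.linalg.eigh(R[s, s])
    D_inv_sqrt[s, s] = v @ np.diag(w ** -0.5) @ v.T

K = D_inv_sqrt @ R @ D_inv_sqrt
l_min = np.linalg.eigvalsh(K).min()   # smallest eigenvalue, about 0.144
```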

Table 2 displays the first- and second-order solutions of the six GCCA methods, obtained by the above iterative procedure: the canonical weight vectors ($a_i$), the correlation coefficients of the canonical variates ($\phi_{ij}$), and the two extreme eigenvalues of Φ ($l_1$ and $l_3$).

Table 2 indicates that MINVAR gives very different canonical weights and canonical correlation coefficients. The canonical weights of SUMCOR, MAXVAR, and SSQCOR are indistinguishable on the whole and provide almost the same level of canonical correlations; GENVAR also appears similar to these three. MAXECC, however, gives intermediate results that are noticeably different from those of SUMCOR, MAXVAR, and SSQCOR.

4. Graphical display of GCCA results

In this section, we present methods for graphical display that are useful for interpreting the GCCA results.

### 4.1. Quantification plots with canonical weights

Let Ai denote the canonical weight matrix of the ith set of variables under the unit-variance constraint,

$A_i = \left(a_i^{(1)}, \ldots, a_i^{(k)}, \ldots, a_i^{(q)}\right), \quad i = 1, \ldots, m,$  (4.1)

and $D_{\bar\phi}$ a diagonal matrix with the average correlations of the canonical variates as its kth diagonal element,

$D_{\bar\phi} = \operatorname{diag}\left(\bar\phi^{(1)}, \ldots, \bar\phi^{(k)}, \ldots, \bar\phi^{(q)}\right),$  (4.2)

where $\bar\phi^{(k)} = \sum\sum_{i \neq j}\, \phi_{ij}^{(k)} / [m(m-1)]$ is the average correlation coefficient of the kth order canonical variates.

Park and Huh (1996) suggested an r-dimensional quantification plot obtained by plotting $A_i^{(r)} D_{\bar\phi}^{(r)}$, the first r columns of $A_i D_{\bar\phi}$, for r ≤ q with q = min(p1, p2, . . . , pm). This plot is useful for investigating the structural relationships among the variables. Extending the two-set case, we may define a 'goodness-of-approximation index (GAI)' by

$GAI(r) = \frac{\operatorname{tr}\left(D_{\bar\phi(r)}^2\right)}{\operatorname{tr}\left(D_{\bar\phi}^2\right)} = \frac{\sum_{k=1}^{r} \bar\phi^{(k)2}}{\sum_{k=1}^{q} \bar\phi^{(k)2}}.$  (4.3)
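As a small worked check on the index defined above (a sketch, with the average correlations taken from the SUMCOR column of Table 3):

```python
import numpy as np

def gai(phi_bar):
    """Cumulative share of the squared average correlations: GAI(r), r = 1..q."""
    sq = np.asarray(phi_bar, dtype=float) ** 2
    return np.cumsum(sq) / sq.sum()

# average correlations of the SUMCOR canonical variates, from Table 3
phi_bar = [0.730, 0.500, 0.299]
print(np.round(gai(phi_bar) * 100, 1))   # [ 61.1  89.8 100. ]
```

These percentages match the GAI column reported for SUMCOR in Table 3.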

Table 3 presents the average correlation coefficients of the kth order canonical variates ($\bar\phi^{(k)}$) and the GAI for three selected GCCA methods. Figure 1 displays the two-dimensional quantification plots with canonical weights.

The canonical loadings, the correlations between the original variables and their canonical variates, can be obtained by

$\Lambda_i = R_{ii} A_i, \quad i = 1, \ldots, m,$  (4.4)

and thus the kth column of $\Lambda_i$ is $\lambda_i^{(k)} = R_{ii} a_i^{(k)}$. The explained variance of the ith set of variables by its kth canonical variate is

$EV_i^{(k)} = \frac{1}{p_i} \lambda_i^{(k)\prime} \lambda_i^{(k)}, \quad i = 1, \ldots, m; \; k = 1, \ldots, q,$  (4.5)

which satisfies $∑k=1qEVi(k)=1$.
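This calculation can be checked against the tabulated results (a sketch; the loadings below are the first- and second-order SUMCOR loadings of set I from Table 4):

```python
import numpy as np

def explained_variance(loadings):
    """EV_i(k) = (1/p_i) * lambda_i(k)' lambda_i(k), per column of Lambda_i."""
    lam = np.asarray(loadings, dtype=float)
    return (lam ** 2).sum(axis=0) / lam.shape[0]

# first- and second-order SUMCOR loadings of set I, taken from Table 4
lam = np.array([[0.584,  0.811],
                [0.741, -0.260],
                [0.846, -0.249]])
ev = explained_variance(lam)
evi = np.cumsum(ev)   # cumulative explained variance
```

Here `ev` reproduces the SUMCOR EV values 0.535 and 0.262 for set I in Table 5, and `evi[1]` reproduces the 79.8% figure.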

The canonical loading plot is obtained by plotting $\Lambda_i^{(r)}$, the first r columns of $\Lambda_i$. This plot is useful for interpreting the meaning of the canonical variates. Extending the two-set case, we may define an 'explained-variance index (EVI)' for the ith set of variables by

$EVI_i(r) = \sum_{k=1}^{r} EV_i^{(k)}, \quad i = 1, \ldots, m.$  (4.6)

Table 4 shows the first- and second-order canonical loadings of the six GCCA methods; in addition, Table 5 presents the explained variances by the kth order canonical variates and the explained-variance indices for three selected GCCA methods. Figure 2 displays the two-dimensional canonical loading plots. For example, from the SUMCOR results, the first-order canonical variate of the first set can be interpreted as an overall mean of the original variables, while the second-order canonical variate of the first set can be interpreted as a contrast between x1 and the others.

Fig. 1. Quantification plots with canonical weights. SUMCOR = maximize $\sum_{i,j}^{m}\phi_{ij}$; MAXECC = maximize $(l_1 - l_m)/(l_1 + l_m)$; MINVAR = minimize $l_m$.

Fig. 2. Canonical loading plots. SUMCOR = maximize $\sum_{i,j}^{m}\phi_{ij}$; MAXECC = maximize $(l_1 - l_m)/(l_1 + l_m)$; MINVAR = minimize $l_m$.

### Table 1

Correlation matrix

| Set |  | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 |
|---|---|---|---|---|---|---|---|---|---|---|
| I | x1 | 1.00 | 0.25 | 0.27 | 0.44 | 0.18 | 0.19 | 0.43 | 0.37 | 0.28 |
|  | x2 | 0.25 | 1.00 | 0.40 | 0.14 | 0.65 | 0.26 | 0.19 | 0.53 | 0.36 |
|  | x3 | 0.27 | 0.40 | 1.00 | 0.18 | 0.41 | 0.61 | 0.23 | 0.47 | 0.61 |
| II | x4 | 0.44 | 0.14 | 0.18 | 1.00 | 0.09 | 0.15 | 0.85 | 0.25 | 0.19 |
|  | x5 | 0.18 | 0.65 | 0.41 | 0.09 | 1.00 | 0.30 | 0.10 | 0.54 | 0.39 |
|  | x6 | 0.19 | 0.26 | 0.61 | 0.15 | 0.30 | 1.00 | 0.18 | 0.44 | 0.50 |
| III | x7 | 0.43 | 0.19 | 0.23 | 0.85 | 0.10 | 0.18 | 1.00 | 0.29 | 0.25 |
|  | x8 | 0.37 | 0.53 | 0.47 | 0.25 | 0.54 | 0.44 | 0.29 | 1.00 | 0.43 |
|  | x9 | 0.28 | 0.36 | 0.61 | 0.19 | 0.39 | 0.50 | 0.25 | 0.43 | 1.00 |

### Table 2

The first- and second-order solutions of GCCA

|  | SUMCOR 1st | SUMCOR 2nd | MAXVAR 1st | MAXVAR 2nd | SSQCOR 1st | SSQCOR 2nd | MAXECC 1st | MAXECC 2nd | GENVAR 1st | GENVAR 2nd | MINVAR 1st | MINVAR 2nd |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| a1 | 0.318 | 1.002 | 0.319 | 1.001 | 0.323 | 0.999 | 0.745 | 0.731 | 0.391 | 0.967 | -0.026 | 0.995 |
|  | 0.426 | -0.360 | 0.425 | -0.368 | 0.424 | -0.388 | 0.221 | -0.638 | 0.393 | -0.486 | 0.728 | -0.416 |
|  | 0.590 | -0.376 | 0.589 | -0.370 | 0.588 | -0.355 | 0.341 | -0.485 | 0.565 | -0.328 | 0.464 | 0.131 |
| a2 | 0.414 | 0.920 | 0.417 | 0.919 | 0.422 | 0.916 | 0.948 | 0.341 | 0.536 | 0.853 | 0.990 | 0.178 |
|  | 0.558 | -0.384 | 0.557 | -0.385 | 0.554 | -0.392 | 0.111 | -0.696 | 0.494 | -0.468 | -0.024 | -0.675 |
|  | 0.497 | -0.289 | 0.497 | -0.290 | 0.495 | -0.290 | 0.143 | -0.526 | 0.460 | -0.342 | 0.067 | -0.576 |
| a3 | 0.299 | 1.008 | 0.301 | 1.009 | 0.306 | 1.009 | 0.917 | 0.524 | 0.415 | 0.971 | 0.948 | 0.464 |
|  | 0.557 | -0.325 | 0.556 | -0.343 | 0.555 | -0.364 | 0.153 | -0.714 | 0.515 | -0.457 | 0.105 | -0.693 |
|  | 0.462 | -0.395 | 0.461 | -0.380 | 0.459 | -0.362 | 0.081 | -0.505 | 0.413 | -0.375 | 0.052 | -0.543 |
| ϕ12 | 0.712 | 0.415 | 0.712 | 0.416 | 0.711 | 0.417 | 0.503 | 0.555 | 0.691 | 0.433 | 0.188 | 0.005 |
| ϕ13 | 0.729 | 0.308 | 0.729 | 0.307 | 0.729 | 0.306 | 0.529 | 0.416 | 0.719 | 0.305 | 0.312 | -0.082 |
| ϕ23 | 0.748 | 0.778 | 0.748 | 0.779 | 0.749 | 0.778 | 0.851 | 0.675 | 0.771 | 0.757 | 0.846 | 0.669 |
| l1 | 2.459 | 2.033 | 2.459 | 2.033 | 2.459 | 2.032 | 2.270 | 2.104 | 2.454 | 2.024 | 1.975 | 1.673 |
| l3 | 0.250 | 0.212 | 0.249 | 0.212 | 0.248 | 0.212 | 0.148 | 0.298 | 0.226 | 0.229 | 0.144 | 0.326 |

GCCA = generalized canonical correlation analysis; SUMCOR = maximize $\sum_{i,j}^{m}\phi_{ij}$; MAXVAR = maximize $l_1$; SSQCOR = maximize $\sum_{i,j}^{m}\phi_{ij}^2$; MAXECC = maximize $(l_1 - l_m)/(l_1 + l_m)$; GENVAR = minimize det(Φ); MINVAR = minimize $l_m$.

### Table 3

Average correlations of canonical variates and GAI

| k | SUMCOR $\bar\phi^{(k)}$ | GAI | MAXECC $\bar\phi^{(k)}$ | GAI | MINVAR $\bar\phi^{(k)}$ | GAI |
|---|---|---|---|---|---|---|
| 1 | 0.730 | 61.1% | 0.628 | 50.1% | 0.449 | 62.0% |
| 2 | 0.500 | 89.8% | 0.549 | 88.4% | 0.197 | 74.0% |
| 3 | 0.299 | 100% | 0.302 | 100% | 0.291 | 100% |

GAI = goodness-of-approximation index; SUMCOR = maximize $\sum_{i,j}^{m}\phi_{ij}$; MAXECC = maximize $(l_1 - l_m)/(l_1 + l_m)$; MINVAR = minimize $l_m$.

### Table 4

The first- and second-order canonical loadings of GCCA

|  | SUMCOR 1st | SUMCOR 2nd | MAXVAR 1st | MAXVAR 2nd | SSQCOR 1st | SSQCOR 2nd | MAXECC 1st | MAXECC 2nd | GENVAR 1st | GENVAR 2nd | MINVAR 1st | MINVAR 2nd |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| λ1 | 0.584 | 0.811 | 0.585 | 0.810 | 0.587 | 0.806 | 0.892 | 0.441 | 0.642 | 0.756 | 0.281 | 0.926 |
|  | 0.741 | -0.260 | 0.741 | -0.266 | 0.740 | -0.280 | 0.543 | -0.649 | 0.717 | -0.376 | 0.907 | -0.115 |
|  | 0.846 | -0.249 | 0.846 | -0.247 | 0.845 | -0.241 | 0.631 | -0.543 | 0.827 | -0.262 | 0.748 | 0.233 |
| λ2 | 0.539 | 0.842 | 0.541 | 0.840 | 0.546 | 0.837 | 0.979 | 0.199 | 0.649 | 0.760 | 0.998 | 0.031 |
|  | 0.745 | -0.388 | 0.743 | -0.390 | 0.741 | -0.396 | 0.239 | -0.823 | 0.680 | -0.494 | 0.085 | -0.832 |
|  | 0.727 | -0.266 | 0.726 | -0.268 | 0.725 | -0.270 | 0.318 | -0.684 | 0.688 | -0.355 | 0.209 | -0.752 |
| λ3 | 0.576 | 0.815 | 0.578 | 0.815 | 0.582 | 0.813 | 0.982 | 0.190 | 0.668 | 0.744 | 0.992 | 0.127 |
|  | 0.842 | -0.203 | 0.842 | -0.213 | 0.841 | -0.227 | 0.454 | -0.779 | 0.813 | -0.337 | 0.402 | -0.791 |
|  | 0.776 | -0.283 | 0.776 | -0.275 | 0.774 | -0.267 | 0.376 | -0.681 | 0.738 | -0.329 | 0.334 | -0.724 |

SUMCOR = maximize $\sum_{i,j}^{m}\phi_{ij}$; MAXVAR = maximize $l_1$; SSQCOR = maximize $\sum_{i,j}^{m}\phi_{ij}^2$; MAXECC = maximize $(l_1 - l_m)/(l_1 + l_m)$; GENVAR = minimize det(Φ); MINVAR = minimize $l_m$.

### Table 5

Explained-variance indices

| Set | k | SUMCOR EV | EVI | MAXECC EV | EVI | MINVAR EV | EVI |
|---|---|---|---|---|---|---|---|
| I | 1 | 0.535 | 53.5% | 0.496 | 49.6% | 0.487 | 48.7% |
|  | 2 | 0.262 | 79.8% | 0.304 | 80.0% | 0.308 | 79.6% |
|  | 3 | 0.202 | 100% | 0.200 | 100% | 0.204 | 100% |
| II | 1 | 0.458 | 45.8% | 0.372 | 37.2% | 0.349 | 34.9% |
|  | 2 | 0.310 | 76.8% | 0.395 | 76.7% | 0.419 | 76.8% |
|  | 3 | 0.232 | 100% | 0.233 | 100% | 0.232 | 100% |
| III | 1 | 0.548 | 54.8% | 0.437 | 43.7% | 0.419 | 41.9% |
|  | 2 | 0.262 | 81.0% | 0.369 | 80.6% | 0.389 | 80.8% |
|  | 3 | 0.190 | 100% | 0.194 | 100% | 0.192 | 100% |

SUMCOR = maximize $\sum_{i,j}^{m}\phi_{ij}$; MAXECC = maximize $(l_1 - l_m)/(l_1 + l_m)$; MINVAR = minimize $l_m$; EVI = explained-variance index.

References
1. Anderson TW (1984). An Introduction to Multivariate Statistical Analysis, John Wiley & Sons, New York.
2. Carroll JD (1968). Generalization of canonical correlation analysis to three or more sets of variables. Proceedings of American Psychology Association, 227-228.
3. Coppi R and Bolasco S (1989). Multiway Data Analysis, North-Holland, New York.
4. Gifi A (1990). Nonlinear Multivariate Analysis, Wiley, New York.
5. Horst P (1961a). Relations among m sets of measures. Psychometrika, 26, 129-149.
6. Horst P (1961b). Generalized canonical correlations and their applications to experimental data. Journal of Clinical Psychology (Monograph supplement), 14, 331-347.
7. Horst P (1965). Factor Analysis of Data Matrices, New York, Holt, Rinehart and Winston.
8. Kang H and Kim K (2006). Unifying stationary equations for generalized canonical correlation analysis. Journal of the Korean Statistical Society, 35, 143-156.
9. Kettenring JR (1971). Canonical analysis of several sets of variables. Biometrika, 58, 433-451.
10. Park MR and Huh MH (1996). Quantification plots for several sets of variables. Journal of the Korean Statistical Society, 25, 589-601.
11. Steel RGD (1951). Minimum generalized variance for a set of linear functions. Annals of Mathematical Statistics, 22, 456-460.
12. Ten Berge JMF (1988). Generalized approaches to the MAXBET problem and the MAXDIFF problem, with applications to canonical correlations. Psychometrika, 53, 487-494.
13. Van de Geer JP (1984). Linear relations among k sets of variables. Psychometrika, 49, 79-94.