Canonical Correlation Analysis (CCA)

Canonical correlation analysis (CCA) finds the maximally correlated lower dimensional representations of two data sets mathbf{X}_1 and mathbf{X}_2. CCA can be applied to any multidimensional data set to investigate co-variation.  In my field, it is applied to brain signals in order to determine relationship between two regions of interests. There are plenty of sources out there explaining the theory of CCA but I thought it would be a good idea to explain it using my own words and notation in my blog. I will try to be consistent with the notation in and try to fill the gaps in derivations. In CCA, we want to maximize the correlation between linear transformation of mathbf{X}_1 in mathbb{R}^{n times p} and mathbf{X}_2 in mathbb{R}^{n times q}: max_{mathbf{W}_1,mathbf{W}_2} corr left( mathbf{X}_1mathbf{W}_1,mathbf{X}_2mathbf{W}_2right) where mathbf{W}_1 in mathbb{R}^{p times d} and mathbf{W}_2 in mathbb{R}^{q times d} are canonical coefficient matrices and d leq minleft(rank(mathbf{X}_1),rank(mathbf{X}_2)right). Let’s analyze the method starting with the first columns of the canonical coefficient matrices: max_{mathbf{w}_1^{(1)},mathbf{w}_2^{(1)}} corr left( mathbf{X}_1mathbf{w}_1^{(1)},mathbf{X}_2mathbf{w}_2^{(1)}right) = max_{mathbf{w}_1^{(1)},mathbf{w}_2^{(1)}} frac{left(mathbf{w}_1^{(1)}right)^T mathbf{C}_{12} mathbf{w}_2^{(1)}}{sqrt{left(mathbf{w}_1^{(1)}right)^T mathbf{C}_{11} mathbf{w}_1^{(1)}left(mathbf{w}_2^{(1)}right)^T mathbf{C}_{22} mathbf{w}_2^{(1)}}} where mathbf{C}_{ij} = mbox{E}left[mathbf{X}_i^T mathbf{X}_jright]. This is equivalent to solving max_{mathbf{w}_1^{(1)},mathbf{w}_2^{(1)}}left(mathbf{w}_1^{(1)}right)^T mathbf{C}_{12} mathbf{w}_2^{(1)} with constraints left(mathbf{w}_1^{(1)}right)^T mathbf{C}_{11} mathbf{w}_1^{(1)}=left(mathbf{w}_2^{(1)}right)^T mathbf{C}_{22} mathbf{w}_2^{(1)}=1. Let mathbf{v}_1^{(1)} triangleq mathbf{C}_{11}^{1/2} mathbf{w}_1^{(1)} and mathbf{v}_2^{(1)}triangleqmathbf{C}_{22}^{1/2}mathbf{w}_2^{(1)} to simplify the constraint. The optimization problem now turns out to be: max_{mathbf{v}_1^{(1)},mathbf{v}_2^{(1)}} left(mathbf{v}_1^{(1)}right)^Tmathbf{G}mathbf{v}_2^{(1)} such that left(mathbf{v}_1^{(1)}right)^Tmathbf{v}_1^{(1)}=left(mathbf{v}_2^{(1)}right)^Tmathbf{v}_2^{(1)}=1 where mathbf{G} triangleq mathbf{C}_{11}^{-T/2}mathbf{C}_{12}mathbf{C}_{22}^{-1/2}. By the method of Lagrangian multipliers, the maximum is attained at a stationary point of mathcal{L}left(mathbf{v}_1^{(1)},mathbf{v}_2^{(1)},lambda,muright)=left(mathbf{v}_1^{(1)}right)^Tmathbf{G}mathbf{v}_2^{(1)}-lambdaleft(left(mathbf{v}_1^{(1)}right)^Tmathbf{v}_1^{(1)}-1right)-muleft(left(mathbf{v}_2^{(1)}right)^Tmathbf{v}_2^{(1)}-1right) for constants lambda and mu. At such a stationary point, we have: frac{partial mathcal{L}left(mathbf{v}_1^{(1)},mathbf{v}_2^{(1)},lambda,muright)}{mathbf{v}_1^{(1)}} = mathbf{G}mathbf{v}_2^{(1)}-2 lambda mathbf{v}_1^{(1)}=0 frac{partial mathcal{L}left(mathbf{v}_1^{(1)},mathbf{v}_2^{(1)},lambda,muright)}{mathbf{v}_2^{(1)}} = mathbf{G}^Tmathbf{v}_1^{(1)}-2 mumathbf{v}_2^{(1)}=0. From these two equations, we can conclude that lambda = mu and mathbf{G}^Tmathbf{G}mathbf{v}_2^{(1)} = 4lambda^2mathbf{v}_2^{(1)} mathbf{G}mathbf{G}^Tmathbf{v}_1^{(1)} = 4lambda^2mathbf{v}_1^{(1)}. Hence the vectors mathbf{v}_1^{(1)} and mathbf{v}_2^{(1)} are the eigenvectors of mathbf{G}mathbf{G}^T and mathbf{G}^Tmathbf{G} with the largest eigenvalue, respectively. One can also consider them as left and right singular vectors  of the matrix mathbf{G} with the largest singular value.  The first canonical vectors are then defined as mathbf{a}_1^{(1)} triangleq mathbf{X}_1 mathbf{w}_1^{(1)} and mathbf{a}_2^{(1)} triangleq mathbf{X}_2 mathbf{w}_2^{(1)}. We want the second canonical vectors to be orthogonal to the first ones which implies orthogonality between mathbf{v}_1^{(1)} and mathbf{v}_1^{(2)}; and between mathbf{v}_2^{(1)} and mathbf{v}_2^{(2)}. The new optimization problem for the second canonical vectors is defined as max_{mathbf{v}_1^{(2)},mathbf{v}_2^{(2)}} left(mathbf{v}_1^{(2)}right)^Tmathbf{G}mathbf{v}_2^{(2)} such that left(mathbf{v}_1^{(1)}right)^Tmathbf{v}_1^{(1)}=left(mathbf{v}_2^{(1)}right)^Tmathbf{v}_2^{(1)}=1 and left(mathbf{v}_1^{(1)}right)^Tmathbf{v}_1^{(2)}=left(mathbf{v}_2^{(1)}right)^Tmathbf{v}_2^{(2)}=0. We can use singular value decomposition of the matrix mathbf{G} to solve this problem: mathbf{G} = mathbf{U}mathbf{S}mathbf{V}^T where mathbf{U} and mathbf{V} are orthogonal matrices and mathbf{S} is a diagonal matrix of singular values in decreasing order. Let’s define the transformations: mathbf{z}_1 triangleq mathbf{U}^T mathbf{v}_1^{(2)} and mathbf{z}_2 triangleq mathbf{V}^T mathbf{v}_2^{(2)} so that the optimization function can be written as max_{mathbf{z}_1,mathbf{z}_2} mathbf{z}_1 ^Tmathbf{S}mathbf{z}_2 such that mathbf{z}_1^Tmathbf{z}_1=mathbf{z}_2^Tmathbf{z}_2=1 and z_{11}= z_{21} = 0 yielding the value of the second largest singular value of the matrix mathbf{G} (Eckart-Young theorem). Therefore the optimum vectors mathbf{v}_1^{(2)} and mathbf{v}_2^{(2)} are the right and left singular vectors of mathbf{G} with the second largest singular value. I hope this derivation clarifies the concept of CCA. Please contact me if you have any questions or comments.
This entry was posted in Research and tagged , , . Bookmark the permalink.