# Canonical Correlation Analysis (CCA)

Canonical correlation analysis (CCA) finds the maximally correlated lower-dimensional representations of two data sets $\mathbf{X}_1$ and $\mathbf{X}_2$. CCA can be applied to any pair of multidimensional data sets to investigate co-variation. In my field, it is applied to brain signals in order to determine the relationship between two regions of interest. There are plenty of sources out there explaining the theory of CCA, but I thought it would be a good idea to explain it in my own words and notation on my blog. I will try to be consistent with the notation in http://www.csee.umbc.edu/~adali/pubs/IEEEpubs/SPM2010correa.pdf and try to fill the gaps in the derivations.

In CCA, we want to maximize the correlation between linear transformations of $\mathbf{X}_1 \in \mathbb{R}^{n \times p}$ and $\mathbf{X}_2 \in \mathbb{R}^{n \times q}$:

$$\max_{\mathbf{W}_1,\mathbf{W}_2} \operatorname{corr}\left(\mathbf{X}_1\mathbf{W}_1,\mathbf{X}_2\mathbf{W}_2\right)$$

where $\mathbf{W}_1 \in \mathbb{R}^{p \times d}$ and $\mathbf{W}_2 \in \mathbb{R}^{q \times d}$ are the canonical coefficient matrices and $d \leq \min\left(\operatorname{rank}(\mathbf{X}_1),\operatorname{rank}(\mathbf{X}_2)\right)$. Let's analyze the method starting with the first columns of the canonical coefficient matrices:

$$\max_{\mathbf{w}_1^{(1)},\mathbf{w}_2^{(1)}} \operatorname{corr}\left(\mathbf{X}_1\mathbf{w}_1^{(1)},\mathbf{X}_2\mathbf{w}_2^{(1)}\right) = \max_{\mathbf{w}_1^{(1)},\mathbf{w}_2^{(1)}} \frac{\left(\mathbf{w}_1^{(1)}\right)^T \mathbf{C}_{12} \mathbf{w}_2^{(1)}}{\sqrt{\left(\mathbf{w}_1^{(1)}\right)^T \mathbf{C}_{11} \mathbf{w}_1^{(1)}\left(\mathbf{w}_2^{(1)}\right)^T \mathbf{C}_{22} \mathbf{w}_2^{(1)}}}$$

where $\mathbf{C}_{ij} = \mathrm{E}\left[\mathbf{X}_i^T \mathbf{X}_j\right]$. Since this objective is invariant to rescaling of $\mathbf{w}_1^{(1)}$ and $\mathbf{w}_2^{(1)}$, the problem is equivalent to solving

$$\max_{\mathbf{w}_1^{(1)},\mathbf{w}_2^{(1)}}\left(\mathbf{w}_1^{(1)}\right)^T \mathbf{C}_{12} \mathbf{w}_2^{(1)}$$

subject to the constraints $\left(\mathbf{w}_1^{(1)}\right)^T \mathbf{C}_{11} \mathbf{w}_1^{(1)}=\left(\mathbf{w}_2^{(1)}\right)^T \mathbf{C}_{22} \mathbf{w}_2^{(1)}=1$.
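As a quick sanity check on the equivalence above, here is a minimal NumPy sketch (all names are my own, for illustration) that computes the correlation of two projected data sets both directly from the samples and from the covariance-matrix expression, assuming centered data and the sample estimate $\mathbf{C}_{ij} = \mathbf{X}_i^T\mathbf{X}_j/n$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 500, 4, 3
X1 = rng.standard_normal((n, p))
X2 = rng.standard_normal((n, q))
X1 -= X1.mean(axis=0)  # center so that C_ij = X_i^T X_j / n is a covariance
X2 -= X2.mean(axis=0)

# arbitrary (non-optimal) projection vectors, just to test the identity
w1 = rng.standard_normal(p)
w2 = rng.standard_normal(q)

# correlation computed directly from the projected samples
a1, a2 = X1 @ w1, X2 @ w2
direct = np.corrcoef(a1, a2)[0, 1]

# the same quantity from the covariance matrices C_11, C_12, C_22
C11, C12, C22 = X1.T @ X1 / n, X1.T @ X2 / n, X2.T @ X2 / n
via_cov = (w1 @ C12 @ w2) / np.sqrt((w1 @ C11 @ w1) * (w2 @ C22 @ w2))

print(direct, via_cov)  # the two values agree
```

Note that multiplying `w1` or `w2` by any positive constant leaves both quantities unchanged, which is exactly why we may fix the denominator terms to one and maximize only the numerator.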
Let $\mathbf{v}_1^{(1)} \triangleq \mathbf{C}_{11}^{1/2} \mathbf{w}_1^{(1)}$ and $\mathbf{v}_2^{(1)} \triangleq \mathbf{C}_{22}^{1/2}\mathbf{w}_2^{(1)}$ to simplify the constraints. The optimization problem then becomes

$$\max_{\mathbf{v}_1^{(1)},\mathbf{v}_2^{(1)}} \left(\mathbf{v}_1^{(1)}\right)^T\mathbf{G}\mathbf{v}_2^{(1)} \quad \text{such that} \quad \left(\mathbf{v}_1^{(1)}\right)^T\mathbf{v}_1^{(1)}=\left(\mathbf{v}_2^{(1)}\right)^T\mathbf{v}_2^{(1)}=1$$

where $\mathbf{G} \triangleq \mathbf{C}_{11}^{-T/2}\mathbf{C}_{12}\mathbf{C}_{22}^{-1/2}$. By the method of Lagrange multipliers, the maximum is attained at a stationary point of

$$\mathcal{L}\left(\mathbf{v}_1^{(1)},\mathbf{v}_2^{(1)},\lambda,\mu\right)=\left(\mathbf{v}_1^{(1)}\right)^T\mathbf{G}\mathbf{v}_2^{(1)}-\lambda\left(\left(\mathbf{v}_1^{(1)}\right)^T\mathbf{v}_1^{(1)}-1\right)-\mu\left(\left(\mathbf{v}_2^{(1)}\right)^T\mathbf{v}_2^{(1)}-1\right)$$

for constants $\lambda$ and $\mu$. At such a stationary point, we have:

$$\frac{\partial \mathcal{L}\left(\mathbf{v}_1^{(1)},\mathbf{v}_2^{(1)},\lambda,\mu\right)}{\partial \mathbf{v}_1^{(1)}} = \mathbf{G}\mathbf{v}_2^{(1)}-2\lambda \mathbf{v}_1^{(1)}=0$$

$$\frac{\partial \mathcal{L}\left(\mathbf{v}_1^{(1)},\mathbf{v}_2^{(1)},\lambda,\mu\right)}{\partial \mathbf{v}_2^{(1)}} = \mathbf{G}^T\mathbf{v}_1^{(1)}-2\mu\mathbf{v}_2^{(1)}=0.$$

Left-multiplying the first equation by $\left(\mathbf{v}_1^{(1)}\right)^T$ and the second by $\left(\mathbf{v}_2^{(1)}\right)^T$, and using the unit-norm constraints, we conclude that $\lambda = \mu$ and

$$\mathbf{G}^T\mathbf{G}\mathbf{v}_2^{(1)} = 4\lambda^2\mathbf{v}_2^{(1)}, \qquad \mathbf{G}\mathbf{G}^T\mathbf{v}_1^{(1)} = 4\lambda^2\mathbf{v}_1^{(1)}.$$

Hence the vectors $\mathbf{v}_1^{(1)}$ and $\mathbf{v}_2^{(1)}$ are the eigenvectors of $\mathbf{G}\mathbf{G}^T$ and $\mathbf{G}^T\mathbf{G}$ with the largest eigenvalue, respectively. One can also consider them as the left and right singular vectors of the matrix $\mathbf{G}$ with the largest singular value. The first canonical vectors are then defined as $\mathbf{a}_1^{(1)} \triangleq \mathbf{X}_1 \mathbf{w}_1^{(1)}$ and $\mathbf{a}_2^{(1)} \triangleq \mathbf{X}_2 \mathbf{w}_2^{(1)}$.
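The first canonical pair can be computed numerically exactly as derived: whiten the cross-covariance to get $\mathbf{G}$, take its top singular vectors, and map back through $\mathbf{w}_i^{(1)} = \mathbf{C}_{ii}^{-1/2}\mathbf{v}_i^{(1)}$. A sketch, assuming centered data and sample covariances (the `inv_sqrt` helper is my own):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 1000, 4, 3
X1 = rng.standard_normal((n, p))
# build X2 with a genuine linear relationship to X1 plus noise
X2 = X1[:, :q] + 0.5 * rng.standard_normal((n, q))
X1 -= X1.mean(axis=0)
X2 -= X2.mean(axis=0)

C11, C12, C22 = X1.T @ X1 / n, X1.T @ X2 / n, X2.T @ X2 / n

def inv_sqrt(C):
    """Symmetric inverse square root C^{-1/2} via the eigendecomposition."""
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

# whitened cross-covariance G = C_11^{-1/2} C_12 C_22^{-1/2}
G = inv_sqrt(C11) @ C12 @ inv_sqrt(C22)
U, S, Vt = np.linalg.svd(G)

# first canonical coefficients: w_i^{(1)} = C_ii^{-1/2} v_i^{(1)}
w1 = inv_sqrt(C11) @ U[:, 0]
w2 = inv_sqrt(C22) @ Vt[0]

# the first canonical correlation equals the largest singular value of G
rho = np.corrcoef(X1 @ w1, X2 @ w2)[0, 1]
print(rho, S[0])  # these two values agree
```

Since $\mathbf{C}_{11}$ and $\mathbf{C}_{22}$ are symmetric, $\mathbf{C}_{11}^{-T/2}=\mathbf{C}_{11}^{-1/2}$, which is why a single symmetric inverse square root suffices here.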
We want the second canonical vectors to be orthogonal to the first ones, which implies orthogonality between $\mathbf{v}_1^{(1)}$ and $\mathbf{v}_1^{(2)}$, and between $\mathbf{v}_2^{(1)}$ and $\mathbf{v}_2^{(2)}$. The new optimization problem for the second canonical vectors is defined as

$$\max_{\mathbf{v}_1^{(2)},\mathbf{v}_2^{(2)}} \left(\mathbf{v}_1^{(2)}\right)^T\mathbf{G}\mathbf{v}_2^{(2)}$$

such that $\left(\mathbf{v}_1^{(2)}\right)^T\mathbf{v}_1^{(2)}=\left(\mathbf{v}_2^{(2)}\right)^T\mathbf{v}_2^{(2)}=1$ and $\left(\mathbf{v}_1^{(1)}\right)^T\mathbf{v}_1^{(2)}=\left(\mathbf{v}_2^{(1)}\right)^T\mathbf{v}_2^{(2)}=0$.

We can use the singular value decomposition of the matrix $\mathbf{G}$ to solve this problem: $\mathbf{G} = \mathbf{U}\mathbf{S}\mathbf{V}^T$, where $\mathbf{U}$ and $\mathbf{V}$ are orthogonal matrices and $\mathbf{S}$ is a diagonal matrix of singular values in decreasing order. Let's define the transformations $\mathbf{z}_1 \triangleq \mathbf{U}^T \mathbf{v}_1^{(2)}$ and $\mathbf{z}_2 \triangleq \mathbf{V}^T \mathbf{v}_2^{(2)}$, so that the optimization problem can be written as

$$\max_{\mathbf{z}_1,\mathbf{z}_2} \mathbf{z}_1^T\mathbf{S}\mathbf{z}_2 \quad \text{such that} \quad \mathbf{z}_1^T\mathbf{z}_1=\mathbf{z}_2^T\mathbf{z}_2=1 \ \text{and} \ z_{11}=z_{21}=0,$$

whose optimal value is the second largest singular value of the matrix $\mathbf{G}$ (Eckart–Young theorem). Therefore the optimum vectors $\mathbf{v}_1^{(2)}$ and $\mathbf{v}_2^{(2)}$ are the left and right singular vectors of $\mathbf{G}$ with the second largest singular value. I hope this derivation clarifies the concept of CCA. Please contact me if you have any questions or comments.
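As an addendum, the same argument repeats for every subsequent pair, so one SVD of $\mathbf{G}$ delivers all $d$ canonical pairs at once. A minimal sketch of the full procedure under the same assumptions as before (centered data, sample covariances; the function name `cca` and the helper are mine, and there is no regularization for ill-conditioned covariances):

```python
import numpy as np

def cca(X1, X2):
    """All canonical pairs from one SVD of the whitened cross-covariance.
    Assumes the columns of X1 and X2 are already centered."""
    n = X1.shape[0]
    C11, C12, C22 = X1.T @ X1 / n, X1.T @ X2 / n, X2.T @ X2 / n

    def inv_sqrt(C):
        vals, vecs = np.linalg.eigh(C)
        return vecs @ np.diag(vals ** -0.5) @ vecs.T

    K1, K2 = inv_sqrt(C11), inv_sqrt(C22)
    U, S, Vt = np.linalg.svd(K1 @ C12 @ K2)  # G = K1 C12 K2
    d = min(X1.shape[1], X2.shape[1])
    W1 = K1 @ U[:, :d]     # canonical coefficient matrices W_1, W_2
    W2 = K2 @ Vt[:d].T
    return W1, W2, S[:d]   # S holds the canonical correlations, decreasing

rng = np.random.default_rng(2)
n = 2000
X1 = rng.standard_normal((n, 5))
# two shared directions plus two independent noise columns
X2 = np.hstack([X1[:, :2] + 0.3 * rng.standard_normal((n, 2)),
                rng.standard_normal((n, 2))])
X1 -= X1.mean(axis=0)
X2 -= X2.mean(axis=0)

W1, W2, rho = cca(X1, X2)
A1, A2 = X1 @ W1, X2 @ W2   # canonical vectors a_i^{(k)} as columns
# the canonical vectors within each data set are mutually orthonormal,
# and the canonical correlations come out in decreasing order
print(np.round(A1.T @ A1 / n, 6))
print(rho)
```

The orthonormality check works because $\mathbf{W}_1^T\mathbf{C}_{11}\mathbf{W}_1 = \mathbf{U}^T\mathbf{U} = \mathbf{I}$ by construction, mirroring the constraints in the derivation above.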