Canonical Correlation
Canonical correlation is a statistical technique used to examine the degree of relationship between two canonical (latent) variables. Whereas regression analysis finds the relationship between one dependent variable and many independent variables, canonical correlation examines the relationship between many dependent variables and many independent variables. In canonical correlation, we form one variate from the independent variables and one variate from the dependent variables, and then compare those variates to find the degree of relationship between the two sets of variables. Wilks's lambda is used to test the significance of the canonical correlation. As with simple correlation, the squared canonical correlation coefficient gives the proportion of variance in the dependent variate that can be explained by the independent variate.
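The description above can be sketched in Python with NumPy. This is a minimal illustration, not a reference implementation: the function names, the synthetic data, and the QR-plus-SVD route to the canonical correlations are assumptions made for the example, and both variable sets are assumed to have full column rank. Bartlett's chi-square approximation is used here as one common way to turn Wilks's lambda into a test statistic.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between two sets of variables (rows are cases).

    Assumes both sets have full column rank after centering.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Center every variable so that cross-products behave like covariances.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of the two centered sets.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # The singular values of Qx' Qy are the canonical correlations,
    # returned in descending order.
    rho = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(rho, 0.0, 1.0)

def wilks_lambda(rho):
    """Wilks's lambda: the product of (1 - rho_i^2) over all canonical correlations."""
    rho = np.asarray(rho, dtype=float)
    return float(np.prod(1.0 - rho ** 2))

def bartlett_chi_square(rho, n, p, q):
    """Bartlett's chi-square approximation for Wilks's lambda.

    n is the number of cases, p and q are the sizes of the two variable sets.
    Returns the test statistic and its degrees of freedom (p * q).
    """
    stat = -(n - 1 - (p + q + 1) / 2.0) * np.log(wilks_lambda(rho))
    return float(stat), p * q

# Synthetic example data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))                  # "independent" set
Y = X[:, :2] + rng.normal(size=(n, 2))       # "dependent" set, related to X

rho = canonical_correlations(X, Y)
stat, df = bartlett_chi_square(rho, n, X.shape[1], Y.shape[1])
print("canonical correlations:", rho)
print("squared canonical correlations:", rho ** 2)
print("Wilks's lambda:", wilks_lambda(rho))
print("Bartlett chi-square:", stat, "with df =", df)
```

The number of canonical correlations equals the size of the smaller variable set. With a single variable on each side, the result reduces to the absolute value of the ordinary Pearson correlation.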
Key concepts and terms:
Assumptions:
Data: Interval data is preferred for canonical correlation, and there should be no missing values in the data.
Linearity: A linear relationship is assumed between the dependent and independent variates.
Multicollinearity: There should be no perfect multicollinearity among the variables. Low multicollinearity can be tolerated, but perfect multicollinearity may cause problems with the results.
Homoscedasticity and correlation: Canonical correlation assumes homoscedasticity (homogeneity of variance) and that the two sets of variables are correlated.
No Outliers: Canonical correlation is sensitive to outliers. A common rule of thumb treats scores that fall more than three (3) standard deviations from the mean as outliers.
Sample Size: Stevens (1986) recommends at least 20 cases per variable in order to interpret the first canonical correlation. To interpret two canonical correlations, the sample should contain 40 to 60 cases per variable.
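Several of the assumptions above can be screened before running the analysis. The following is a rough sketch using NumPy; the helper names are hypothetical, the three-standard-deviation cutoff and the 20-cases-per-variable default simply restate the rules of thumb in the list, and the outlier check assumes that no column is constant.

```python
import numpy as np

def has_missing_values(data):
    """True if any entry of the data matrix is NaN."""
    return bool(np.isnan(np.asarray(data, dtype=float)).any())

def has_perfect_multicollinearity(data):
    """True if some column is an exact linear combination of the others.

    Checked by comparing the rank of the centered data with the column count.
    """
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)
    return bool(np.linalg.matrix_rank(centered) < data.shape[1])

def outlier_mask(data, threshold=3.0):
    """Boolean mask of scores more than `threshold` standard deviations from the column mean.

    Assumes every column has non-zero variance.
    """
    data = np.asarray(data, dtype=float)
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    return np.abs(z) > threshold

def meets_sample_size(n_cases, n_variables, cases_per_variable=20):
    """True if the sample has at least `cases_per_variable` cases per variable."""
    return n_cases >= cases_per_variable * n_variables

# Hypothetical data: the second column is exactly twice the first.
collinear = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]])
print("missing values:", has_missing_values(collinear))
print("perfect multicollinearity:", has_perfect_multicollinearity(collinear))

# Hypothetical scores: twenty zeros and one extreme value.
scores = np.array([[0.0]] * 20 + [[100.0]])
print("outlier rows:", np.flatnonzero(outlier_mask(scores).any(axis=1)))

print("enough cases for one correlation:", meets_sample_size(100, 5))
print("enough cases for two correlations:", meets_sample_size(100, 5, cases_per_variable=40))
```

These checks cover only the mechanical assumptions. Linearity and homoscedasticity are usually judged from scatterplots of the variates rather than from a single number.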




