24 Pages Posted: 12 Feb 2009
Date Written: January 23, 2009
Classical canonical correlation analysis is extremely greedy in maximizing the squared correlation between two sets of variables. As a result, if one variable in the first dataset is very highly correlated with a variable in the second dataset, the canonical correlation will be very high irrespective of the correlations among the remaining variables in the two datasets. We propose an alternative measure of association between two sets of variables that does not allow a select few variables to dominate the others to the point of depriving them of any contribution to their representative variables, or canonical variates.
Our proposed Representation-Constrained Canonical Correlation Analysis (RCCCA) has Classical Canonical Correlation Analysis (CCCA) at one end (lambda = 0) and Classical Principal Component Analysis (CPCA) at the other (as lambda becomes very large). In between, it gives a compromise solution. With a proper choice of lambda, one can prevent a lone pair of highly correlated variables across the two datasets from hijacking the representation of those datasets. This advantage of RCCCA over CCCA deserves serious attention from researchers who use statistical tools for data analysis.
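The greediness of classical canonical correlation described above can be illustrated with a small numerical sketch. This is not the paper's FORTRAN program and does not implement RCCCA. It only computes the first classical canonical correlation on synthetic data in which a single cross-set pair of variables is highly correlated. The helper name `first_canonical_correlation`, the sample size, and the noise level are all assumptions chosen for illustration.

```python
import numpy as np

def first_canonical_correlation(X, Y):
    """Largest classical canonical correlation between the columns of X and Y.

    Uses the standard QR/SVD route: the singular values of Qx^T Qy, where Qx
    and Qy are orthonormal bases of the centered column spaces, are the
    canonical correlations.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return float(min(s[0], 1.0))  # clip tiny floating-point overshoot

# Synthetic data (illustrative assumption): two sets of 4 independent variables.
rng = np.random.default_rng(0)
n, p = 500, 4
X = rng.standard_normal((n, p))
Y = rng.standard_normal((n, p))

# Tie exactly one variable in Y to one variable in X; all others stay unrelated.
Y[:, 0] = X[:, 0] + 0.1 * rng.standard_normal(n)

rho = first_canonical_correlation(X, Y)

# Cross-set correlation matrix: rows index X variables, columns index Y variables.
cross = np.corrcoef(X, Y, rowvar=False)[:p, p:]
others = np.abs(np.delete(cross.ravel(), 0))  # every pair except (X0, Y0)

print("first canonical correlation:", round(rho, 3))
print("correlation of the tied pair:", round(float(cross[0, 0]), 3))
print("mean |correlation| of all other cross pairs:", round(float(others.mean()), 3))
```

Under these assumptions the first canonical correlation is driven almost entirely by the one tied pair, even though the remaining cross-set correlations are close to zero. This is the behavior that the representation constraint is meant to temper.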
Keywords: Representation, constrained, canonical, correlation, principal components, variates, global optimization, particle swarm, ordinal variables, computer program, FORTRAN
JEL Classification: C13, C43, C45, C61, C63, C87
Suggested Citation:
Mishra, Sudhanshu K., Representation-Constrained Canonical Correlation Analysis: A Hybridization of Canonical Correlation and Principal Component Analyses (January 23, 2009). Available at SSRN: https://ssrn.com/abstract=1331886 or http://dx.doi.org/10.2139/ssrn.1331886