February 4, 2012

Correlation (Pearson, Kendall, Spearman)

Correlation is a bivariate analysis that measures the strengths of association between two variables.  In statistics, the value of the correlation coefficient varies between +1 and -1.  When the value of the correlation coefficient lies around ± 1, then it is said to be a perfect degree of association between the two variables.  As the correlation coefficient value goes towards 0, the relationship between the two variables will be weaker.  Usually, in statistics, we measure three types of correlation: Pearson correlation, Kendall rank correlation and Spearman correlation.

Pearson r correlation: Pearson r correlation is widely used in statistics to measure the degree of the relationship between linear related variables.  For the Pearson r correlation, both variables should be normally distributed.  For example, in the stock market, if we want to measure how two commodities are related to each other, Pearson r correlation is used to measure the degree of relationship between the two commodities.  The following formula is used to calculate the Pearson r correlation:

Where:
r = Pearson r correlation coefficient
N = number of value in each data set
∑xy = sum of the products of paired scores
∑x = sum of x scores
∑y = sum of y scores
∑x2= sum of squared x scores
∑y2= sum of squared y scores

Kendall rank correlation: Kendall rank correlation is a non-parametric test that does not assume any assumptions related to the distributions— like Pearson’s correlation.  The following formula is used to calculate the value of Kendall rank correlation:

Where:
Nc= number of concordant
Nd= Number of discordant

=Total number of possible pairing of observations

Spearman rank correlation: Spearman rank correlation is a non-parametric test that is used to measure the degree of association between two variables.  It was developed by Spearman, thus it is called the Spearman rank correlation.  Spearman rank correlation test does not assume any assumptions about the distribution.

The following formula is used to calculate the Spearman rank correlation:

Where:
P= Spearman rank correlation
di= the difference between the ranks of corresponding values Xi and Yi
n= number of value in each data set

Correlation Resources:

Algina, J., & Keselman, H. J. (1999). Comparing squared multiple correlation coefficients: Examination of a confidence interval and a test significance. Psychological Methods, 4(1), 76-83.

Bobko, P. (2001). Correlation and regression: Applications for industrial organizational psychology and management (2nd ed.). Thousand Oaks, CA: Sage Publications. View

Bonett, D. G. (2008). Meta-analytic interval estimation for bivariate correlations. Psychological Methods, 13(3), 173-181.

Chen, P. Y., & Popovich, P. M. (2002). Correlation: Parametric and nonparametric measures. Thousand Oaks, CA: Sage Publications. View

Cheung, M. W. -L., & Chan, W. (2004). Testing dependent correlation coefficients via structural equation modeling. Organizational Research Methods, 7(2), 206-223.

Coffman, D. L., Maydeu-Olivares, A., Arnau, J. (2008). Asymptotic distribution free interval estimation: For an intraclass correlation coefficient with applications to longitudinal data. Methodology, 4(1), 4-9.

Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. (3rd ed.). Mahwah, NJ: Lawrence Erlbaum Associates. View

Hatch, J. P., Hearne, E. M., & Clark, G. M. (1982). A method of testing for serial correlation in univariate repeated-measures analysis of variance. Behavior Research Methods & Instrumentation, 14(5), 497-498.

Kendall, M. G., & Gibbons, J. D. (1990). Rank Correlation Methods (5th ed.). London: Edward Arnold. View

Krijnen, W. P. (2004). Positive loadings and factor correlations from positive covariance matrices. Psychometrika, 69(4), 655-660.

Shieh, G. (2006). Exact interval estimation, power calculation, and sample size determination in normal correlation analysis. Psychometrika, 71(3), 529-540.

Stauffer, J. M., & Mendoza, J. L. (2001). The proper sequence for correcting correlation coefficients for range restriction and unreliability. Psychometrika, 66(1), 63-68.

Related Pages: