Posted December 7, 2012

__Correlation vs. Regression vs. Mean Differences__

- Inferential (parametric and non-parametric) statistics are conducted when the goal of the research is to draw conclusions about the statistical significance of the relationships and/or differences among variables of interest.

- The “relationships” can be tested in statistically different ways, depending on the goal of the research. The three most common meanings of “relationship” between/among variables are:

1. Strength, or association, between variables = e.g., Pearson & Spearman rho correlations

2. Mean differences between/among groups = e.g., *t*-tests & ANOVA

3. Statistical contribution/prediction on a variable from another(s) = regression.

- Correlations are the appropriate analyses when the goal of the research is to test the strength, or association, between two variables. There are two main types of correlations: Pearson product-moment correlations, a.k.a. Pearson (*r*), and Spearman rho (*r*_{s}) correlations. A Pearson correlation is a parametric test that is appropriate when the two variables are continuous. As with all parametric tests, there are assumptions that need to be met; for a Pearson correlation, these are linearity and homoscedasticity. A Spearman correlation is a non-parametric test that is appropriate when at least one of the variables is ordinal.

o E.g., a Pearson correlation is appropriate for the two continuous variables: age and height.

o E.g., a Spearman correlation is appropriate for the variables: age (continuous) and income level (under 25,000, 25,000 – 50,000, 50,001 – 100,000, above 100,000).
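To make the two correlation examples concrete, here is a minimal sketch in plain Python (standard library only). The data values are made up for illustration, and the helper names `pearson_r`, `average_ranks`, and `spearman_rho` are hypothetical. Spearman rho is computed here as a Pearson correlation on the ranks, with tied values given their average rank. A statistics package would normally also report a p-value, which this sketch does not.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def average_ranks(values):
    """Rank values 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rho: Pearson correlation computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Made-up data: children's age (years) and height (cm), both continuous.
child_age = [6, 8, 10, 12, 14, 16]
height_cm = [115, 128, 138, 149, 163, 170]
print("Pearson r (age, height):", round(pearson_r(child_age, height_cm), 3))

# Made-up data: adult age (continuous) and income level coded as an
# ordinal bracket (1 = under 25,000 ... 4 = above 100,000).
adult_age = [22, 28, 35, 41, 50, 63]
income_bracket = [1, 2, 2, 3, 4, 3]
print("Spearman rho (age, income level):",
      round(spearman_rho(adult_age, income_bracket), 3))
```

Coding the income brackets as 1 to 4 only preserves their order, which is all that a rank-based statistic uses.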

- To test for mean differences by group, there are a variety of analyses that can be appropriate. Three parametric examples will be given: dependent sample *t*-test, independent sample *t*-test, and analysis of variance (ANOVA). The assumption of the dependent sample *t*-test is normality. The assumptions of the independent sample *t*-test are normality and equality of variance (a.k.a. homogeneity of variance). The assumptions of an ANOVA are normality and equality of variance (a.k.a. homogeneity of variance).

o E.g., a dependent *t*-test is appropriate for testing mean differences on a continuous variable by time on the same group of people: testing weight differences by time (year 1 - before diet vs. year 2 - after diet) for the same participants.

o E.g., an independent *t*-test is appropriate for testing mean differences on a continuous variable by two independent groups: testing GPA scores by gender (males vs. females)

o E.g., an ANOVA is appropriate for testing mean differences on a continuous variable across more than two independent groups: testing IQ scores by college major (Business vs. Engineering vs. Nursing vs. Communications)
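The three mean-difference tests above each reduce to a single test statistic. The following is a rough sketch in plain Python that computes those statistics and their degrees of freedom. All data are invented, the function names are hypothetical, and the independent-samples version assumes equal variances (the pooled form). Turning a statistic into a p-value requires the *t* or *F* distribution, which is left to a statistics package.

```python
from statistics import mean, variance

def paired_t(before, after):
    """Dependent-samples t: mean difference divided by its standard error."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t = mean(diffs) / (variance(diffs) / n) ** 0.5
    return t, n - 1  # statistic, degrees of freedom

def independent_t(group1, group2):
    """Independent-samples t using the pooled variance (equal variances assumed)."""
    n1, n2 = len(group1), len(group2)
    pooled = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    t = (mean(group1) - mean(group2)) / (pooled * (1 / n1 + 1 / n2)) ** 0.5
    return t, n1 + n2 - 2

def one_way_anova_f(*groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((len(g) - 1) * variance(g) for g in groups)
    df_between, df_within = len(groups) - 1, n_total - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Made-up weights (kg) for the same five participants before and after a diet.
weight_year1 = [82.0, 95.5, 77.2, 101.3, 88.4]
weight_year2 = [79.1, 91.0, 76.8, 96.7, 85.9]
print("dependent t, df:", paired_t(weight_year1, weight_year2))

# Made-up GPA scores for two independent groups.
gpa_males = [2.9, 3.4, 3.1, 3.7, 2.8]
gpa_females = [3.2, 3.6, 3.0, 3.8, 3.3]
print("independent t, df:", independent_t(gpa_males, gpa_females))

# Made-up IQ scores for four college majors.
business = [104, 110, 98, 107]
engineering = [112, 118, 109, 115]
nursing = [106, 101, 111, 108]
communications = [99, 105, 103, 110]
print("ANOVA F, df:", one_way_anova_f(business, engineering, nursing, communications))
```

With exactly two groups, the ANOVA *F* equals the square of the pooled independent *t*, which is a quick way to sanity-check the two functions against each other.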

- To test whether one or more variables significantly contribute to, or predict, another variable, a regression is appropriate. Three parametric examples will be given: simple linear regression, multiple linear regression, and binary logistic regression. The assumptions of a simple linear regression are linearity and homoscedasticity. The assumptions of a multiple linear regression are linearity, homoscedasticity, and the absence of multicollinearity. The assumption of binary logistic regression is the absence of multicollinearity.

o E.g., a simple linear regression is appropriate for testing if a continuous variable predicts another continuous variable: testing if IQ scores predict SAT scores

o E.g., a multiple linear regression is appropriate for testing if more than one continuous variable predicts another continuous variable: testing if IQ scores and GPA scores predict SAT scores

o E.g., a binary logistic regression is appropriate for testing if more than one variable (continuous or dichotomous) predicts a dichotomous variable: testing if IQ scores, gender, and GPA scores predict entrance to college (yes = 1 vs. no = 0).
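As an illustration of the simplest of the three regression cases, the sketch below fits a simple linear regression by ordinary least squares in plain Python. The IQ and SAT values are fabricated for the example and the function name is hypothetical. Multiple and binary logistic regression need matrix algebra or iterative fitting and are better left to a statistics package, so they are not shown.

```python
from statistics import mean

def simple_linear_regression(x, y):
    """Least-squares fit of y = intercept + slope * x; also returns R squared."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # R squared: share of the variance in y explained by the fitted line.
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return slope, intercept, 1 - ss_res / ss_tot

# Made-up IQ (predictor) and SAT (outcome) scores.
iq = [95, 100, 105, 110, 118, 125]
sat = [980, 1050, 1090, 1180, 1260, 1340]

slope, intercept, r_squared = simple_linear_regression(iq, sat)
print(f"SAT = {intercept:.1f} + {slope:.2f} * IQ, R^2 = {r_squared:.3f}")
```

The slope is the predicted change in SAT score per one-point change in IQ; whether it is statistically significant is a separate test that this sketch does not perform.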

- With regard to the assumptions mentioned above:

o Linearity assumes a straight line relationship between the variables

o Homoscedasticity assumes that scores are equally spread about the regression line (i.e., the variance of the residuals is constant across values of the predictor)

o Absence of multicollinearity assumes that predictor variables are not too related

o Normality assumes that the dependent variables are normally distributed (symmetrical bell shaped) for each group

o Homogeneity of variance assumes that groups have equal error variances
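Two of these assumptions can be screened with simple descriptive quantities. The sketch below, in plain Python with invented data and hypothetical function names, computes the ratio of the largest to the smallest group variance (a value near 1 suggests similar spread) and the variance inflation factor for the special case of exactly two predictors, which is 1 / (1 - r²). These are rough screens under those assumptions, not formal tests of homogeneity of variance or multicollinearity.

```python
from statistics import mean, variance

def variance_ratio(*groups):
    """Largest sample variance divided by the smallest across groups."""
    variances = [variance(g) for g in groups]
    return max(variances) / min(variances)

def vif_two_predictors(x1, x2):
    """Variance inflation factor for a model with exactly two predictors.

    Equals 1 / (1 - r^2), where r is the Pearson correlation between them.
    Raises ZeroDivisionError if the predictors are perfectly correlated.
    """
    m1, m2 = mean(x1), mean(x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    r_squared = s12 ** 2 / (s11 * s22)
    return 1 / (1 - r_squared)

# Made-up GPA scores for two independent groups.
gpa_group_a = [2.9, 3.4, 3.1, 3.7, 2.8]
gpa_group_b = [3.2, 3.6, 3.0, 3.8, 3.3]
print("variance ratio:", round(variance_ratio(gpa_group_a, gpa_group_b), 2))

# Made-up IQ and GPA scores used together as predictors.
iq = [95, 100, 105, 110, 118, 125]
gpa = [2.7, 3.1, 2.9, 3.4, 3.3, 3.8]
print("VIF:", round(vif_two_predictors(iq, gpa), 2))
```

A VIF of 1 means the two predictors are uncorrelated; the value grows without bound as their correlation approaches plus or minus 1.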