The differences in most common statistical analyses
Correlation vs. Regression vs. Mean Differences
- Researchers use inferential statistics (parametric and non-parametric) to determine the statistical significance of relationships or differences among variables.
- The “relationships” can be tested statistically in different ways, depending on the goal of the research. The three most common meanings of “relationship” between/among variables are:
- 1. Strength of association between variables = e.g., Pearson and Spearman rho correlations
- 2. Statistical differences on a continuous variable by group(s) = e.g., t-test and ANOVA
- 3. Statistical prediction of one variable from one or more others = e.g., regression
Correlations
- Correlations test the strength of association between two variables. There are two main types: Pearson (r) and Spearman rho (rs). Pearson is a parametric test for two continuous variables; it requires linearity and homoscedasticity. Spearman is a non-parametric test for ordinal (rank-ordered) variables. A short code sketch follows the examples below.
o E.g., a Pearson correlation is appropriate for the two continuous variables: age and height.
o E.g., a Spearman correlation is appropriate for the variables: age (continuous) and income level (under 25,000, 25,000 – 50,000, 50,001 – 100,000, above 100,000).
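For readers who want to try both tests, here is a minimal sketch using SciPy’s pearsonr and spearmanr functions; all data values below are invented purely for illustration, and the income brackets are coded as ordered categories (1–4):

```python
# Minimal sketch with SciPy; all data values are invented for illustration.
from scipy.stats import pearsonr, spearmanr

# Pearson: two continuous variables (age in years, height in cm)
age = [22, 35, 41, 28, 50, 33, 47, 29]
height = [168, 175, 172, 166, 174, 171, 176, 169]
r, p = pearsonr(age, height)
print(f"Pearson r = {r:.2f}, p = {p:.3f}")

# Spearman: age (continuous) vs. income level coded as ordered categories
# (1 = under 25,000; 2 = 25,000-50,000; 3 = 50,001-100,000; 4 = above 100,000)
income_level = [1, 2, 3, 1, 4, 2, 3, 1]
rho, p = spearmanr(age, income_level)
print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")
```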
Testing mean differences
- To test mean differences by group, several analyses are appropriate. Three parametric examples are the dependent sample t-test, independent sample t-test, and ANOVA. The dependent sample t-test assumes normality. The independent sample t-test and ANOVA assume normality and homogeneity of variance. A short code sketch follows the examples below.
o E.g., a dependent t-test is appropriate for testing mean differences on a continuous variable by time on the same group of people: testing weight differences by time (year 1 – before diet vs. year 2 – after diet) for the same participants.
o E.g., an independent t-test is appropriate for testing mean differences on a continuous variable by two independent groups: testing GPA scores by gender (males vs. females)
o E.g., an ANOVA is appropriate for testing mean differences on a continuous variable across more than two independent groups: testing IQ scores by college major (Business vs. Engineering vs. Nursing vs. Communications)
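A minimal sketch of all three tests using SciPy (ttest_rel, ttest_ind, and f_oneway); the group scores below are invented for illustration:

```python
# Minimal sketch with SciPy; all group scores are invented for illustration.
from scipy.stats import ttest_rel, ttest_ind, f_oneway

# Dependent (paired) t-test: same participants' weight before vs. after a diet
weight_before = [82, 95, 78, 88, 90, 101]
weight_after = [79, 92, 77, 85, 88, 97]
t, p = ttest_rel(weight_before, weight_after)
print(f"Paired t = {t:.2f}, p = {p:.3f}")

# Independent t-test: GPA for two independent groups
gpa_males = [3.1, 2.8, 3.4, 3.0, 2.9]
gpa_females = [3.3, 3.5, 3.0, 3.6, 3.2]
t, p = ttest_ind(gpa_males, gpa_females)
print(f"Independent t = {t:.2f}, p = {p:.3f}")

# One-way ANOVA: IQ across more than two independent groups (college major)
iq_business = [104, 110, 98, 107]
iq_engineering = [112, 118, 109, 115]
iq_nursing = [106, 111, 103, 108]
F, p = f_oneway(iq_business, iq_engineering, iq_nursing)
print(f"ANOVA F = {F:.2f}, p = {p:.3f}")
```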
Purpose of regression
- Regression tests whether one variable significantly predicts another. Three parametric examples are simple linear, multiple linear, and binary logistic regression. Simple linear regression assumes linearity and homoscedasticity. Multiple linear regression additionally assumes no multicollinearity, as does binary logistic regression. A short code sketch follows the examples below.
o E.g., a simple linear regression is appropriate for testing if a continuous variable predicts another continuous variable: testing if IQ scores predict SAT scores
o E.g., a multiple linear regression is appropriate for testing if more than one continuous variable predicts another continuous variable: testing if IQ scores and GPA scores predict SAT scores
o E.g., a binary logistic regression is appropriate for testing if one or more variables (continuous or dichotomous) predict a dichotomous variable: testing if IQ scores, gender, and GPA scores predict entrance to college (yes = 1 vs. no = 0).
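The sketch below shows all three regressions with statsmodels. The data are simulated under effect sizes I chose arbitrarily, so every coefficient is illustrative only:

```python
# Minimal sketch with statsmodels; the data are simulated, so all coefficients
# are illustrative only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
iq = rng.normal(110, 12, n)
gpa = rng.normal(3.0, 0.4, n)
gender = rng.integers(0, 2, n)            # dichotomous predictor (0/1)
sat = 400 + 6 * iq + 80 * gpa + rng.normal(0, 50, n)

# Simple linear regression: IQ -> SAT
simple = sm.OLS(sat, sm.add_constant(iq)).fit()

# Multiple linear regression: IQ and GPA -> SAT
X = sm.add_constant(np.column_stack([iq, gpa]))
multiple = sm.OLS(sat, X).fit()

# Binary logistic regression: IQ, gender, and GPA -> college entrance (1 = yes)
logit_p = -26.5 + 0.2 * iq + 1.5 * gpa + 0.3 * gender
admitted = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit_p))).astype(int)
Xl = sm.add_constant(np.column_stack([iq, gender, gpa]))
logistic = sm.Logit(admitted, Xl).fit(disp=0)

print(simple.params)      # intercept and slope for IQ
print(multiple.rsquared)  # variance in SAT explained by IQ + GPA
print(logistic.params)    # log-odds coefficients
```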
- Regarding the assumptions mentioned above (a screening sketch follows this list):
o Linearity assumes a straight-line relationship between the variables
o Homoscedasticity assumes that the spread of scores (residuals) about the regression line is roughly equal across all values of the predictor(s)
o Absence of multicollinearity assumes that the predictor variables are not too highly correlated with one another
o Normality assumes that the dependent variables are normally distributed (symmetrical bell shaped) for each group
o Homogeneity of variance assumes that groups have equal error variances
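In practice, these assumptions are often screened before running the analyses above. Here is a minimal sketch using SciPy and statsmodels (Shapiro-Wilk for normality, Breusch-Pagan for homoscedasticity, Levene's test for homogeneity of variance, and variance inflation factors for multicollinearity); the data are simulated, and the VIF cutoff of roughly 5–10 is a common convention rather than a fixed rule:

```python
# Minimal sketch of common assumption checks; the VIF cutoff and the p > .05
# readings are conventions, not fixed rules. Data are simulated for illustration.
import numpy as np
import statsmodels.api as sm
from scipy.stats import shapiro, levene
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(scale=0.9, size=n)   # correlated with x1 by design
y = 2.0 + 1.5 * x1 - 0.7 * x2 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2]))
resid = sm.OLS(y, X).fit().resid

# Normality of residuals: Shapiro-Wilk (p > .05 is consistent with normality)
print("Shapiro-Wilk:", shapiro(resid))

# Homoscedasticity: Breusch-Pagan test of the residuals against the predictors
print("Breusch-Pagan (LM, p, F, p):", het_breuschpagan(resid, X))

# Homogeneity of variance across two groups: Levene's test
print("Levene:", levene(y[:50], y[50:]))

# Multicollinearity: variance inflation factor for each predictor column
for i in (1, 2):
    print(f"VIF x{i}: {variance_inflation_factor(X, i):.2f}")
```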