# The differences among the most common statistical analyses

Posted December 7, 2012

Correlation vs. Regression vs. Mean Differences

•  Inferential (parametric and non-parametric) statistics are conducted when the goal of the research is to draw conclusions about the statistical significance of the relationships and/or differences among variables of interest.
•   These “relationships” can be tested with different statistical analyses, depending on the goal of the research.  The three most common meanings of “relationship” between/among variables are:
1.      Strength of association between variables, e.g., Pearson and Spearman rho correlations
2.      Statistical differences on a continuous variable by group(s), e.g., t-test and ANOVA
3.      Statistical contribution to, or prediction of, one variable from one or more others, e.g., regression
•  Correlations are the appropriate analyses when the goal of the research is to test the strength of the association between two variables.  There are two main types of correlations: the Pearson product-moment correlation, a.k.a. Pearson (r), and the Spearman rho (rs) correlation.  A Pearson correlation is a parametric test that is appropriate when the two variables are continuous.  As with all parametric tests, there are assumptions that need to be met; for a Pearson correlation, these are linearity and homoscedasticity.  A Spearman correlation is a non-parametric test that is appropriate when at least one of the variables is ordinal.
o   E.g., a Pearson correlation is appropriate for the two continuous variables: age and height.
o   E.g., a Spearman correlation is appropriate for the variables: age (continuous) and income level (under 25,000, 25,000 – 50,000, 50,001 – 100,000, above 100,000).
• To test for mean differences by group, there are a variety of analyses that can be appropriate.  Three parametric examples will be given: the dependent sample t test, the independent sample t test, and the analysis of variance (ANOVA).  The assumption of the dependent sample t test is normality.  The assumptions of both the independent sample t test and the ANOVA are normality and equality of variance (a.k.a. homogeneity of variance).
o   E.g., a dependent t-test is appropriate for testing mean differences on a continuous variable by time on the same group of people: testing weight differences by time (year 1 – before diet vs. year 2 – after diet) for the same participants.
o   E.g., an independent t-test is appropriate for testing mean differences on a continuous variable by two independent groups: testing GPA scores by gender (males vs. females).
o   E.g., an ANOVA is appropriate for testing mean differences on a continuous variable across more than two independent groups: testing IQ scores by college major (Business vs. Engineering vs. Nursing vs. Communications).
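To make the three mean-difference tests above more concrete, the following sketch computes their test statistics by hand. It is a minimal illustration, not a full analysis: the function names and all data are hypothetical, and it stops at the t and F statistics without computing p-values.

```python
from math import sqrt
from statistics import mean, variance  # variance() is the sample variance (n - 1)


def paired_t(before, after):
    """t statistic for a dependent (paired) sample t test; df = n - 1."""
    diffs = [a - b for a, b in zip(after, before)]
    return mean(diffs) / sqrt(variance(diffs) / len(diffs))


def independent_t(group1, group2):
    """Pooled-variance t statistic for two independent groups; df = n1 + n2 - 2."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / sqrt(pooled_var * (1 / n1 + 1 / n2))


def one_way_anova_f(*groups):
    """F statistic for a one-way ANOVA; df = (k - 1, N - k)."""
    k = len(groups)
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((len(g) - 1) * variance(g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (len(all_scores) - k))


# Hypothetical data, invented for illustration only.
weight_before = [82, 90, 77, 95, 88]   # year 1, before diet
weight_after = [79, 86, 76, 90, 85]    # year 2, after diet (same participants)
gpa_males = [2.9, 3.1, 3.4, 2.7, 3.0]
gpa_females = [3.2, 3.5, 3.0, 3.6, 3.3]
iq_business = [102, 98, 110, 105]
iq_engineering = [115, 108, 120, 112]
iq_nursing = [104, 109, 101, 107]

print(f"paired t = {paired_t(weight_before, weight_after):.3f}")
print(f"independent t = {independent_t(gpa_males, gpa_females):.3f}")
print(f"ANOVA F = {one_way_anova_f(iq_business, iq_engineering, iq_nursing):.3f}")
```

In practice each statistic is compared against a t or F distribution to get a p-value. SciPy's `scipy.stats.ttest_rel`, `scipy.stats.ttest_ind`, and `scipy.stats.f_oneway` return both the statistic and the p-value.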
•  To test whether one or more variables contribute significantly to, or predict, another variable, a regression is appropriate.  Three parametric examples will be given: simple linear regression, multiple linear regression, and binary logistic regression.  The assumptions of a simple linear regression are linearity and homoscedasticity.  The assumptions of a multiple linear regression are linearity, homoscedasticity, and the absence of multicollinearity.  The assumption of a binary logistic regression is the absence of multicollinearity.
o   E.g., a simple linear regression is appropriate for testing if a continuous variable predicts another continuous variable: testing if IQ scores predict SAT scores.
o   E.g., a multiple linear regression is appropriate for testing if more than one continuous variable predicts another continuous variable: testing if IQ scores and GPA scores predict SAT scores.
o   E.g., a binary logistic regression is appropriate for testing if more than one variable (continuous or dichotomous) predicts a dichotomous variable: testing if IQ scores, gender, and GPA scores predict entrance to college (yes = 1 vs. no = 0).
•  With regard to the assumptions mentioned above:
o   Linearity assumes a straight-line relationship between the variables.
o   Homoscedasticity assumes that scores are equally spread (have constant variance) about the regression line.
o   Absence of multicollinearity assumes that predictor variables are not too highly related to one another.
o   Normality assumes that the dependent variable is normally distributed (symmetrical, bell-shaped) for each group.
o   Homogeneity of variance assumes that groups have equal error variances.
