The differences in most common statistical analyses
Correlation vs. Regression vs. Mean Differences
- Researchers use inferential statistics (parametric and non-parametric) to determine the statistical significance of relationships or differences among variables.
- The “relationships” can be tested statistically in different ways, depending on the goal of the research. The three most common meanings of “relationship” between/among variables are:
- 1. Strength of association between variables = e.g., Pearson and Spearman rho correlations
- 2. Statistical differences on a continuous variable by group(s) = e.g., t-test and ANOVA
- 3. Statistical prediction of one variable from one or more others = e.g., regression
Correlations
- Correlations test the strength of association between two variables. There are two main types: Pearson (r) and Spearman rho (rs). Pearson is a parametric test for two continuous variables; it requires linearity and homoscedasticity. Spearman is a non-parametric test for ordinal (rank-ordered) variables. A short code sketch follows the examples below.
o E.g., a Pearson correlation is appropriate for the two continuous variables: age and height.
o E.g., a Spearman correlation is appropriate for the variables: age (continuous) and income level (under 25,000, 25,000 – 50,000, 50,001 – 100,000, above 100,000).
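For readers who want to try both tests, here is a minimal sketch using SciPy’s pearsonr and spearmanr functions; all data values below are invented purely for illustration, and the income brackets are coded as ordered categories (1–4):

```python
# Minimal sketch with SciPy; all data values are invented for illustration.
from scipy.stats import pearsonr, spearmanr

# Pearson: two continuous variables (age in years, height in cm)
age = [22, 35, 41, 28, 50, 33, 47, 29]
height = [168, 175, 172, 166, 174, 171, 176, 169]
r, p = pearsonr(age, height)
print(f"Pearson r = {r:.2f}, p = {p:.3f}")

# Spearman: age (continuous) vs. income level coded as ordered categories
# (1 = under 25,000; 2 = 25,000-50,000; 3 = 50,001-100,000; 4 = above 100,000)
income_level = [1, 2, 3, 1, 4, 2, 3, 1]
rho, p = spearmanr(age, income_level)
print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")
```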
Testing mean differences
- To test mean differences by group, several analyses are appropriate. Three parametric examples are the dependent sample t-test, independent sample t-test, and ANOVA. The dependent sample t-test assumes normality. The independent sample t-test and ANOVA assume normality and homogeneity of variance. A short code sketch follows the examples below.
o E.g., a dependent t-test is appropriate for testing mean differences on a continuous variable by time on the same group of people: testing weight differences by time (year 1 – before diet vs. year 2 – after diet) for the same participants.
o E.g., an independent t-test is appropriate for testing mean differences on a continuous variable by two independent groups: testing GPA scores by gender (males vs. females)
o E.g., an ANOVA is appropriate for testing mean differences on a continuous variable across more than two independent groups: testing IQ scores by college major (Business vs. Engineering vs. Nursing vs. Communications)
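A minimal sketch of all three tests using SciPy (ttest_rel, ttest_ind, and f_oneway); the group scores below are invented for illustration:

```python
# Minimal sketch with SciPy; all group scores are invented for illustration.
from scipy.stats import ttest_rel, ttest_ind, f_oneway

# Dependent (paired) t-test: same participants' weight before vs. after a diet
weight_before = [82, 95, 78, 88, 90, 101]
weight_after = [79, 92, 77, 85, 88, 97]
t, p = ttest_rel(weight_before, weight_after)
print(f"Paired t = {t:.2f}, p = {p:.3f}")

# Independent t-test: GPA for two independent groups
gpa_males = [3.1, 2.8, 3.4, 3.0, 2.9]
gpa_females = [3.3, 3.5, 3.0, 3.6, 3.2]
t, p = ttest_ind(gpa_males, gpa_females)
print(f"Independent t = {t:.2f}, p = {p:.3f}")

# One-way ANOVA: IQ across more than two independent groups (college major)
iq_business = [104, 110, 98, 107]
iq_engineering = [112, 118, 109, 115]
iq_nursing = [106, 111, 103, 108]
F, p = f_oneway(iq_business, iq_engineering, iq_nursing)
print(f"ANOVA F = {F:.2f}, p = {p:.3f}")
```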
Purpose of regression
- Regression tests whether one variable significantly predicts another. Three parametric examples are simple linear, multiple linear, and binary logistic regression. Simple linear regression assumes linearity and homoscedasticity. Multiple linear regression additionally assumes no multicollinearity, as does binary logistic regression. A short code sketch follows the examples below.
o E.g., a simple linear regression is appropriate for testing if a continuous variable predicts another continuous variable: testing if IQ scores predict SAT scores
o E.g., a multiple linear regression is appropriate for testing if more than one continuous variable predicts another continuous variable: testing if IQ scores and GPA scores predict SAT scores
o E.g., a binary logistic regression is appropriate for testing if one or more variables (continuous or dichotomous) predict a dichotomous variable: testing if IQ scores, gender, and GPA scores predict entrance to college (yes = 1 vs. no = 0).
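The sketch below shows all three regressions with statsmodels. The data are simulated under effect sizes I chose arbitrarily, so every coefficient is illustrative only:

```python
# Minimal sketch with statsmodels; the data are simulated, so all coefficients
# are illustrative only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
iq = rng.normal(110, 12, n)
gpa = rng.normal(3.0, 0.4, n)
gender = rng.integers(0, 2, n)            # dichotomous predictor (0/1)
sat = 400 + 6 * iq + 80 * gpa + rng.normal(0, 50, n)

# Simple linear regression: IQ -> SAT
simple = sm.OLS(sat, sm.add_constant(iq)).fit()

# Multiple linear regression: IQ and GPA -> SAT
X = sm.add_constant(np.column_stack([iq, gpa]))
multiple = sm.OLS(sat, X).fit()

# Binary logistic regression: IQ, gender, and GPA -> college entrance (1 = yes)
logit_p = -26.5 + 0.2 * iq + 1.5 * gpa + 0.3 * gender
admitted = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit_p))).astype(int)
Xl = sm.add_constant(np.column_stack([iq, gender, gpa]))
logistic = sm.Logit(admitted, Xl).fit(disp=0)

print(simple.params)      # intercept and slope for IQ
print(multiple.rsquared)  # variance in SAT explained by IQ + GPA
print(logistic.params)    # log-odds coefficients
```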
- Regarding the assumptions mentioned above (a screening sketch follows this list):
o Linearity assumes a straight-line relationship between the variables
o Homoscedasticity assumes that the spread of scores (residuals) about the regression line is roughly equal across all values of the predictor(s)
o Absence of multicollinearity assumes that the predictor variables are not too highly correlated with one another
o Normality assumes that the dependent variables are normally distributed (symmetrical bell shaped) for each group
o Homogeneity of variance assumes that groups have equal error variances
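In practice, these assumptions are often screened before running the analyses above. Here is a minimal sketch using SciPy and statsmodels (Shapiro-Wilk for normality, Breusch-Pagan for homoscedasticity, Levene's test for homogeneity of variance, and variance inflation factors for multicollinearity); the data are simulated, and the VIF cutoff of roughly 5–10 is a common convention rather than a fixed rule:

```python
# Minimal sketch of common assumption checks; the VIF cutoff and the p > .05
# readings are conventions, not fixed rules. Data are simulated for illustration.
import numpy as np
import statsmodels.api as sm
from scipy.stats import shapiro, levene
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(scale=0.9, size=n)   # correlated with x1 by design
y = 2.0 + 1.5 * x1 - 0.7 * x2 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2]))
resid = sm.OLS(y, X).fit().resid

# Normality of residuals: Shapiro-Wilk (p > .05 is consistent with normality)
print("Shapiro-Wilk:", shapiro(resid))

# Homoscedasticity: Breusch-Pagan test of the residuals against the predictors
print("Breusch-Pagan (LM, p, F, p):", het_breuschpagan(resid, X))

# Homogeneity of variance across two groups: Levene's test
print("Levene:", levene(y[:50], y[50:]))

# Multicollinearity: variance inflation factor for each predictor column
for i in (1, 2):
    print(f"VIF x{i}: {variance_inflation_factor(X, i):.2f}")
```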