When Significance isn’t Quite So Significant: Measuring Effect Size


Posted March 8, 2017

“The primary product of a research inquiry is one or more measures of effect size, not p values.” –Cohen (1990)

So, you have run your hypothesis test and received a significant result; your p value is < .05, or perhaps it is even < .001. Bam, your null hypothesis can be rejected, you can make your claims of a significant effect and move along your merry way. Not so fast! Your p value can only tell you so much—in fact, it only tells you how likely a result at least as extreme as yours would be if the null hypothesis were true. What it does not tell you is much more important. Depending on the size of your effect, that significant difference or relationship might not be very substantive at all. Further complicating things is the fact that your p value is affected by sample size—get enough participants, and sure enough, an infinitesimally small effect could show significance.
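To see how sample size drives significance, here is a quick simulation sketch in Python (the group sizes and the tiny true difference of d = 0.02 are invented for illustration): the same trivial effect that looks like nothing with 50 participants per group becomes "highly significant" with 500,000 per group.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Two populations whose true means differ by a trivial 0.02 standard deviations.
small_a = rng.normal(0.00, 1.0, size=50)
small_b = rng.normal(0.02, 1.0, size=50)
large_a = rng.normal(0.00, 1.0, size=500_000)
large_b = rng.normal(0.02, 1.0, size=500_000)

# Same underlying effect, wildly different p values.
_, p_small = stats.ttest_ind(small_a, small_b)
_, p_large = stats.ttest_ind(large_a, large_b)

print(f"n = 50 per group:      p = {p_small:.3f}")
print(f"n = 500,000 per group: p = {p_large:.3g}")
```

The effect size (d = 0.02) never changed; only the sample size did—which is exactly why the p value alone cannot tell you whether a finding matters.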

Suppose your doctor tells you that recent research shows that those who eat Statistics-Ohs for breakfast score 10 points higher on a Cardiac Health scale than those who do not eat Statistics-Ohs as part of a healthy, balanced breakfast. You would not quite know how to interpret that if you are not familiar with this Cardiac Health scale. Suppose, instead, that the doctor told you that people who eat Statistics-Ohs for breakfast are ten times less likely to suffer a heart attack! That is a pretty noticeable effect and is more easily interpretable.

So, what are these measures of effect size? Cohen’s d, r, and R2 are common measures of effect size. Cohen’s d is used for quantifying the difference between two group means, expressed in standard deviation units. The correlation coefficient r is used for correlational measures, such as Pearson’s correlations or regressions. Cohen (1988) gave guidelines for effect sizes of small (d = 0.2, r = .10 and below), medium (d = 0.5, r = .24), and large (d = 0.8, r = .37 and above). R2, often referred to as the coefficient of determination, represents the proportion of variance in the dependent variable that is accounted for by a regression model. So, an R2 of .67 would equate to 67% of explained variance. Another effect size you are likely to encounter is eta squared, represented as η2. This has the same interpretation as R2 (the proportion of variance that can be attributed to your model), except it is used for ANOVA models. Partial η2 is an adjusted version of this: it expresses the variance explained by the independent variable of focus as a proportion of that variance plus the error variance, excluding the variance explained by the other independent variables.
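As a rough illustration, the formulas behind these measures are simple enough to compute by hand with NumPy. All of the data below are invented: a hypothetical treatment/control comparison for Cohen’s d and η2, and hypothetical study-hours vs. exam-scores data for r and R2.

```python
import numpy as np

def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * np.var(a, ddof=1) +
                  (nb - 1) * np.var(b, ddof=1)) / (na + nb - 2)
    return (np.mean(a) - np.mean(b)) / np.sqrt(pooled_var)

def eta_squared(*groups):
    """Proportion of total variance explained by group membership (one-way ANOVA)."""
    all_vals = np.concatenate(groups)
    grand_mean = all_vals.mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_total = ((all_vals - grand_mean) ** 2).sum()
    return ss_between / ss_total

# Hypothetical scores for two made-up groups.
treatment = np.array([5.0, 6.0, 7.0, 8.0, 9.0])
control = np.array([3.0, 4.0, 5.0, 6.0, 7.0])
d = cohens_d(treatment, control)  # mean difference of 2 in sd units

# Hypothetical correlational data: hours studied vs. exam score.
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
scores = np.array([52.0, 58.0, 61.0, 70.0, 69.0, 80.0])
r = np.corrcoef(hours, scores)[0, 1]

print(f"Cohen's d = {d:.2f}")                              # large by Cohen's guidelines
print(f"r = {r:.2f}, R^2 = {r**2:.2f}")                    # proportion of variance explained
print(f"eta^2 = {eta_squared(treatment, control):.2f}")
```

Note that η2 here is computed for a simple two-group, one-factor design; with multiple factors you would partition the sums of squares per factor to get partial η2.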

So, keep in mind that, as Cohen (1994) wrote:

“All psychologists know that statistically significant does not mean plain English significant, but if one reads the literature, one often discovers that a finding reported in the Results section studded with asterisks implicitly becomes in the Discussion section highly significant, or very highly significant, important, big!”

In other words, your significant result might not be so significant after all. It is just as important to consider the effect size when you discuss results that are statistically significant.

References
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
Cohen, J. (1990). Things I have learned (so far). American Psychologist, 45, 1304-1312.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49(12), 997-1003.

