In some of our past blogs, we have talked about confidence intervals, and how to interpret the p value, but what is a p value? Today, we will talk through a brief history and get into what p values are, how to determine your cutoff for statistical significance, and when you might want to change your cutoff.
The p value gained popularity through the famous biologist and statistician Ronald Fisher, whom you may know for his role in developing many modern-day statistical concepts, including the F distribution (used in ANOVA) and maximum likelihood estimation. Fisher popularized the p value in his 1925 work, “Statistical Methods for Research Workers.”
In this publication, Fisher laid out a number of conventions, the most influential of which was the use of .05 as the cutoff for statistical significance. A value of .05 corresponds to a 5% (or 1 in 20) chance of drawing a random sample with an effect that extreme if no real effect exists in the population.
For example, say you are running a t-test to compare two groups (A vs. B) on some outcome. Your null hypothesis would be that there is no difference between Group A and Group B on the outcome. You collect your random sample, run your t-test, and get a p value less than .05. This means that, if your null hypothesis were indeed correct and there were no difference between the groups, the result you obtained would be rare: you would expect to see a result that extreme fewer than 1 in 20 times if you collected samples over and over again.
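To make this concrete, here is a minimal sketch in Python using scipy.stats.ttest_ind. The group data and their means are made-up numbers for illustration only; in practice you would substitute your own outcome measurements.

```python
import numpy as np
from scipy import stats

# Hypothetical outcome scores for two groups (illustrative values only)
rng = np.random.default_rng(42)
group_a = rng.normal(loc=50, scale=10, size=30)  # Group A outcomes
group_b = rng.normal(loc=57, scale=10, size=30)  # Group B outcomes

# Independent-samples t-test; null hypothesis: the two group means are equal
result = stats.ttest_ind(group_a, group_b)

print(f"t statistic: {result.statistic:.3f}")
print(f"p value:     {result.pvalue:.4f}")

# Compare the p value against the conventional .05 cutoff
if result.pvalue < 0.05:
    print("p < .05: a result this extreme would be rare if there were truly no difference")
else:
    print("p >= .05: the data are consistent with no difference between the groups")
```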
Therefore, Fisher’s rationale for using .05 as the cutoff was that if an effect this rare shows up in the data, there is a reasonable chance that it reflects some real effect in the population rather than random chance alone.
While this is a good start, some schools of research value a higher degree of certainty. For example, studies where it is critical that a reported effect is real generally compare the p value against an α of .01 or .001. These cutoffs are far more stringent, corresponding to an effect that would be seen in only 1 out of 100, or 1 out of 1,000, random samples respectively if no real effect exists. For doctoral research in the social sciences, a cutoff of .05 is pretty common; it lets the doctoral candidate make claims about significance without being overly stringent, which makes sense when the findings are exploratory and there are no lives on the line. But for something like a surgery, you might want to be very confident that the operation you are about to perform is going to make a difference before you put the patient under.
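To see what these cutoffs mean in practice, the sketch below (again just an illustration, with arbitrary sample sizes and a made-up population) simulates many pairs of samples drawn from the same population, so the null hypothesis is true, and counts how often the t-test p value falls below each cutoff. The proportions should land near 5%, 1%, and 0.1%.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_simulations = 20_000
p_values = np.empty(n_simulations)

# Both groups are drawn from the same population, so the null hypothesis is true
for i in range(n_simulations):
    group_a = rng.normal(loc=50, scale=10, size=30)
    group_b = rng.normal(loc=50, scale=10, size=30)
    p_values[i] = stats.ttest_ind(group_a, group_b).pvalue

# Fraction of samples flagged as "significant" at each cutoff, despite no real effect
for alpha in (0.05, 0.01, 0.001):
    rate = (p_values < alpha).mean()
    print(f"alpha = {alpha}: {rate:.4f} of samples fall below the cutoff")
```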