Posted April 10, 2017

In some of our past blogs, we have talked about confidence intervals, and how to interpret the p value, but what is a p value? Today, we will talk through a brief history and get into what p values are, how to determine your cutoff for statistical significance, and when you might want to change your cutoff.

The p value gained popularity through the famous biologist and statistician Ronald Fisher, whom you may know for his role in developing many modern-day statistical concepts, including the F distribution (used in ANOVA) and maximum likelihood estimation. His 1925 work titled “Statistical Methods for Research Workers” is credited with popularizing the concept of the p value. He made several arguments in this publication, the most influential of which was the use of .05 as the cutoff for statistical significance. This cutoff is called the alpha (α) and acts as a benchmark for statistical significance. The p value corresponds to the probability of obtaining a random sample with an effect or difference as extreme (or more extreme) as what was observed in the data, assuming that the null hypothesis being tested (i.e., no effect/difference) is true. So, a value of .05 corresponds to a 5% (or 1 in 20) chance of drawing a random sample with an effect that extreme if no real effect exists in the population.

For example, say you are running a t-test to compare two groups (A vs. B) on some outcome. Your null hypothesis would be that there is no difference between Group A and Group B on the outcome. You collect your random sample, run your t-test, and get a p value less than .05. This means that, if your null hypothesis is indeed correct and there is no difference between the groups, the result you obtained is rare: you would expect such a result fewer than 1 in 20 times if you collected samples over and over again. Fisher’s rationale for using .05 as the cutoff was that if an effect this rare shows up in your data, it is more plausibly due to some real effect in the population than to random chance alone.
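To make the t-test example concrete, here is a minimal sketch in Python using NumPy and SciPy. The data are entirely made up: both groups are drawn from the same normal distribution, so the null hypothesis happens to be true here, and the group sizes and parameters are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical data: both groups are drawn from the SAME population,
# so the null hypothesis (no difference) is actually true in this sketch.
group_a = rng.normal(loc=100, scale=15, size=30)
group_b = rng.normal(loc=100, scale=15, size=30)

# Independent-samples t-test comparing the two group means.
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")

# Compare the p value against the conventional alpha of .05.
alpha = 0.05
print("significant" if p_value < alpha else "not significant")
```

Because the groups were sampled from the same population, a p value below .05 here would be exactly the kind of 1-in-20 false positive discussed above.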

While this is a good start, some schools of research demand a higher degree of certainty. For example, studies whose conclusions carry serious consequences often compare the p value against an α of .01 or .001. These cutoffs are far more stringent, corresponding to an effect that would be seen by chance in only 1 out of 100, or 1 out of 1,000, random samples, respectively. For doctoral research in the social sciences, the value of .05 is pretty common; it lets the doctoral candidate make claims about significance without having to be overly stringent. This is common when the findings are exploratory and there are no lives on the line. But for something like a surgery, you might want to be very confident that the operation you are about to perform is going to make a difference before you put the patient under.
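The relationship between the alpha cutoff and the rate of false positives can be demonstrated with a quick simulation. The sketch below (sample sizes and number of simulations are arbitrary choices) repeatedly runs a t-test on two groups drawn from the same population, so every "significant" result is a false positive; the fraction of such results should track each alpha level closely.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims = 10_000

# Simulate many experiments where the null hypothesis is TRUE:
# both groups come from the same population every time.
p_values = np.empty(n_sims)
for i in range(n_sims):
    a = rng.normal(size=30)
    b = rng.normal(size=30)
    p_values[i] = stats.ttest_ind(a, b).pvalue

# The fraction of "significant" results at each cutoff is the
# false positive rate, which hovers near the alpha level itself.
for alpha in (0.05, 0.01, 0.001):
    rate = np.mean(p_values < alpha)
    print(f"alpha = {alpha}: false positive rate ~ {rate:.4f}")
```

In other words, the alpha you choose is exactly the false positive rate you are willing to tolerate, which is why high-stakes fields push it down to .01 or .001.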