R-Squared: Telling us what we know and what we do not know


Regression is one of the more common analyses used in quantitative doctoral research, and for good reason: a regression uses a set of predictor variables to estimate what the value of an outcome is likely to be. However, regression is not a perfect way to predict things, and statistics such as R-squared (R²) can give insight into the limitations of the prediction.

The R², or coefficient of determination, tells us how well we are able to predict the outcome. For example, we could estimate how long an employee is likely to stay with a company based on a few things we know directly. If we knew every little detail about the employee, our projection could get closer and closer to perfectly accurate. The problem is that we have no way to know everything about this employee. We may not know about the employee's personal life, formative experiences, or private plans to apply to a competitor, but we can at least gauge how much the absence of that information limits our predictions. This is where the R² comes into play.

The R² tells us the percentage of variance in the outcome that is explained by the predictor variables (i.e., the information we do know). A perfect R² of 1.00 means that our predictor variables explain 100% of the variance in the outcome we are trying to predict. In other words, with an R² of 1.00, the predictor variables tell us precisely what the outcome's value will be, with no room for error. An example of an R² of 1.00 might be predicting the number of candles on a perfectly adorned birthday cake from the celebrant's age. Under perfect circumstances, we have all the information we need (age) to know how many candles will be on the cake. As you deviate from the perfectly appointed cake ideal, however, age alone may not predict the count as well. In reality, other factors may come into play, such as cake space, tradition, and whether it makes sense to cram 116 candles onto Kane Tanaka's cake.
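To make the definition concrete, here is a minimal sketch in standard-library Python that computes R² as 1 − SS_res / SS_tot, where SS_res is the sum of squared differences between the observed and predicted outcomes and SS_tot is the sum of squared differences between the observed outcomes and their mean. The ages and candle counts are made-up illustration data, and fit_line, r_squared, and r_squared_for are hypothetical helper names rather than functions from any statistics package.

```python
# Minimal sketch: R^2 = 1 - SS_res / SS_tot for a one-predictor least-squares line.
# The ages and candle counts below are hypothetical illustration data.


def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) * (a - mean_x) for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(y, y_hat):
    """Share of the variance in y accounted for by the predictions y_hat."""
    mean_y = sum(y) / len(y)
    ss_res = sum((obs - pred) ** 2 for obs, pred in zip(y, y_hat))  # unexplained
    ss_tot = sum((obs - mean_y) ** 2 for obs in y)                 # total
    return 1 - ss_res / ss_tot


def r_squared_for(x, y):
    """Fit y on x, then score the fitted values."""
    intercept, slope = fit_line(x, y)
    return r_squared(y, [intercept + slope * a for a in x])


ages = [1, 5, 10, 30, 80]
ideal_candles = [1, 5, 10, 30, 80]  # perfectly adorned: one candle per year
real_candles = [1, 5, 10, 30, 12]   # the 80-year-old's cake only fits a dozen

print(r_squared_for(ages, ideal_candles))  # 1.0
print(f"{r_squared_for(ages, real_candles):.2f}")
```

With one candle per year, age accounts for every candle and R² comes out at exactly 1.0. Once the oldest celebrant's cake tops out at a dozen candles, the fitted line leaves most of the variance in candle counts unexplained.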

However, in real research studies, your R² will (almost) never be perfect. In the real world, we might get an R² of .20, meaning our predictors account for only 20% of the variance in the outcome. The other 80% is variance we cannot explain; it comes from the information we do not know. In the cake example, that 80% might come down to tradition, cake space, or perhaps the celebrant's gluten intolerance. The R² therefore shows how much the regression can tell us about the outcome, and also how much is left to outside influences. Knowing how much you do not know may be just as important as knowing how much you do, because it gives you and future researchers a reason to look elsewhere for information that might push the R² closer to 1.00. That said, a perfect R² in real-world data might be a red flag, much like finding a holy grail item at a Goodwill: you might want to think twice before you celebrate.
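The split between what we know and what we do not know can be illustrated with a small simulation. The sketch below is a hypothetical setup, not real data: an outcome is built from one "known" predictor, one "unknown" factor, and random noise, with variances (1, 3, and 1) chosen so that the known predictor alone should explain roughly 20% of the outcome's variance, as in the .20 example above. Fitting only the known predictor should therefore leave about 80% unexplained, while adding the previously unknown factor as a second predictor should push R² toward .80. The helper names and the choice of ordinary least squares solved from the centered normal equations are assumptions made for illustration.

```python
import random

# Hypothetical simulation: the variables and their variances are assumptions
# chosen so the "known" predictor explains about 20% of the outcome's variance.


def r_squared(y, y_hat):
    """1 - SS_res / SS_tot: share of the outcome's variance the model accounts for."""
    mean_y = sum(y) / len(y)
    ss_res = sum((obs - pred) ** 2 for obs, pred in zip(y, y_hat))
    ss_tot = sum((obs - mean_y) ** 2 for obs in y)
    return 1 - ss_res / ss_tot


def centered(values):
    """Subtract the mean from every value."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def fit_predict(y, *predictors):
    """Least-squares fitted values (with intercept) for one or two predictors only."""
    mean_y = sum(y) / len(y)
    cy = centered(y)
    cols = [centered(p) for p in predictors]
    if len(cols) == 1:
        coefs = [dot(cols[0], cy) / dot(cols[0], cols[0])]
    else:
        # Solve the 2x2 centered normal equations with Cramer's rule.
        a, b = cols
        s11, s12, s22 = dot(a, a), dot(a, b), dot(b, b)
        s1y, s2y = dot(a, cy), dot(b, cy)
        det = s11 * s22 - s12 * s12
        coefs = [(s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det]
    # Fitted value = mean of y + sum of (slope * centered predictor).
    return [mean_y + sum(c * col[i] for c, col in zip(coefs, cols))
            for i in range(len(y))]


random.seed(1)
n = 10_000
known = [random.gauss(0, 1) for _ in range(n)]           # what we measured
unknown = [random.gauss(0, 3 ** 0.5) for _ in range(n)]  # what we never recorded
noise = [random.gauss(0, 1) for _ in range(n)]           # pure chance
outcome = [k + u + e for k, u, e in zip(known, unknown, noise)]

r2_known = r_squared(outcome, fit_predict(outcome, known))
r2_both = r_squared(outcome, fit_predict(outcome, known, unknown))
print(f"known only:      R^2 = {r2_known:.2f}, unexplained = {1 - r2_known:.2f}")
print(f"known + unknown: R^2 = {r2_both:.2f}")
```

The exact values depend on the random draws, but with 10,000 simulated observations they should land close to .20 and .80. The remaining 20% in the second model comes from the noise term, which no predictor in this setup can capture.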