Posted February 21, 2017
The binary logistic regression may not be the most common form of regression, but when it is used, it tends to cause a lot more of a headache than necessary. Binary logistic regressions are very similar to their linear counterparts in terms of use and interpretation, and the only real difference here is in the type of dependent variable they use. In a linear regression, the dependent variable (or what you are trying to predict) is continuous. In a binary logistic regression, the dependent variable is binary, meaning that the variable can only have two possible values. Because of this, when interpreting the binary logistic regression, we are no longer talking about how our independent variables predict a score, but how they predict which of the two groups of the binary dependent variable people end up falling into. To do this, we look at the odds ratio.
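Because everything that follows leans on the word "odds," it may help to pin down the arithmetic first. The odds of an outcome are its probability divided by the probability of it not happening, p / (1 - p), and an odds ratio is the odds in one group divided by the odds in another. The minimal Python sketch below computes both from probabilities; the function names are our own, chosen for illustration.

```python
def odds(p: float) -> float:
    """Convert a probability p (0 <= p < 1) into odds, p / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return p / (1.0 - p)


def odds_ratio(p_group1: float, p_group0: float) -> float:
    """Odds in group 1 divided by odds in group 0."""
    return odds(p_group1) / odds(p_group0)


# A 50% chance is odds of 1 (1 to 1); an 80% chance is odds of 4 (4 to 1).
print(odds(0.5))             # 1.0
print(odds(0.8))             # about 4.0 (floating-point rounding)
print(odds_ratio(0.8, 0.5))  # about 4.0
```

Note that the odds ratio of 4 here is not the same as the ratio of the probabilities (0.8 / 0.5 = 1.6); logistic regression reports the former.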
Consider a binary logistic regression conducted by a researcher who recently watched the movie Jaws and is terrified of facing the same fate as some of the less fortunate characters in that movie. She chooses two predictor variables to assess her chances of being eaten by a giant man-eating great white shark: (a) score on the Shark Related Deliciousness Scale (SRDS), and (b) gender. Because she has to define these variables so she can interpret them later, she specifies that SRDS scores range from 1 to 5 and treats the scale as continuous. Gender is binary, just like the outcome, and she codes it as 0 = female and 1 = male.
She begins by gathering data on people who had crossed paths with a giant man-eating great white shark in the past, both those who were eaten and those who escaped, since a binary outcome needs cases in both groups. After collecting the data and running the analysis to determine how these variables relate to meeting an untimely demise at the jaws of this huge sea creature, she finds that the overall regression model is significant. The analysis produces the output in the table below. Typically, a binary logistic regression analysis gives you more output than this, but today we will focus on the odds ratio.
Predictor | p value | Odds ratio
--- | --- | ---
Gender | .030 | 5.00
SRDS score | .001 | 2.00
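If your software also prints raw coefficients, it may help to see how they connect to the odds ratios. In the standard binary logistic model, the log of the odds of being eaten (p is the probability of being eaten) is a linear function of the predictors, and each reported odds ratio is the exponential of its coefficient. Assuming a model with an intercept b_0 (which the table does not report), the coefficients implied by the table would be:

```latex
\begin{aligned}
\ln\!\left(\frac{p}{1-p}\right) &= b_0 + b_1\,\mathrm{Gender} + b_2\,\mathrm{SRDS} \\
\mathrm{OR}_{\mathrm{Gender}} &= e^{b_1} = 5.00 \quad\Rightarrow\quad b_1 = \ln 5 \approx 1.609 \\
\mathrm{OR}_{\mathrm{SRDS}} &= e^{b_2} = 2.00 \quad\Rightarrow\quad b_2 = \ln 2 \approx 0.693
\end{aligned}
```

Working on the log-odds scale is what turns multiplicative odds ratios into additive coefficients.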
As we covered above, one of these predictors is binary and the other is continuous, so we interpret the two a little differently. First is the binary predictor: gender. We first look at the p value. At .030 it is below .05, telling us that the predictor is significant, so we can go on to interpret its odds ratio. To do that, we have to know what a 0 (low) and a 1 (high) correspond to, and our researcher recalls that she coded 0 = female and 1 = male. An odds ratio greater than 1 describes a positive relationship: as gender “increases,” the odds of being eaten increase. Based on our coding, an “increase” in gender means a value of 1 instead of 0, in other words, being male. So the odds ratio of 5.00 means that being in the (1) group, or being male, is associated with 5 times the odds of being eaten compared with being female. She finds this to be good news, since it puts her, as a female, in the lower-odds group.
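A point that often trips people up is that "5 times the odds" is not "5 times the probability." The sketch below assumes a hypothetical 10% probability of being eaten for females (an invented number for illustration, not part of the researcher's results) and applies the odds ratio of 5.00 from the table to get the corresponding probability for males.

```python
def prob_from_odds(o: float) -> float:
    """Convert odds back into a probability, o / (1 + o)."""
    return o / (1.0 + o)


OR_GENDER = 5.0   # odds ratio for gender from the table (male = 1 vs. female = 0)
p_female = 0.10   # HYPOTHETICAL baseline probability for females (coded 0)

odds_female = p_female / (1.0 - p_female)  # 0.1 / 0.9, about 0.111
odds_male = odds_female * OR_GENDER        # 5 times the odds, about 0.556
p_male = prob_from_odds(odds_male)         # back to a probability

print(f"female: p = {p_female:.3f}, odds = {odds_female:.3f}")
print(f"male:   p = {p_male:.3f}, odds = {odds_male:.3f}")
# The odds are 5 times higher, but the probability only rises from
# 0.100 to about 0.357, well short of 5 times higher.
```

Under a different assumed baseline the probabilities would change, but the ratio of the odds would stay at 5.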
If the odds ratio for gender had been below 1, she would have been in trouble, as an odds ratio less than 1 implies a negative relationship. Under her coding, that would mean being male corresponds with lower odds of being eaten, and being female with higher odds. To put this in perspective, if she had coded male as 0 and female as 1, the same result would have produced an inverted odds ratio of 0.2 (1/5). The conclusion would not change: females would still have lower odds of being eaten, which is exactly what an odds ratio below 1 expresses under that coding.
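Reversing the coding of a binary predictor simply takes the reciprocal of its odds ratio, which on the log-odds scale is just a change of sign. A short sketch using the odds ratio of 5.00 from the table:

```python
import math

or_male_vs_female = 5.0                      # coding 0 = female, 1 = male (as in the post)
or_female_vs_male = 1.0 / or_male_vs_female  # coding 0 = male, 1 = female (reversed)

b_original = math.log(or_male_vs_female)     # log-odds coefficient, about 1.609
b_reversed = math.log(or_female_vs_male)     # about -1.609: same size, opposite sign

print(or_female_vs_male)                          # 0.2
print(round(b_original, 3), round(b_reversed, 3))  # 1.609 -1.609
```

Either coding describes the same data; only the reference group changes.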
Next is the result for our fictitious deliciousness scale. Its p value of .001 is lower than the standard .05 cutoff, so this variable is also significant. Because this variable is continuous, the interpretation of the odds ratio is a little different, but the same logic applies. This odds ratio is interpreted in terms of each one-unit increase on the scale (i.e., going from 1 to 2, 2 to 3, etc.). Thus, for each one-point increase in deliciousness score, the odds of being eaten by a Jaws-like monstrosity are multiplied by 2. This means that someone with a score of 2 on the scale has 2 times the odds of being eaten as someone with a score of 1. Going in the other direction, the odds ratio is inverted (1/2 = .5): someone with a score of 1 has half the odds of being eaten as someone with a score of 2. All of these comparisons are between adjacent scores (i.e., 1 vs. 2, 2 vs. 3, and so on). But to compare someone with a score of 2 to someone with a 5, things start to compound...
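To check that the odds ratio is the same for every adjacent pair of scores, the sketch below assumes a hypothetical 5% probability of being eaten at an SRDS score of 1 (an invented baseline, not part of the researcher's results), doubles the odds for each additional point, and compares adjacent scores on both the odds and the probability scale.

```python
OR_SRDS = 2.0    # odds ratio per one-point increase in SRDS (from the table)
p_score1 = 0.05  # HYPOTHETICAL probability of being eaten at a score of 1

odds_by_score = {1: p_score1 / (1.0 - p_score1)}
for score in range(2, 6):
    # Each one-point step multiplies the odds by the per-unit odds ratio.
    odds_by_score[score] = odds_by_score[score - 1] * OR_SRDS

# Convert each score's odds back into a probability, o / (1 + o).
prob_by_score = {s: o / (1.0 + o) for s, o in odds_by_score.items()}

for score in range(2, 6):
    adjacent_or = odds_by_score[score] / odds_by_score[score - 1]
    prob_ratio = prob_by_score[score] / prob_by_score[score - 1]
    print(f"{score - 1} -> {score}: odds ratio = {adjacent_or:.2f}, "
          f"probability ratio = {prob_ratio:.2f}")
```

The odds ratio prints as 2.00 at every step, while the probability ratio shrinks (from about 1.90 between scores 1 and 2 down to about 1.54 between scores 4 and 5 under this assumed baseline), which is why odds ratios, not probabilities, are the constant quantity to interpret.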
At a deliciousness of 2, the odds are 2 times the odds at 1; at 3, the odds are 4 times the odds at 1 (since they are 2 times the odds at a score of 2, which are themselves 2 times the odds at a score of 1). Following this logic, when skipping ahead more than one point at a time, the odds ratio is multiplied by itself once for each interval: (odds ratio)^(number of intervals difference) = difference in odds. So, for someone with a score of 5 (4 intervals from a score of 1), the odds of being eaten are 2^4 = 2 × 2 × 2 × 2 = 16 times the odds of someone with a score of 1. Likewise, someone with a score of 5 (3 intervals from a score of 2) has 2^3 = 8 times the odds of someone with a score of 2.
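Compounding a per-unit odds ratio over several intervals can be written as a one-line function. The names below are our own, for illustration: the first raises the per-unit odds ratio to the power of the number of intervals, and the second computes the same quantity on the log-odds scale by multiplying the coefficient ln(OR) by the number of intervals and exponentiating.

```python
import math


def odds_ratio_for_difference(or_per_unit: float, intervals: float) -> float:
    """Odds ratio between two scores that are `intervals` units apart."""
    return or_per_unit ** intervals


def odds_ratio_via_coefficient(or_per_unit: float, intervals: float) -> float:
    """Same quantity via the log-odds coefficient: exp(intervals * ln(OR))."""
    b = math.log(or_per_unit)
    return math.exp(intervals * b)


print(odds_ratio_for_difference(2.0, 4))   # score 5 vs. score 1: 16.0
print(odds_ratio_for_difference(2.0, 3))   # score 5 vs. score 2: 8.0
print(odds_ratio_for_difference(2.0, -1))  # score 1 vs. score 2: 0.5
print(odds_ratio_via_coefficient(2.0, 4))  # about 16.0 (floating-point rounding)
```

This is a power, not a product: multiplying 2 × 4 = 8 would understate the gap between scores of 1 and 5. The negative interval shows the inversion for moving down the scale.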
To conclude, the important thing to remember about the odds ratio is that an odds ratio greater than 1 is a positive association (i.e., higher values of the predictor go with higher odds of being in group 1 of the outcome), and an odds ratio less than 1 is a negative association (i.e., higher values of the predictor go with lower odds of being in group 1, and therefore higher odds of being in group 0). An odds ratio of exactly 1 means no association.
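As a compact summary, the rule of thumb can be expressed as a small helper. The function name and labels are our own; it checks the sign of the log-odds coefficient ln(OR), which is positive exactly when the odds ratio is above 1 and negative exactly when it is below 1.

```python
import math


def direction_of_association(odds_ratio: float) -> str:
    """Describe how higher predictor values relate to the odds of outcome group 1."""
    if odds_ratio <= 0:
        raise ValueError("an odds ratio must be positive")
    b = math.log(odds_ratio)  # log-odds coefficient; its sign carries the direction
    if b > 0:
        return "positive: higher predictor values, higher odds of group 1"
    if b < 0:
        return "negative: higher predictor values, higher odds of group 0"
    return "none: the odds do not change with the predictor"


examples = [("Gender", 5.00), ("SRDS score", 2.00), ("Gender, reversed coding", 0.20)]
for label, value in examples:
    print(f"{label} (OR = {value:.2f}): {direction_of_association(value)}")
```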