Assumptions of Logistic Regression

The #1 Mistake in Logistic Regression Dissertations: Testing the Wrong Assumptions.

Here’s what happens more often than it should: a student runs logistic regression, then tests for normality and homoscedasticity, assumptions that belong to linear regression, not logistic. The committee reads it and immediately knows the student doesn’t understand the analysis. Logistic regression has its own assumptions: independence of observations, no multicollinearity, linearity of the predictors with the log-odds, and adequate events per variable. These are different. And getting them wrong isn’t a minor error; it’s a fundamental misunderstanding of the method.

The tipping point in logistic regression is knowing what NOT to test. The student who skips the Shapiro-Wilk and instead reports a Box-Tidwell test for linearity of the logit is the student whose committee nods instead of sending the draft back.

20 minutes with Dr. Lani. No obligation. No pressure.

Logistic regression does not make many of the key assumptions of linear regression and general linear models that are based on ordinary least squares algorithms, particularly regarding linearity, normality, homoscedasticity, and measurement level.

First, logistic regression does not require a linear relationship between the dependent and independent variables. Second, the error terms (residuals) do not need to follow a normal distribution. Third, it does not require homoscedasticity. Finally, logistic regression does not require the dependent variable to be measured on an interval or ratio scale.

However, some other assumptions still apply.

First, binary logistic regression requires the dependent variable to be binary, and ordinal logistic regression requires the dependent variable to be ordinal.
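Before fitting, it is worth a one-line sanity check that the outcome really does take exactly two values. A minimal sketch with made-up data:

```python
import numpy as np

def outcome_levels(y):
    """Number of distinct values the outcome variable takes."""
    return len(np.unique(y))

y = np.array([0, 1, 1, 0, 1])  # a made-up binary outcome
print(outcome_levels(y))       # 2 -> suitable for binary logistic regression
```

An outcome with three or more ordered levels would instead point toward ordinal logistic regression.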

Second, logistic regression requires the observations to be independent of each other.  In other words, the observations should not come from repeated measurements or matched data.
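Independence is ultimately a question of study design, but a quick data screen for repeated measurements is to look for subject identifiers that appear more than once. A minimal sketch with hypothetical IDs:

```python
from collections import Counter

def repeated_ids(subject_ids):
    """Return identifiers occurring more than once, which would
    suggest repeated measurements on the same subject."""
    counts = Counter(subject_ids)
    return [sid for sid, n in counts.items() if n > 1]

# Hypothetical subject identifiers: 102 appears twice.
print(repeated_ids([101, 102, 103, 102, 104]))  # [102]
```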

Third, logistic regression requires there to be little or no multicollinearity among the independent variables. That is, the independent variables should not be too highly correlated with each other.
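A common way to screen for multicollinearity is the variance inflation factor (VIF), where each predictor is regressed on the others and VIF = 1 / (1 − R²). This is a minimal NumPy-only sketch with simulated predictors (not any particular package's implementation):

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of predictor matrix X.
    VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing column j
    on the remaining columns plus an intercept."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    factors = []
    for j in range(k):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        factors.append(1.0 / (1.0 - r2))
    return factors

# Hypothetical predictors: x2 is nearly a copy of x1, x3 is unrelated.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)
x3 = rng.normal(size=200)
print(vif(np.column_stack([x1, x2, x3])))
```

Commonly cited rules of thumb treat VIF values above roughly 5 or 10 as a sign of problematic multicollinearity; here the near-duplicate pair flags itself while the unrelated predictor stays near 1.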

Fourth, logistic regression assumes linearity between the independent variables and the log odds of the dependent variable. Although this analysis does not require the dependent and independent variables to be related linearly, it does require that the independent variables be linearly related to the log odds of the dependent variable.
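The Box-Tidwell approach mentioned above checks this assumption by entering an x·ln(x) term alongside each continuous positive predictor: a significant coefficient on that term signals a violation. This is a minimal sketch on simulated data, using a hand-rolled Newton-Raphson fit rather than a statistics package:

```python
import numpy as np

def fit_logit(X, y, n_iter=25):
    """Fit logistic regression by Newton-Raphson.
    Returns (coefficients, standard errors); first entry is the intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        H = X.T @ (X * (p * (1 - p))[:, None])   # observed information
        beta = beta + np.linalg.solve(H, X.T @ (y - p))
    se = np.sqrt(np.diag(np.linalg.inv(H)))
    return beta, se

# Simulated data where the predictor really is linear in the logit.
rng = np.random.default_rng(1)
x = rng.uniform(0.5, 3.0, size=400)
y = (rng.uniform(size=400) < 1.0 / (1.0 + np.exp(1.0 - 1.5 * x))).astype(float)

# Box-Tidwell: enter x * ln(x) alongside x; a significant coefficient
# on that term signals a violation of linearity of the logit.
beta, se = fit_logit(np.column_stack([x, x * np.log(x)]), y)
print(abs(beta[2] / se[2]))  # compare against 1.96 at the 5% level
```

In a real analysis one would apply a Bonferroni-style correction when several continuous predictors are tested this way, and use established software rather than this sketch.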

Finally, logistic regression typically requires a large sample size. A general guideline is that you need a minimum of 10 cases with the least frequent outcome for each independent variable in your model. For example, if you have 5 independent variables and the expected probability of your least frequent outcome is .10, then you would need a minimum sample size of 500 (10*5 / .10).
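The events-per-variable guideline above reduces to one line of arithmetic, sketched here as a small helper:

```python
import math

def min_sample_size(n_predictors, p_least_frequent, epv=10):
    """Minimum n from the events-per-variable rule: n = epv * k / p."""
    return math.ceil(epv * n_predictors / p_least_frequent)

print(min_sample_size(5, 0.10))  # 500, matching the worked example above
```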

Take the course: Binary Logistic Regression

Take the course: Ordinal Logistic Regression