Logistic regression does not make many of the key assumptions of linear regression and general linear models that are based on ordinary least squares algorithms – particularly regarding linearity, normality, homoscedasticity, and measurement level.
Firstly, it does not need a linear relationship between the dependent and independent variables. Logistic regression can handle many kinds of relationships, because it applies a non-linear log transformation to the predicted odds. Secondly, the independent variables do not need to be multivariate normal, although multivariate normality yields a more stable solution. Also, the error terms (the residuals) do not need to be multivariate normally distributed. Thirdly, homoscedasticity is not needed: the variances may be heteroscedastic for each level of the independent variables. Lastly, logistic regression can handle ordinal and nominal data as independent variables. The independent variables do not need to be metric (interval or ratio scaled).
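The log transformation of the odds can be made concrete with a small sketch. The function names `logit` and `inverse_logit` are illustrative choices, not taken from any particular library; the example only shows that equal steps in probability do not correspond to equal steps in log odds.

```python
import math

def logit(p):
    """Map a probability in (0, 1) to log odds."""
    return math.log(p / (1.0 - p))

def inverse_logit(log_odds):
    """Map log odds back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-log_odds))

# Equal steps in probability give unequal steps in log odds.
for p in (0.5, 0.7, 0.9):
    print(p, round(logit(p), 3))
```

The model is linear on the log odds scale; `inverse_logit` converts a fitted log odds value back into a probability.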
However, some other assumptions still apply.
Firstly, binary logistic regression requires the dependent variable to be binary. Reducing an ordinal or even metric variable to a dichotomous level loses a lot of information, which makes logistic regression inferior to ordinal regression in these cases.
Secondly, since logistic regression assumes that P(Y=1) is the probability of the event occurring, the dependent variable must be coded accordingly. That is, factor level 1 of the dependent variable should represent the desired outcome.
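As a minimal sketch of this coding step, the helper below maps raw outcome labels to 0 and 1 so that 1 stands for the event of interest. The function name `encode_outcome` and the example labels are hypothetical and serve only as illustration.

```python
def encode_outcome(labels, event_label):
    """Return 1 where the label equals the event of interest, else 0."""
    return [1 if label == event_label else 0 for label in labels]

# Hypothetical raw outcomes; "purchased" is the event P(Y=1) should describe.
raw = ["purchased", "not purchased", "purchased", "not purchased", "not purchased"]
y = encode_outcome(raw, event_label="purchased")
print(y)
```

With this coding, a positive coefficient means that the predictor raises the odds of the desired outcome.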
Thirdly, the logistic regression model should be fitted correctly. Neither overfitting nor underfitting should occur. That is, only the meaningful variables should be included, but also all meaningful variables should be included. A good approach to ensure this is to use a stepwise method to estimate the logistic regression.
Fourthly, the error terms need to be independent. Logistic regression requires each observation to be independent. That is, the data points should not come from any dependent-samples design, e.g., before-after measurements or matched pairings. Also, the model should have little or no multicollinearity. That is, the independent variables should be independent of each other. However, there is the option to include interaction effects of categorical variables in the analysis and the model. If multicollinearity is present, centering the variables might fix it, i.e., subtracting the mean from each variable. If this does not lower the multicollinearity, a factor analysis with orthogonally rotated factors should be done before the logistic regression is estimated.
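The effect of centering can be sketched with a toy example. It rests on one assumption worth stating plainly: centering mainly helps when the collinearity is between a variable and a term built from it (a square or an interaction), since subtracting a mean does not change the correlation between two distinct predictors. The data and the helper names `mean` and `pearson` are hypothetical.

```python
import math

def mean(values):
    return sum(values) / len(values)

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Toy predictor (hypothetical values) and a squared term derived from it.
x = [float(v) for v in range(1, 11)]
x_squared = [v ** 2 for v in x]

# Centering: subtract the mean first, then build the derived term.
x_centered = [v - mean(x) for v in x]
x_centered_squared = [v ** 2 for v in x_centered]

r_raw = pearson(x, x_squared)
r_centered = pearson(x_centered, x_centered_squared)
print(round(r_raw, 3), round(r_centered, 3))
```

In this symmetric toy data the correlation between the predictor and its square is very high before centering and vanishes afterwards; real data will usually show a smaller reduction.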
Fifthly, logistic regression assumes linearity of independent variables and log odds. Whilst logistic regression does not require the dependent and independent variables to be related linearly, it requires that the independent variables are linearly related to the log odds. Otherwise the test underestimates the strength of the relationship and rejects the relationship too easily, that is, it finds the relationship not significant (fails to reject the null hypothesis) where it should be significant. A solution to this problem is the categorization of the independent variables, that is, transforming metric variables to ordinal level and then including them in the logistic regression model. Another approach would be to use discriminant analysis, if the assumptions of homoscedasticity, multivariate normality, and no multicollinearity are met.
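One informal way to inspect this assumption is to group a metric predictor into bins and compare the empirical log odds across bins. The sketch below does that with hypothetical grouped counts; the 0.5 correction in `empirical_logit` is a common convention assumed here so that empty cells do not break the logarithm, and it is not a formal test.

```python
import math

# Hypothetical grouped data: (bin midpoint of predictor, events, observations).
bins = [(10, 5, 100), (20, 12, 100), (30, 27, 100), (40, 50, 100)]

def empirical_logit(events, n):
    """Log odds with a 0.5 correction so empty cells do not break the log."""
    return math.log((events + 0.5) / (n - events + 0.5))

logits = [empirical_logit(e, n) for _, e, n in bins]

# Slope of the log odds between neighbouring bins; roughly constant slopes
# are consistent with a linear relationship to the log odds.
slopes = [
    (logits[i + 1] - logits[i]) / (bins[i + 1][0] - bins[i][0])
    for i in range(len(bins) - 1)
]
for (mid, _, _), value in zip(bins, logits):
    print(mid, round(value, 3))
print([round(s, 3) for s in slopes])
```

If the slopes differ strongly from one bin to the next, the linearity assumption is doubtful and the categorization described above becomes more attractive.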
Lastly, logistic regression requires quite large sample sizes, because maximum likelihood (ML) estimates are less powerful than ordinary least squares (OLS; e.g., simple linear regression, multiple linear regression). Whilst OLS needs 5 cases per independent variable in the analysis, ML needs at least 10 cases per independent variable; some statisticians recommend at least 30 cases for each parameter to be estimated.
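The rules of thumb above reduce to simple arithmetic, sketched here for a hypothetical model with six independent variables. The function name `minimum_cases` is illustrative, and the sketch assumes one parameter per independent variable, ignoring the intercept.

```python
def minimum_cases(n_predictors, cases_per_predictor=10):
    """Smallest sample size implied by a cases-per-variable rule of thumb."""
    return n_predictors * cases_per_predictor

n_predictors = 6  # hypothetical model size

ols_minimum = minimum_cases(n_predictors, 5)    # OLS rule quoted in the text
ml_minimum = minimum_cases(n_predictors, 10)    # ML minimum quoted in the text
ml_strict = minimum_cases(n_predictors, 30)     # stricter recommendation

print(ols_minimum, ml_minimum, ml_strict)
```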


