US 877.437.8622    UK 0.808.101.0930    info@statisticssolutions.com

Assumptions of logistic regression

Logistic regression does not make many of the key assumptions of linear regression and general linear models that are based on ordinary least squares algorithms – particularly regarding linearity, normality, homoscedasticity, and measurement level.

Firstly, it does not need a linear relationship between the dependent and independent variables. Logistic regression can handle many kinds of relationships because it applies a non-linear transformation, the logit (the natural log of the odds), to the predicted probability. Secondly, the independent variables do not need to be multivariate normal, although multivariate normality yields a more stable solution. The error terms (the residuals) do not need to be normally distributed either. Thirdly, homoscedasticity is not needed: the variances may be heteroscedastic across the levels of the independent variables. Lastly, logistic regression can handle ordinal and nominal data as independent variables. The independent variables do not need to be metric (interval or ratio scaled).
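The transformation can be made concrete with a minimal sketch. The intercept and slope below are hypothetical values chosen only for illustration, not estimates from any data; the point is that the log odds change by a constant amount per unit of x while the probability follows an S-shaped curve.

```python
import math

def logit(p):
    """Log odds of a probability p, defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inverse_logit(z):
    """Map a log-odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, assumed purely for illustration.
intercept = -1.0
slope = 0.5

for x in [0, 2, 4, 6]:
    log_odds = intercept + slope * x      # linear in x
    probability = inverse_logit(log_odds)  # non-linear in x
    print(x, round(log_odds, 2), round(probability, 3))
```

Equal steps in x give equal steps in the log odds, but unequal steps in the probability, which is why no linear relationship between the raw variables is needed.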

However, some other assumptions still apply.

Firstly, binary logistic regression requires the dependent variable to be binary. Reducing an ordinal or even metric variable to a dichotomous level loses a lot of information, which makes logistic regression inferior to ordinal regression in these cases.

Secondly, since logistic regression assumes that P(Y=1) is the probability of the event occurring, the dependent variable must be coded accordingly. That is, factor level 1 of the dependent variable should represent the desired outcome.
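As a small illustration of this coding rule, the sketch below recodes a raw outcome so that the event of interest becomes 1 and everything else becomes 0. The function name, the labels, and the toy data are hypothetical.

```python
def code_outcome(values, event_label):
    """Return a 0/1 list in which 1 marks the event of interest."""
    return [1 if value == event_label else 0 for value in values]

# Hypothetical raw outcome; "pass" is assumed to be the desired outcome.
raw_outcome = ["pass", "fail", "pass", "pass"]
y = code_outcome(raw_outcome, event_label="pass")
print(y)
```

With this coding, the fitted P(Y=1) is the probability of "pass"; swapping the labels would flip the sign of every coefficient.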

Thirdly, the logistic regression model should be fitted correctly. Neither overfitting nor underfitting should occur. That is, only the meaningful variables should be included, but all of the meaningful variables should be included. One common approach is to use a stepwise method to estimate the logistic regression, although automated selection can itself overfit, so the selected variables should also be checked against theory.

Fourthly, the error terms need to be independent. Logistic regression requires each observation to be independent. That is, the data points should not come from a dependent samples design, e.g., before-after measurements or matched pairings. The model should also have little or no multicollinearity. That is, the independent variables should not be highly correlated with each other. However, it is still possible to include interaction effects of categorical variables in the analysis and the model. If multicollinearity is present, centering the variables, i.e., subtracting the mean of each variable, might resolve it; this mainly helps when the collinearity stems from interaction or polynomial terms. If centering does not lower the multicollinearity, a factor analysis with orthogonally rotated factors should be done before the logistic regression is estimated.
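The effect of centering can be sketched with toy numbers. The data below are hypothetical and deliberately symmetric, so the improvement is larger than one would normally see. The variance inflation factor shown is the special case for two predictors, VIF = 1 / (1 - r^2), where r is their Pearson correlation.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equally long numeric sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)

def center(values):
    """Subtract the mean from every value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def vif_two_predictors(r):
    """Variance inflation factor when the model has exactly two predictors."""
    return 1.0 / (1.0 - r ** 2)

# Hypothetical predictors, assumed only for illustration.
x = [1, 2, 3, 4, 5, 6, 7, 8]
z = [2, 1, 4, 3, 6, 5, 8, 7]

# Correlation between x and the interaction term x*z, before centering.
r_raw = pearson(x, [xi * zi for xi, zi in zip(x, z)])

# The same correlation after centering both variables first.
xc, zc = center(x), center(z)
r_centered = pearson(xc, [a * b for a, b in zip(xc, zc)])

print("raw:", r_raw, "VIF:", vif_two_predictors(r_raw))
print("centered:", r_centered, "VIF:", vif_two_predictors(r_centered))
```

Centering changes the correlation between a variable and its product term. It does not change the correlation between x and z themselves, so it will not help when two distinct predictors measure nearly the same thing.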

Fifthly, logistic regression assumes linearity between the independent variables and the log odds. Whilst logistic regression does not require the dependent and independent variables to be related linearly, it requires that the independent variables are linearly related to the log odds. Otherwise the test underestimates the strength of the relationship and dismisses it too easily, that is, it reports a non-significant result (fails to reject the null hypothesis) where the relationship should be significant. One solution to this problem is to categorize the independent variables, that is, to transform metric variables to ordinal level and then include them in the logistic regression model. Another approach would be to use discriminant analysis, if the assumptions of homoscedasticity, multivariate normality, and no multicollinearity are met.
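One informal way to inspect this assumption is to bin a metric predictor and compare the empirical log odds across bins: roughly equal steps from bin to bin suggest linearity in the log odds. The sketch below assumes hypothetical toy data and bin edges, and adds 0.5 to each count so that the logarithm stays finite when a bin contains no events or no non-events.

```python
import math

def empirical_log_odds(x, y, edges):
    """Empirical log odds of y == 1 within each bin of x.

    Bin i covers edges[i] <= x < edges[i + 1]. A 0.5 correction is
    added to both counts so that empty cells do not break the log.
    """
    results = []
    for low, high in zip(edges[:-1], edges[1:]):
        in_bin = [yi for xi, yi in zip(x, y) if low <= xi < high]
        events = sum(in_bin)
        non_events = len(in_bin) - events
        results.append(math.log((events + 0.5) / (non_events + 0.5)))
    return results

# Hypothetical data, assumed only for illustration.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
y = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1]

log_odds_by_bin = empirical_log_odds(x, y, edges=[0, 4, 8, 12])
steps = [b - a for a, b in zip(log_odds_by_bin[:-1], log_odds_by_bin[1:])]
print(log_odds_by_bin)
print(steps)
```

If the steps differ strongly or change sign, the predictor is probably not linear in the log odds, and categorizing it as described above is one option.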

Lastly, logistic regression requires quite large sample sizes, because maximum likelihood (ML) estimates are less powerful than ordinary least squares (OLS; e.g., simple linear regression, multiple linear regression). Whilst OLS needs 5 cases per independent variable in the analysis, ML needs at least 10 cases per independent variable, and some statisticians recommend at least 30 cases for each parameter to be estimated.
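These rules of thumb reduce to simple arithmetic, sketched below. The function name and the example of six predictors are hypothetical, and for simplicity the 30-case rule is applied to the number of predictors. The rules themselves are only rough lower bounds, not a substitute for a proper sample size justification.

```python
def minimum_cases(n_predictors, cases_per_predictor=10):
    """Rule-of-thumb minimum sample size for a given number of predictors."""
    if n_predictors < 1:
        raise ValueError("n_predictors must be at least 1")
    return n_predictors * cases_per_predictor

# Hypothetical model with six independent variables.
n_predictors = 6
print("OLS rule (5 per variable):", minimum_cases(n_predictors, 5))
print("ML rule (10 per variable):", minimum_cases(n_predictors, 10))
print("Conservative rule (30 each):", minimum_cases(n_predictors, 30))
```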
