What is Logistic Regression?

Logistic regression is the appropriate regression analysis to conduct when the dependent variable is dichotomous (binary).  Like all regression analyses, the logistic regression is a predictive analysis.  Logistic regression explains the relationship between a binary dependent variable and one or more independent variables.

Logistic regressions can be hard to interpret, but Intellectus Statistics simplifies the analysis and provides clear explanations.

Types of Questions Binary Logistic Regression Can Answer

How does lung cancer probability change with each pound of weight and pack of cigarettes smoked?

Do body weight, calorie intake, fat intake, and age affect the probability of having a heart attack?

Binary Logistic Regression Major Assumptions

  1. The dependent variable should be dichotomous in nature (e.g., present vs. absent).
  2. You should assess the data for outliers by converting the continuous predictors to standardized scores and removing values below -3.29 or greater than 3.29.
  3. There should be no high correlations (multicollinearity) among the predictors.  You can assess this by creating a correlation matrix among the predictors. Tabachnick and Fidell (2013) suggest that the assumption holds as long as the correlation coefficients among independent variables are less than 0.90.
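The outlier and multicollinearity checks described in assumptions 2 and 3 can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function names (`z_scores`, `flag_outliers`, `high_correlations`) and the sample data are hypothetical, and the cutoffs of 3.29 and 0.90 are simply the values quoted above.

```python
import math
from statistics import mean, stdev


def z_scores(values):
    """Standardize values using the sample mean and sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def flag_outliers(values, cutoff=3.29):
    """Return the indices of observations with |z| greater than the cutoff."""
    return [i for i, z in enumerate(z_scores(values)) if abs(z) > cutoff]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def high_correlations(predictors, threshold=0.90):
    """Return (name_a, name_b, r) for predictor pairs with |r| >= threshold."""
    names = list(predictors)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(predictors[a], predictors[b])
            if abs(r) >= threshold:
                flagged.append((a, b, r))
    return flagged


# Hypothetical data: 20 typical body weights plus one extreme value.
weights = [148, 149, 150, 151, 152] * 4 + [600]
print(flag_outliers(weights))  # index of the extreme observation
```

Note that with small samples a z-score can never reach 3.29 (the largest possible |z| with n observations is (n - 1) / sqrt(n)), so this screen is only informative for reasonably large data sets.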

At the center of the logistic regression analysis is the task of estimating the log odds of an event.  Mathematically, logistic regression estimates a multiple linear regression function defined as:

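$$\operatorname{logit}(p_i) = \ln\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik}$$

for i = 1…n, where p_i is the probability that the dependent variable equals 1 for observation i and x_{i1}, …, x_{ik} are that observation's values on the k independent variables.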
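To make the link between log odds and probability concrete, the sketch below converts a linear predictor into a probability and back. The helper names and the coefficient values are hypothetical and chosen only for illustration; they are not estimates from any real data set.

```python
import math


def logit(p):
    """Log odds of a probability p, for 0 < p < 1."""
    return math.log(p / (1 - p))


def inverse_logit(z):
    """Map a log-odds value back to a probability (numerically stable form)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predicted_probability(intercept, coefs, x):
    """Probability implied by a linear predictor on the log-odds scale."""
    z = intercept + sum(b * v for b, v in zip(coefs, x))
    return inverse_logit(z)


def odds_ratio(coef):
    """Multiplicative change in the odds for a one-unit increase in a predictor."""
    return math.exp(coef)


# Hypothetical coefficients for body weight (pounds) and packs smoked per day.
intercept, coefs = -4.0, [0.01, 0.5]
print(predicted_probability(intercept, coefs, [180, 1]))
print(odds_ratio(coefs[1]))  # odds multiplier per additional pack
```

Because the model is linear in the log odds, each coefficient is most easily read through its exponent: a one-unit increase in a predictor multiplies the odds of the event by exp(coefficient), holding the other predictors constant.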

Overfitting. When selecting the model for the analysis, you should also consider the model fit. Adding independent variables to a logistic regression model will always increase the amount of variance explained in the log odds (typically expressed as R²).  However, adding more variables to the model can overfit it and reduce its generalizability beyond the data on which you fit the model.
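One way to see the overfitting problem is to compare the fit on the data used for estimation with the fit on fresh data. The following is a rough sketch on simulated data, not a recommended workflow: it fits the model with plain gradient descent rather than the maximum likelihood routines found in statistical software, and every name, the random seed, and the simulation settings (one real predictor, fifteen pure-noise predictors, 60 training cases) are assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, steps=1000):
    """Fit intercept and coefficients by gradient descent on the mean log loss."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)  # w[0] is the intercept
    for _ in range(steps):
        grad = [0.0] * (k + 1)
        for row, target in zip(X, y):
            err = sigmoid(w[0] + sum(b * v for b, v in zip(w[1:], row))) - target
            grad[0] += err
            for j, v in enumerate(row):
                grad[j + 1] += err * v
        w = [b - lr * g / n for b, g in zip(w, grad)]
    return w


def mean_log_loss(w, X, y):
    """Average negative log-likelihood; lower means better fit."""
    eps = 1e-12
    total = 0.0
    for row, target in zip(X, y):
        p = sigmoid(w[0] + sum(b * v for b, v in zip(w[1:], row)))
        p = min(max(p, eps), 1 - eps)
        total -= target * math.log(p) + (1 - target) * math.log(1 - p)
    return total / len(X)


def simulate(n, n_noise):
    """One real predictor drives the outcome; the rest are pure noise."""
    X, y = [], []
    for _ in range(n):
        signal = random.gauss(0, 1)
        noise = [random.gauss(0, 1) for _ in range(n_noise)]
        y.append(1 if random.random() < sigmoid(1.5 * signal) else 0)
        X.append([signal] + noise)
    return X, y


random.seed(0)
X_train, y_train = simulate(60, 15)
X_test, y_test = simulate(500, 15)

results = {}
for k in (1, 16):  # real predictor only vs. real predictor plus 15 noise predictors
    w = fit_logistic([row[:k] for row in X_train], y_train)
    results[k] = (
        mean_log_loss(w, [row[:k] for row in X_train], y_train),
        mean_log_loss(w, [row[:k] for row in X_test], y_test),
    )
    print(k, results[k])
```

In a simulation like this, the larger model would typically show the lower loss on the training data but the higher loss on the holdout data, which is the pattern the paragraph above warns about. The exact numbers depend on the seed and settings.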

Reporting the R².  Numerous pseudo-R² values have been developed for binary logistic regression.  Interpret these with extreme caution, as they have many computational issues that can make them artificially high or low.  A better approach is to present any of the goodness-of-fit tests available; Hosmer-Lemeshow is a commonly used measure of goodness of fit based on the chi-square test.
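As an illustration of how the Hosmer-Lemeshow statistic is built, the sketch below sorts cases by predicted probability, splits them into equal-sized groups, and compares observed with expected event counts in each group. The function name, the grouping rule, and the sample data are assumptions for this example; statistical packages may form the groups slightly differently (for instance in how they handle ties).

```python
def hosmer_lemeshow(y_true, p_hat, groups=10):
    """Return (statistic, degrees_of_freedom) for the Hosmer-Lemeshow test.

    y_true: observed outcomes coded 0/1
    p_hat:  predicted probabilities from the fitted model
    Assumes every group's mean predicted probability lies strictly
    between 0 and 1; otherwise the denominator below would be zero.
    """
    pairs = sorted(zip(p_hat, y_true))  # order cases by predicted probability
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        n_g = len(chunk)
        observed = sum(y for _, y in chunk)  # events actually seen
        expected = sum(p for p, _ in chunk)  # events the model predicts
        mean_p = expected / n_g
        stat += (observed - expected) ** 2 / (n_g * mean_p * (1 - mean_p))
    return stat, groups - 2


# Hypothetical, deliberately miscalibrated predictions: four groups of ten
# cases, each with five events regardless of the predicted probability.
p_hat = [0.2] * 10 + [0.4] * 10 + [0.6] * 10 + [0.8] * 10
y_true = ([1] * 5 + [0] * 5) * 4
stat, df = hosmer_lemeshow(y_true, p_hat, groups=4)
print(stat, df)
```

The statistic is compared against a chi-square distribution with (number of groups - 2) degrees of freedom; a large value (small p-value) suggests poor fit. If SciPy happens to be installed, `scipy.stats.chi2.sf(stat, df)` is one way to obtain the p-value.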
