What is Logistic Regression?

Logistic regression is the appropriate regression analysis to conduct when the dependent variable is dichotomous (binary). Like all regression analyses, logistic regression is a predictive analysis. It is used to describe data and to explain the relationship between one binary dependent variable and one or more independent variables, which may be metric (interval or ratio scale) or categorical once appropriately dummy coded.

Logistic regression can address research questions such as:

How does the probability of getting lung cancer change for every additional pound a person is overweight and for every X cigarettes smoked per day?

Do body weight, calorie intake, fat intake, and age have an influence on heart attacks (yes vs. no)?

The major assumptions are:

  1. The outcome must be discrete; that is, the dependent variable should be dichotomous in nature (e.g., present vs. absent).
  2. There should be no outliers in the data. This can be assessed by converting the continuous predictors to standardized scores (z scores) and removing values below -3.29 or above 3.29.
  3. There should be no high intercorrelations (multicollinearity) among the predictors. This can be assessed with a correlation matrix of the predictors. Tabachnick and Fidell (2012) suggest that as long as the correlation coefficients among the independent variables are less than 0.90, the assumption is met.
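
The screening steps in assumptions 2 and 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`flag_outliers`, `high_correlations`) and all data values are hypothetical, and the sample standard deviation is assumed when standardizing.

```python
import math
import statistics
from itertools import combinations

def z_scores(values):
    """Standardize values with the sample mean and sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def flag_outliers(values, cutoff=3.29):
    """Return the positions of cases whose |z| exceeds the cutoff."""
    return [i for i, z in enumerate(z_scores(values)) if abs(z) > cutoff]

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def high_correlations(predictors, threshold=0.90):
    """Return (name_a, name_b, r) for predictor pairs with |r| >= threshold."""
    flagged = []
    for a, b in combinations(predictors, 2):
        r = pearson_r(predictors[a], predictors[b])
        if abs(r) >= threshold:
            flagged.append((a, b, r))
    return flagged

# Hypothetical body weights: 29 ordinary values plus one extreme case.
weights = [70 + (i % 5) for i in range(29)] + [200]
outlier_positions = flag_outliers(weights)

# Hypothetical predictors; calories and fat are built to be nearly collinear.
predictors = {
    "calories": [1800, 2000, 2200, 2400, 2600],
    "fat": [60, 66, 74, 80, 86],
    "age": [50, 30, 60, 25, 45],
}
collinear_pairs = high_correlations(predictors)

print(outlier_positions)
print([(a, b, round(r, 3)) for a, b, r in collinear_pairs])
```

Note that in very small samples a single case cannot reach |z| > 3.29 at all, because the largest possible z score is bounded by the sample size, so the cutoff is only informative with a reasonable number of cases.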

Standard linear regression requires the dependent variable to be measured on a metric (interval or ratio) scale. How can we apply the same principle to a dichotomous (0/1) variable? Logistic regression treats the dependent variable as a stochastic event. For instance, if we analyze a pesticide's kill rate, the outcome for each bug is either killed or alive. Since even the most resistant bug can only be in one of these two states, logistic regression works with the likelihood of the bug being killed. If the predicted likelihood of killing the bug is greater than 0.5, the bug is classified as dead; if it is less than 0.5, it is classified as alive.
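
The 0.5 cutoff rule can be illustrated with a minimal Python sketch. The predicted probabilities below are made up for illustration, and the treatment of a probability of exactly 0.5 (classified as alive here) is an assumption, since the text does not cover that case.

```python
def classify(probability, cutoff=0.5):
    """Return 1 (killed) if the predicted probability exceeds the cutoff, else 0 (alive).

    A probability exactly equal to the cutoff is classified as 0; this tie
    rule is an assumption made for the sketch.
    """
    return 1 if probability > cutoff else 0

# Hypothetical predicted kill probabilities for five bugs.
predicted = [0.08, 0.35, 0.62, 0.91, 0.49]
labels = [classify(p) for p in predicted]

for p, label in zip(predicted, labels):
    print(f"p = {p:.2f} -> {'killed' if label == 1 else 'alive'}")
```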

In SPSS, the outcome variable (which must be coded as 0 and 1) is placed in the first box, labeled Dependent, while all predictors are entered into the Covariates box (categorical variables should be appropriately dummy coded). SPSS predicts the value labeled 1 by default, so careful attention should be paid to the coding of the outcome (usually it makes more sense to examine the presence of a characteristic or "success").

Sometimes a probit model is used instead of a logit model.

[Graph: logit and probit functions for values from -4 to 4]

Both models are commonly used for binary outcomes; in most cases a model is fitted with both functions and the function with the better fit is chosen. However, probit uses the cumulative normal distribution to model the probability of the event, whereas logit uses the logistic distribution. Thus the difference between logit and probit is typically seen in small samples.
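
To make the comparison concrete, the sketch below tabulates the two functions over the range -4 to 4 using only the Python standard library. The function names are illustrative. The logistic curve has heavier tails than the normal curve, so it approaches 0 and 1 more slowly.

```python
import math

def logistic_cdf(z):
    """Inverse logit: maps a value on the log-odds scale to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    """Standard normal cumulative distribution, the function used by probit."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Tabulate both curves at integer values from -4 to 4.
for z in range(-4, 5):
    print(f"{z:+d}  logit = {logistic_cdf(z):.4f}  probit = {normal_cdf(z):.4f}")
```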

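At the center of the logistic regression analysis is the task of estimating the log odds of an event. Mathematically, logistic regression estimates a multiple linear regression function defined as:

$$\operatorname{logit}(p_i) = \ln\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik}$$

for i = 1…n, where p_i is the probability that the outcome equals 1 for case i.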
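
As a rough illustration, assuming the usual form in which the log odds equal an intercept plus a weighted sum of the predictors, the sketch below converts a linear predictor into a probability and the coefficients into odds ratios. The intercept, coefficients, and predictor values are invented for the example and are not estimates from any real data.

```python
import math

def log_odds(intercept, coefficients, x):
    """Linear predictor: intercept plus the sum of coefficient * predictor value."""
    return intercept + sum(b * v for b, v in zip(coefficients, x))

def probability(log_odds_value):
    """Convert log odds back to a probability with the inverse logit."""
    return 1.0 / (1.0 + math.exp(-log_odds_value))

def odds_ratios(coefficients):
    """exp(b) is the multiplicative change in the odds per one-unit increase."""
    return [math.exp(b) for b in coefficients]

# Hypothetical model: pounds overweight and cigarettes smoked per day.
intercept = -4.0
coefficients = [0.02, 0.10]
case = [20, 10]  # 20 pounds overweight, 10 cigarettes per day

lo = log_odds(intercept, coefficients, case)
p = probability(lo)

print(f"log odds = {lo:.2f}")
print(f"probability = {p:.4f}")
print([round(r, 3) for r in odds_ratios(coefficients)])
```

Because the model is linear on the log odds scale, each coefficient describes a constant change in log odds, while the corresponding change in probability depends on where the case sits on the curve.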

When selecting the model for the logistic regression analysis, another important consideration is the model fit. Adding independent variables to a logistic regression model will always increase the amount of variance explained in the log odds (typically expressed as R²). That does not make the model better: adding more and more variables makes it inefficient and can result in overfitting.

Nevertheless, many people want an equivalent way of describing how good a particular model is, and numerous pseudo-R² values have been developed. These should be interpreted with extreme caution, as they have many computational issues that cause them to be artificially high or low. A better approach is to present any of the available goodness-of-fit tests. Hosmer-Lemeshow is a commonly used measure of goodness of fit based on the chi-square test (which makes sense given that logistic regression is related to crosstabulation).
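
For illustration only, the sketch below computes one common pseudo-R² (McFadden's, chosen here as an example and not named in the text) and the Hosmer-Lemeshow chi-square statistic from observed outcomes and predicted probabilities. The data are invented, the function names are hypothetical, and the grouping into equal-sized bins by sorted probability is one simple convention among several.

```python
import math

def log_likelihood(y, p):
    """Bernoulli log-likelihood of 0/1 outcomes y given predicted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))

def mcfadden_r2(y, p):
    """McFadden's pseudo-R²: 1 - LL(model) / LL(intercept-only model)."""
    base_rate = sum(y) / len(y)
    ll_model = log_likelihood(y, p)
    ll_null = log_likelihood(y, [base_rate] * len(y))
    return 1.0 - ll_model / ll_null

def hosmer_lemeshow(y, p, groups=10):
    """Hosmer-Lemeshow statistic over bins of cases sorted by predicted probability.

    The result is usually compared with a chi-square distribution with
    groups - 2 degrees of freedom; the p-value is not computed here.
    """
    pairs = sorted(zip(p, y))
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        observed = sum(yi for _, yi in chunk)   # observed events in the bin
        expected = sum(pi for pi, _ in chunk)   # expected events in the bin
        statistic += (observed - expected) ** 2 / (expected * (1.0 - expected / size))
    return statistic

# Hypothetical outcomes and predicted probabilities for ten cases.
y = [0, 0, 0, 1, 0, 1, 1, 1, 0, 1]
p = [0.1, 0.2, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.8, 0.9]

r2 = mcfadden_r2(y, p)
hl = hosmer_lemeshow(y, p, groups=5)

print(f"McFadden pseudo-R2 = {r2:.3f}")
print(f"Hosmer-Lemeshow statistic = {hl:.3f}")
```

With only ten cases the statistic is not meaningful in practice; the example is sized to keep the arithmetic easy to follow.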
