# Logistic Regression


Logistic regression extends simple linear regression to categorical outcomes. Simple linear regression is a statistical technique used to learn about the relationship between a dependent variable and an independent variable, where the dependent variable is continuous. For example, we could apply it to sales and marketing expenditure, predicting sales from the amount spent on marketing. When the dependent variable is dichotomous (binary), we cannot use simple linear regression. Logistic regression is the statistical technique used to model the relationship between the predictors and a dependent variable of this kind.

The form of logistic regression depends on the dependent variable. If it has two categories, we use binary logistic regression. If it has more than two unordered categories, we use multinomial logistic regression. If it is ordinal, we use ordinal logistic regression.

In logistic regression, one category of the dependent variable is treated as the reference category. The probability of the ‘event’ occurring, relative to that reference category, is then modeled by fitting a logistic curve.
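To make the logistic curve concrete, here is a minimal sketch in Python of how a linear combination of predictors is mapped to a probability. The single-predictor setup, the function names, and the coefficient values are assumptions chosen for illustration; they do not come from a fitted model.

```python
import math


def logistic(z: float) -> float:
    """Map a linear predictor z (the log-odds) to a probability between 0 and 1."""
    # Branch on the sign of z so exp() is never called with a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_probability(x: float, beta0: float, beta1: float) -> float:
    """Probability of the event for predictor value x in a one-predictor model."""
    return logistic(beta0 + beta1 * x)


# Hypothetical intercept and slope, for illustration only.
BETA0, BETA1 = -3.0, 0.5

for x in (0, 6, 12):
    print(x, round(predict_probability(x, BETA0, BETA1), 3))
```

With these assumed coefficients the linear predictor is zero at x = 6, so the predicted probability there is exactly 0.5, and the curve flattens toward 0 and 1 on either side.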

Like other regression techniques, logistic regression involves the use of two hypotheses:

1. Null hypothesis: the beta coefficient is equal to zero.

2. Alternative hypothesis: the beta coefficient is not equal to zero.
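One common way to test these hypotheses for a single coefficient is the Wald test, which divides the estimated coefficient by its standard error. The sketch below assumes the estimate and standard error are already available from a fitted model; the numbers used are hypothetical.

```python
import math


def wald_test(beta_hat: float, std_error: float) -> tuple:
    """Return the Wald z statistic and its two-sided p-value under H0: beta = 0."""
    z = beta_hat / std_error
    # Two-sided standard normal tail probability: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical coefficient estimate and standard error, for illustration only.
z, p = wald_test(beta_hat=0.8, std_error=0.3)
reject_null = p < 0.05  # assumed 5% significance level
print(round(z, 3), round(p, 4), reject_null)
```

Some statistical packages report the square of this statistic instead and compare it against a chi-square distribution with one degree of freedom. The two forms give the same p-value.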

Logistic regression does not require a linear relationship between the dependent variable and the independent variable(s), although it does assume that the predictors are linearly related to the log-odds of the outcome. It also does not require the error term to be normally distributed, and it does not require homogeneity of variance across groups. Logistic regression assumes that the independent variables are interval scaled or binary in nature. It does, however, assume the absence of outliers.

There are some key differences in the methodologies and processes involved in simple vs. logistic regression. In simple regression, ANOVA is used to evaluate the overall fit of the model, and R-square is used to evaluate the proportion of variance explained by the independent variable. In logistic regression, pseudo-R² measures such as Cox and Snell’s R², Nagelkerke’s R², and McFadden’s R² are used as alternatives to R-square. In simple regression, we use the t-test to assess the significance of individual variables. In logistic regression, we use the Wald statistic for this purpose. Instead of the simple beta, the exponentiated beta is reported as the coefficient in logistic regression. The exponentiated beta is an odds ratio: the factor by which the odds of the event change for a one-unit increase in the independent variable. The odds themselves are the probability of an event occurring divided by the probability of it not occurring.
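The relationship between a coefficient, its odds ratio, and the resulting probability can be sketched in a few lines of Python. The example also shows McFadden's pseudo-R², computed as one minus the ratio of the fitted model's log-likelihood to that of an intercept-only model. All numeric values here are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import math


def odds_ratio(beta: float) -> float:
    """Exponentiated coefficient: multiplicative change in odds per one-unit increase."""
    return math.exp(beta)


def odds_to_probability(odds: float) -> float:
    """Convert odds p / (1 - p) back to the probability p."""
    return odds / (1.0 + odds)


def mcfadden_r2(loglik_model: float, loglik_null: float) -> float:
    """McFadden's pseudo-R^2: 1 - LL(fitted model) / LL(intercept-only model)."""
    return 1.0 - loglik_model / loglik_null


# Hypothetical values, for illustration only.
beta = math.log(2.0)      # a coefficient whose odds ratio is 2
baseline_odds = 0.25      # odds of 0.25 correspond to a probability of 0.20
new_odds = baseline_odds * odds_ratio(beta)

print(odds_to_probability(baseline_odds))
print(odds_to_probability(new_odds))
print(mcfadden_r2(loglik_model=-80.0, loglik_null=-100.0))
```

In this example an odds ratio of 2 doubles the odds from 0.25 to 0.5, which moves the probability from 0.20 to about 0.33 rather than doubling it. This is why an odds ratio should not be read as a probability.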

A rule of thumb (Peduzzi et al., 1996) recommends a minimum of 10 cases per independent variable to estimate the logistic regression function with reliable and meaningful results. For instance, a model with 10 independent variables requires a minimum sample size of 100, so that at least 10 cases per variable remain once missing values and outliers have been taken into account.
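The arithmetic behind this rule of thumb is simple enough to check in code. The helper names below are hypothetical, and the check assumes the case count passed in is the number of usable cases after missing values and outliers have been removed.

```python
def minimum_sample_size(n_predictors: int, cases_per_variable: int = 10) -> int:
    """Smallest usable sample size under the cases-per-variable rule of thumb."""
    return n_predictors * cases_per_variable


def meets_rule_of_thumb(n_usable_cases: int, n_predictors: int,
                        cases_per_variable: int = 10) -> bool:
    """True if the usable cases reach the minimum for this many predictors."""
    return n_usable_cases >= minimum_sample_size(n_predictors, cases_per_variable)


# The example from the text: 10 independent variables.
print(minimum_sample_size(10))
print(meets_rule_of_thumb(n_usable_cases=95, n_predictors=10))
```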