# Normality

The normality assumption is one of the most misunderstood in all of statistics.  In multiple regression, the assumption of a normal distribution applies only to the disturbance term, not to the independent variables, as is often believed.  Perhaps the confusion about this assumption derives from difficulty understanding what the disturbance term refers to: simply put, it is the random error in the relationship between the independent variables and the dependent variable in a regression model.  Each case in the sample has its own disturbance, a random variable that encompasses all the “noise” accounting for the difference between the observed value and the value predicted by the regression equation, and it is the distribution of this disturbance term across all cases in the sample that should be normal.
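
To make the role of the disturbance term concrete, the model can be written out for a single case.  This is a sketch in notation the text does not itself use: the symbols $y_i$, $x_{ik}$, $\beta_k$, $\varepsilon_i$, and $\sigma^2$ are assumptions introduced here for illustration, with $i$ indexing cases and $K$ the number of independent variables.

$$
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_K x_{iK} + \varepsilon_i,
\qquad \varepsilon_i \sim N(0, \sigma^2)
$$

In this notation the normality assumption is a statement about $\varepsilon_i$ alone; nothing is assumed about the distribution of any $x_{ik}$.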

There are few consequences associated with a violation of the normality assumption, as it does not contribute to bias or inefficiency in regression models.  It is only important for the calculation of p values for significance testing, and even then only when the sample size is very small.  When the sample size is sufficiently large (>200), the normality assumption is not needed at all, as the Central Limit Theorem ensures that the sampling distribution of the coefficient estimates will approximate normality even when the disturbance term itself is not normal.
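
The large-sample claim can be checked with a small simulation.  The following is a minimal sketch, not a definitive procedure: it assumes a single independent variable, a hypothetical true intercept of 2.0 and slope of 0.5, and strongly skewed disturbances drawn from a shifted exponential distribution.  All function names are invented for this example, and only the Python standard library is used.

```python
import random
import statistics


def skewness(values):
    """Mean cubed z-score; about 0 for a symmetric distribution."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - mean) / sd) ** 3 for v in values)


def ols_slope(x, y):
    """Least squares slope for a regression of y on a single x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def simulate_slopes(n, reps, rng, true_intercept=2.0, true_slope=0.5):
    """Re-estimate the slope on `reps` fresh samples of size `n`."""
    slopes = []
    for _ in range(reps):
        x = [rng.uniform(0, 10) for _ in range(n)]
        # Assumed disturbance: exponential shifted to mean zero (right-skewed).
        e = [rng.expovariate(1.0) - 1.0 for _ in range(n)]
        y = [true_intercept + true_slope * xi + ei for xi, ei in zip(x, e)]
        slopes.append(ols_slope(x, y))
    return slopes


rng = random.Random(42)  # fixed seed so the run is reproducible
disturbances = [rng.expovariate(1.0) - 1.0 for _ in range(20000)]
slopes = simulate_slopes(n=200, reps=2000, rng=rng)

print(f"skewness of disturbances:    {skewness(disturbances):.2f}")
print(f"skewness of slope estimates: {skewness(slopes):.2f}")
print(f"mean slope estimate:         {statistics.fmean(slopes):.3f}")
```

Under these assumptions the disturbances are heavily skewed, while the slope estimates across repeated samples should be close to symmetric and centered near the true slope, which is the behavior the significance tests rely on.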

When dealing with very small samples, it is important to check for a possible violation of the normality assumption.  This can be accomplished through an inspection of the residuals from the regression model (some programs will perform this automatically, while others require that you save the residuals as a new variable and examine them using summary statistics and histograms).  There are several statistics available to examine the normality of variables, including skewness and kurtosis, as well as numerous graphical depictions, such as the normal probability plot.  Unfortunately, these statistics are unstable in small samples, so their results should be interpreted with caution.  When the distribution of the disturbance term is found to deviate from normality, the best solution is to use a more conservative significance level (.01 rather than .05) when conducting significance tests, and a correspondingly wider interval (99% rather than 95%) when constructing confidence intervals.
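
The residual checks described above can be sketched in a few lines.  This is an illustrative example under stated assumptions rather than the output of any particular statistics program: the data are simulated (25 hypothetical cases, one independent variable), the function names are invented here, and skewness and excess kurtosis are computed from their simple moment definitions, which may differ slightly from the bias-corrected versions some packages report.

```python
import random
import statistics


def fit_simple_ols(x, y):
    """Return (intercept, slope) for a least squares regression of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope


def skewness(values):
    """Mean cubed z-score; about 0 for a symmetric distribution."""
    mean, sd = statistics.fmean(values), statistics.pstdev(values)
    return statistics.fmean(((v - mean) / sd) ** 3 for v in values)


def excess_kurtosis(values):
    """Mean fourth-power z-score minus 3; about 0 for a normal distribution."""
    mean, sd = statistics.fmean(values), statistics.pstdev(values)
    return statistics.fmean(((v - mean) / sd) ** 4 for v in values) - 3.0


def normal_probability_points(values):
    """Pair each sorted, standardized value with the normal quantile for its rank."""
    n = len(values)
    mean, sd = statistics.fmean(values), statistics.pstdev(values)
    ordered = sorted((v - mean) / sd for v in values)
    std_normal = statistics.NormalDist()
    return [(std_normal.inv_cdf((i + 0.5) / n), z) for i, z in enumerate(ordered)]


# Hypothetical small sample: 25 cases with normally distributed noise.
rng = random.Random(7)
x = [rng.uniform(0, 10) for _ in range(25)]
y = [2.0 + 0.5 * xi + rng.gauss(0, 1) for xi in x]

intercept, slope = fit_simple_ols(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
points = normal_probability_points(residuals)

print(f"skewness of residuals:        {skewness(residuals):.2f}")
print(f"excess kurtosis of residuals: {excess_kurtosis(residuals):.2f}")
for expected, observed in points[:3]:
    print(f"expected quantile {expected:6.2f}   standardized residual {observed:6.2f}")
```

Plotting the pairs in `points` gives a normal probability plot: residuals that are roughly normal fall close to a straight line with slope 1.  Rerunning the sketch with different seeds shows how much skewness and kurtosis vary at this sample size, which is the instability noted above.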