What is the value in examining a scatterplot for a regression analysis?
Residual scatterplots provide a visual check of the assumptions of normality, linearity, and homoscedasticity by plotting the predicted dependent variable (DV) scores against the errors of prediction. The primary benefit is that all three assumptions can be inspected in a single plot, so a violation can be spotted quickly. When an analysis meets the assumptions, the chances of making Type I and Type II errors are reduced, which improves the accuracy of the research findings.
A residual scatterplot is a figure with one axis for predicted scores and one axis for errors of prediction. An initial visual examination can isolate any outliers, otherwise known as extreme scores, in the data set. Tabachnick and Fidell (2007) explain that the residuals (the differences between the obtained DV scores and the predicted DV scores) should be normally distributed around the predicted DV scores (normality), should have a straight-line relationship with the predicted DV scores (linearity), and should have the same variance for all predicted scores (homoscedasticity). If these conditions hold, the assumptions are met and the scatterplot takes the shape of a rectangle: scores are concentrated in the center and distributed in a rectangular pattern. More simply, scores are randomly scattered about a horizontal line. In contrast, any systematic pattern or clustering of scores is considered a violation.
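As a rough illustration of the quantities described above, the sketch below fits a one-predictor regression to simulated data and computes the predicted scores and residuals that a residual scatterplot would display. The data, the function names (fit_line, predicted_and_residuals, plot_residuals), and the choice of a single predictor are assumptions made for this example; they are not taken from Tabachnick and Fidell (2007). The plotting function is optional and assumes matplotlib is installed.

```python
import random


def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predicted_and_residuals(x, y):
    """Return predicted DV scores and residuals (obtained minus predicted)."""
    intercept, slope = fit_line(x, y)
    predicted = [intercept + slope * xi for xi in x]
    residuals = [yi - pi for yi, pi in zip(y, predicted)]
    return predicted, residuals


def plot_residuals(predicted, residuals):
    """Optional: draw the residual scatterplot (requires matplotlib)."""
    import matplotlib.pyplot as plt

    plt.scatter(predicted, residuals, s=10)
    plt.axhline(0, color="gray", linewidth=1)  # the horizontal reference line
    plt.xlabel("Predicted DV score")
    plt.ylabel("Residual (obtained - predicted)")
    plt.show()


# Simulated, illustrative data: a linear relationship with constant-variance noise.
random.seed(1)
x = [random.uniform(0, 10) for _ in range(200)]
y = [2.0 + 1.5 * xi + random.gauss(0, 1) for xi in x]

predicted, residuals = predicted_and_residuals(x, y)
```

Calling plot_residuals(predicted, residuals) on data like these should produce the roughly rectangular cloud around the zero line described above. With a least-squares fit that includes an intercept, the residuals always sum to zero, so the plot is informative about the shape and spread of the residuals rather than their average.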
With regard to normality, when the assumption is violated the scatterplot tends to be skewed to one end or the other; when it is met, the scores pile up in the center and trail off symmetrically. When linearity is violated, the scatterplot tends to be curved (C-shaped or U-shaped); when it is met, the scatterplot is rectangular. When homoscedasticity is violated, the scatterplot fans out toward one end, and the spread may be several times larger at one end of the scatterplot than at the other. Violations of normality, linearity, and homoscedasticity weaken the analysis.
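The patterns described above can also be reproduced with simulated data. The sketch below is only an illustration under stated assumptions: it generates four hypothetical data sets (one well behaved, one with skewed errors, one curved, one fanning out) and summarizes each set of residuals with three informal numbers. The summaries (skewness, curvature, spread_ratio) are ad hoc stand-ins for what the eye picks up in the plot; they are not formal tests and are not drawn from Tabachnick and Fidell (2007).

```python
import random


def residuals_vs_predicted(x, y):
    """Fit a one-predictor least-squares line; return (predicted, residuals)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    predicted = [intercept + slope * a for a in x]
    return predicted, [b - p for b, p in zip(y, predicted)]


def sd(values):
    m = sum(values) / len(values)
    return (sum((v - m) ** 2 for v in values) / (len(values) - 1)) ** 0.5


def corr(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = (sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v)) ** 0.5
    return num / den


def skewness(res):
    """Near 0 for symmetric residuals; far from 0 when they are skewed."""
    n = len(res)
    m = sum(res) / n
    m2 = sum((r - m) ** 2 for r in res) / n
    m3 = sum((r - m) ** 3 for r in res) / n
    return m3 / m2 ** 1.5


def curvature(pred, res):
    """Correlation of residuals with squared centered predictions (U or C shape)."""
    mp = sum(pred) / len(pred)
    return corr([(p - mp) ** 2 for p in pred], res)


def spread_ratio(pred, res):
    """Residual SD in the top third of predicted scores over the bottom third."""
    ordered = [r for _, r in sorted(zip(pred, res))]
    third = len(ordered) // 3
    return sd(ordered[-third:]) / sd(ordered[:third])


# Simulated, illustrative data sets (all values are assumptions for this example).
random.seed(7)
x = [random.uniform(1, 10) for _ in range(600)]
datasets = {
    "assumptions met": [1 + 2 * a + random.gauss(0, 1) for a in x],
    "skewed errors": [1 + 2 * a + random.expovariate(1.0) - 1.0 for a in x],
    "curved": [0.5 * a * a + random.gauss(0, 1) for a in x],
    "fanning out": [1 + 2 * a + random.gauss(0, 0.5 * a) for a in x],
}

summary = {}
for name, y in datasets.items():
    pred, res = residuals_vs_predicted(x, y)
    summary[name] = (skewness(res), curvature(pred, res), spread_ratio(pred, res))
    print(f"{name:16s} skew={summary[name][0]:6.2f} "
          f"curve={summary[name][1]:6.2f} spread ratio={summary[name][2]:5.2f}")
```

In this setup, the well-behaved data should give a skewness and curvature near zero and a spread ratio near one, while each of the other data sets should stand out on the summary that matches its violation. The plot itself remains the primary diagnostic; these numbers only make the visual patterns concrete.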
The figure below shows a random scatter of scores that takes on a rectangular shape with no clustering or systematic pattern. The figure indicates that the assumptions of normality, linearity, and homoscedasticity are met.

In summary, residual scatterplots are a useful tool for checking the assumptions of a regression. They can be viewed during an initial screening run or after the analysis. Examining residual scatterplots early can save a researcher time: if the assumptions are not met, further screening must be conducted before the analysis can be completed, and the data may require cleaning and transformation. Checking first means the researcher is not running analyses haphazardly. Provided the assumptions are met, the regression is ready to be run, and the researcher has increased confidence that the chances of making a Type I or Type II error are reduced, ultimately improving the accuracy of the research results.
Reference
Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics (5th ed.). Boston, MA: Allyn and Bacon.