
Checking for Influential Data Points in Regression Analyses


Regression, as an analysis that uses the F test to assess relationships, is quite robust. However, it is important to know whether any of your data points might be overly influencing the regression. Overly influential points can shift a regression’s line of best fit and distort the model, reducing the validity of your results. In previous blog posts, we’ve discussed testing for outliers. Still, there are a few specific ways to check a data point’s influence on a regression analysis in SPSS that go beyond testing for univariate or multivariate outliers.

To check for influential points, you can use three methods: simple scatterplots, partial plots, and Cook’s distances. Simple scatterplots display each independent variable plotted against the dependent variable. Partial plots also show each independent variable against the dependent variable, but unlike a simple scatterplot, they control for the other variables in your regression. Cook’s distances quantify the degree of influence each participant has on the regression.

Simple scatterplots: To make these in SPSS, you can create a scatterplot for each pair of variables: Graphs -> Legacy Dialogs -> Scatter / Dot, and select “Simple Scatter.” This will open a window where you can select your independent and dependent variables, pair by pair.

Partial plots: To create a partial plot for each independent variable/dependent variable pair, put together your regression like you usually would, and then click on the “Plots…” button in the regression window. Towards the bottom right, select “Produce all partial plots” and click “OK” to run the regression. Once you have your plots, it does not take much statistical experience to see whether there are a few points that seem to be wandering away from the herd, but do not close your regression window yet!
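For readers who want to see what a partial plot is made of, here is a minimal Python sketch of the idea, not SPSS's own code. It assumes a regression with just two independent variables; the variable names and data are hypothetical and chosen only for illustration. Each axis of the partial plot is a set of residuals: the independent variable of interest and the dependent variable are each regressed on the other independent variable, and what is left over is plotted.

```python
def residuals_from_simple_fit(predictor, outcome):
    """Residuals of `outcome` after a least-squares fit on `predictor` plus an intercept."""
    n = len(predictor)
    p_mean = sum(predictor) / n
    o_mean = sum(outcome) / n
    sxx = sum((p - p_mean) ** 2 for p in predictor)
    sxy = sum((p - p_mean) * (o - o_mean) for p, o in zip(predictor, outcome))
    slope = sxy / sxx
    intercept = o_mean - slope * p_mean
    return [o - (intercept + slope * p) for p, o in zip(predictor, outcome)]


# Hypothetical data: two independent variables and one dependent variable.
study_hours = [2, 4, 5, 7, 8, 10, 11, 13]
sleep_hours = [6, 7, 5, 8, 6, 7, 9, 8]
exam_score = [55, 62, 60, 74, 70, 80, 88, 65]

# Partial plot for study_hours, controlling for sleep_hours:
#   x-axis = the part of study_hours not explained by sleep_hours
#   y-axis = the part of exam_score not explained by sleep_hours
partial_x = residuals_from_simple_fit(sleep_hours, study_hours)
partial_y = residuals_from_simple_fit(sleep_hours, exam_score)

# Slope of the least-squares line through the partial plot
# (both sets of residuals have mean zero, so no intercept is needed).
partial_slope = sum(px * py for px, py in zip(partial_x, partial_y)) / sum(
    px ** 2 for px in partial_x
)

# Vertical distance of each case from that line; large gaps mark candidates to inspect.
gaps = [abs(py - partial_slope * px) for px, py in zip(partial_x, partial_y)]
farthest_case = gaps.index(max(gaps)) + 1  # 1-based case number

print(f"partial slope for study_hours: {partial_slope:.3f}")
print(f"case farthest from the partial-plot line: {farthest_case}")
```

The slope of the line through these residuals matches the coefficient that the full two-variable regression would assign to the independent variable of interest, which is why a point that strays from the line in a partial plot is worth a closer look.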


Cook’s distance: Cook’s distance can also be calculated in the regression window once you have put together your regression. Simply click the “Save…” button and select “Cook’s” under the “Distances” heading. This saves a new Cook’s distance variable to your dataset. Any participant with a Cook’s distance value over 1 may be having an unduly large influence on the analysis. To find these cases, go back to Data View in SPSS and right-click the new variable’s name; sorting in descending order will show you the highest Cook’s distances in your sample.
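To make the number SPSS saves less of a black box, here is a minimal Python sketch of Cook’s distance for the simplest case, a regression with one independent variable. It is an illustration under stated assumptions rather than SPSS's implementation: it uses the standard textbook formula, D = (e² / (p × MSE)) × (h / (1 − h)²), where e is a case's residual, h is its leverage, p is the number of estimated parameters, and MSE is the mean squared error. The function name and the data are hypothetical. The cutoff of 1 follows the rule of thumb above.

```python
def cooks_distances(x, y):
    """Cook's distance for each case in a simple regression y = b0 + b1 * x."""
    n = len(x)
    p = 2  # estimated parameters: intercept and slope
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - p)
    # Leverage: how far each case's x value sits from the mean of x.
    leverages = [1 / n + (xi - x_mean) ** 2 / sxx for xi in x]
    return [
        (e ** 2 / (p * mse)) * (h / (1 - h) ** 2)
        for e, h in zip(residuals, leverages)
    ]


# Hypothetical data: the last case sits far from the others on x
# and does not follow their trend.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 20.0]

distances = cooks_distances(x, y)

# Sort cases from highest to lowest Cook's distance, like a descending sort in SPSS.
ranked = sorted(enumerate(distances, start=1), key=lambda pair: pair[1], reverse=True)
flagged = [case for case, d in ranked if d > 1]

for case, d in ranked[:3]:
    print(f"case {case}: Cook's D = {d:.3f}")
print("cases with Cook's D over 1:", flagged)
```

In this made-up sample only the last case exceeds the cutoff, because it combines a large residual with high leverage; a case needs both to dominate the fit.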