Checking for Influential Data Points in Regression Analyses

Regression, an analysis that uses the F test to assess relationships, is quite robust. However, it is important to know whether any of your data points might be overly influencing the regression. An overly influential point can pull the regression's line of best fit toward itself and away from the pattern in the rest of your data, reducing the validity of the model. In previous blog posts, we've discussed testing for outliers, but there are a few specific ways to check a data point's influence on a regression in SPSS that do not involve testing for univariate or multivariate outliers.

To check for influential points, you can use three methods: simple scatterplots, partial plots, and Cook's distances. Simple scatterplots display the values of each independent variable plotted against the dependent variable. Partial plots also show each independent variable against the dependent variable, but unlike simple scatterplots, they control for the other variables in your regression. Cook's distances are values that can be computed to show the degree of influence each participant has on the regression.

Simple scatterplots: To make these in SPSS, you can create a scatterplot for each pair of variables: Graphs -> Legacy Dialogs -> Scatter / Dot, and select “Simple Scatter.” This will open a window where you can select your independent and dependent variables, pair by pair.

Partial plots: To create a partial plot for each independent variable/dependent variable pair, set up your regression as you usually would, and then click the “Plots…” button in the regression window. Toward the bottom right, select “Produce all partial plots” and click “OK” to run the regression. Once you have your plots, it does not take much statistical experience to see whether a few points are wandering away from the herd, but do not close your regression window yet!
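
If you are curious what “controlling for other variables” means in a partial plot, the idea can be sketched outside SPSS. The Python snippet below is only an illustration under our own assumptions: the data are made up, the helper names (fit_line, residuals) are hypothetical, and it handles just two predictors. It removes the second predictor from both the first predictor and the dependent variable, then pairs up what is left over, which is the set of points a partial plot displays.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y regressed on a single x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


def residuals(x, y):
    """What is left of y after removing its linear relationship with x."""
    intercept, slope = fit_line(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical data: y depends on both predictors (y = 1 + 2*x1 + 3*x2).
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]

# Partial plot for x1: remove x2 from both x1 and y, then pair up the leftovers.
x1_given_x2 = residuals(x2, x1)
y_given_x2 = residuals(x2, y)
partial_points = list(zip(x1_given_x2, y_given_x2))

# The slope through these points matches x1's coefficient in the full regression.
_, partial_slope = fit_line(x1_given_x2, y_given_x2)
print(partial_slope)
```

With more than two predictors, each variable would be adjusted for all of the other predictors at once rather than just one, which is what SPSS does for you when you request partial plots.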

Cook’s distance: Cook’s distance can also be calculated in the regression window once you have set up your regression. Simply click the “Save…” button and select “Cook’s”; it is under the “Distances” heading. This saves a new Cook’s distance variable to your dataset. As a common rule of thumb, any participant with a Cook’s distance over 1 may be having an unduly large influence on the analysis. You can find these participants by going back to the Data View in SPSS and right-clicking the new variable name; sorting in descending order will show you the highest Cook’s distances in your sample.
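
To make the number SPSS saves a little less mysterious, here is a minimal Python sketch of the standard Cook’s distance formula for a regression with a single predictor. It is not SPSS’s own code; the function name cooks_distances and the data are made up for illustration. Each distance combines the size of a point’s residual with its leverage (how far its predictor value sits from the mean).

```python
def cooks_distances(x, y):
    """Cook's distance for each observation in the regression y = b0 + b1*x."""
    n = len(x)
    p = 2  # estimated coefficients: intercept and slope
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in resid) / (n - p)  # residual mean square

    distances = []
    for xi, e in zip(x, resid):
        # Leverage: how unusual this point's predictor value is.
        leverage = 1 / n + (xi - x_mean) ** 2 / sxx
        distances.append((e ** 2 / (p * mse)) * leverage / (1 - leverage) ** 2)
    return distances


# Hypothetical data: five points close to y = x, plus one far-off point at x = 10.
x = [1, 2, 3, 4, 5, 10]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 2.0]

d = cooks_distances(x, y)

# Apply the "over 1" rule of thumb and list the flagged positions (0-based).
flagged = [i for i, di in enumerate(d) if di > 1]
print(flagged)
```

For this made-up data, only the last point (the one at x = 10) crosses the threshold, which mirrors what you would see after sorting the saved Cook’s distance variable in SPSS.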