# Conduct and Interpret a Linear Regression

*What is Linear Regression?*

Linear regression is the most basic and commonly used predictive analysis. Regression estimates are used to describe data and to explain the relationship between one dependent variable and one or more independent variables. At the center of the regression analysis is the task of fitting a single line through a scatter plot. The simplest form, with one dependent and one independent variable, is defined by the formula *y = a + b·x*, where *a* is the intercept and *b* is the slope of the line.
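To make the line-fitting step concrete, here is a minimal sketch of an ordinary least-squares fit for the one-predictor case. The function name `fit_line` and the data are illustrative assumptions, not part of the SPSS example.

```python
def fit_line(x, y):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: co-variation of x and y divided by the variation of x
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means
    a = mean_y - b * mean_x
    return a, b


# Hypothetical data that lie exactly on y = 1 + 2x
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
a, b = fit_line(x, y)
print(a, b)  # prints 1.0 2.0
```

With real data the points scatter around the line, and the same formulas give the line that minimizes the sum of squared vertical distances.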

Sometimes the dependent variable is also called the endogenous variable, prognostic variable, or regressand. The independent variables are also called exogenous variables, predictor variables, or regressors. However, linear regression analysis consists of more than just fitting a straight line through a cloud of data points. It consists of three stages: 1) analyzing the correlation and directionality of the data, 2) estimating the model, i.e., fitting the line, and 3) evaluating the validity and usefulness of the model.

There are three major uses for regression analysis: 1) causal analysis, 2) forecasting an effect, and 3) trend forecasting. Unlike correlation analysis, which focuses on the strength of the relationship between two or more variables, regression analysis assumes a dependence or causal relationship between one or more independent variables and one dependent variable.

Firstly, it might be used to identify the strength of the effect that the independent variable(s) have on a dependent variable. Typical questions are: what is the strength of the relationship between dose and effect, between sales and marketing spending, or between age and income?

Secondly, it can be used to forecast effects or impacts of changes. That is, regression analysis helps us to understand how much the dependent variable will change when we change one or more independent variables. A typical question is, “*How much additional Y do I get for one additional unit of X?*”
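In the simple model *y = a + b·x*, the answer to that question is the slope. The short sketch below checks this numerically. The coefficients are hypothetical values chosen for illustration, not estimates from any data set.

```python
def predict(a, b, x):
    """Predicted value of the dependent variable for a given x."""
    return a + b * x


# Hypothetical coefficients, for illustration only
a, b = 10.0, 0.5

# Raising x by one unit changes the prediction by exactly the slope b
change = predict(a, b, 41.0) - predict(a, b, 40.0)
print(change)  # prints 0.5
```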

Thirdly, regression analysis predicts trends and future values. The regression analysis can be used to get point estimates. Typical questions are, “*What will the price for gold be 6 months from now?*” and “*What is the total effort for a task X?*”

*The Linear Regression in SPSS*

The research question for the Linear Regression Analysis is as follows:

*In our sample of 107 students, can we predict the standardized test score of reading when we know the standardized test score of writing?*

The first step is to check whether there is a linear relationship in the data. For that we check the scatter plot (*Graphs/Chart Builder…*). The scatter plot indicates a good linear relationship, which allows us to conduct a linear regression analysis. We can also check the Pearson's Bivariate Correlation (*Analyze/Correlate/Bivariate…*) and find that both variables are strongly correlated (r = .645 with p < 0.001).
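For readers who want to see what the correlation output represents, the following sketch computes Pearson's r by hand, along with the t statistic commonly used to test it. The score lists are made-up placeholders, not the 107-student sample. Only the values r = .645 and n = 107 in the last step come from the text above.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def t_for_r(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Made-up scores, for illustration only
writing = [42.0, 48.0, 51.0, 55.0, 60.0, 63.0]
reading = [40.0, 52.0, 47.0, 58.0, 57.0, 66.0]
r = pearson_r(writing, reading)
print(round(r, 3))

# t statistic implied by the reported r = .645 with n = 107
t = t_for_r(0.645, 107)
print(round(t, 2))
```

A t value this far from zero on 105 degrees of freedom is consistent with the reported p < 0.001.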

Secondly, we need to check for multivariate normality. We have a look at the Q-Q-Plots (*Analyze/Descriptive statistics/Q-Q-Plots…*) for both of our variables and see that they are not perfect, but they might be close enough to a normal distribution.

We can check our ‘eyeball’ test with the 1-Sample Kolmogorov-Smirnov test (*Analyze/Nonparametric Tests/Legacy Dialogs/1-Sample K-S…*). The test has the null hypothesis that the variable approximates a normal distribution. The results indicate that the reading score can be assumed to be normally distributed (p = 0.474), while the writing score cannot (p = 0.044). To fix this problem we could try to transform the writing test scores using a non-linear transformation (e.g., log). However, we have a fairly large sample, in which case linear regression is quite robust against violations of normality, although it may report overly optimistic t-values and F-values.
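The idea behind the test can be sketched in a few lines: compare the empirical distribution of the sample with a normal distribution and measure the largest gap. The sketch below assumes the normal distribution's mean and standard deviation are estimated from the sample and uses the asymptotic Kolmogorov series for the p-value. It is not guaranteed to reproduce SPSS output, and the scores are made up.

```python
import math
from statistics import NormalDist, mean, stdev


def ks_normal(sample, terms=100):
    """One-sample K-S statistic D and asymptotic p-value against a normal
    distribution whose mean and SD are estimated from the sample."""
    data = sorted(sample)
    n = len(data)
    dist = NormalDist(mean(data), stdev(data))
    d = 0.0
    for i, value in enumerate(data, start=1):
        cdf = dist.cdf(value)
        # Gap between empirical and theoretical CDF, checked just
        # after and just before each step of the empirical CDF
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    lam = math.sqrt(n) * d
    # Asymptotic Kolmogorov series: 2 * sum (-1)^(k-1) * exp(-2 k^2 lam^2)
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, terms + 1))
    return d, min(1.0, max(0.0, p))


# Made-up scores, for illustration only
scores = [38.0, 42.0, 45.0, 47.0, 49.0, 50.0, 51.0,
          52.0, 54.0, 55.0, 57.0, 60.0, 63.0, 68.0]
d, p = ks_normal(scores)
print(round(d, 3), round(p, 3))
```

A small p-value means the largest gap is bigger than chance alone would plausibly produce, so normality is rejected. Because the parameters are estimated from the same data, this asymptotic p-value tends to be conservative.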

We now can conduct the linear regression analysis. Linear regression is found in SPSS in *Analyze/Regression/Linear…*

To answer our simple research question we just need to add the Reading Test Score as the dependent variable and the Writing Test Score as the independent variable. The menu *Statistics…* allows us to include additional information that we need to assess the validity of our linear regression analysis. In order to assess autocorrelation (especially if we have time series data) we add the Durbin-Watson Test, and to check for multicollinearity we add the Collinearity diagnostics.
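The Durbin-Watson statistic is simple to compute from the residuals in their original order. The sketch below uses made-up residuals to show its range. Values near 2 suggest little first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: squared successive differences of the
    residuals divided by the sum of squared residuals."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Made-up residuals, for illustration only
alternating = [1.0, -1.0, 1.0, -1.0]  # sign flips every step
drifting = [1.0, 1.0, -1.0, -1.0]     # neighbours tend to agree

print(durbin_watson(alternating))  # prints 3.0
print(durbin_watson(drifting))     # prints 1.0
```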

Lastly, we click on the menu *Plots…* to add the standardized residual plots to the output. The standardized residual plots chart ZPRED on the x-axis and ZRESID on the y-axis. This standardized plot allows us to check for heteroscedasticity.
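To show what goes into such a plot, the sketch below computes the two plotted quantities by hand. It assumes ZPRED is the predicted value standardized by its own mean and standard deviation, and ZRESID is the residual divided by the standard error of the estimate. The data are made up.

```python
import math
from statistics import mean, stdev

# Made-up data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
n = len(x)

# Least-squares fit of y = a + b*x
mean_x, mean_y = mean(x), mean(y)
b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
     / sum((xi - mean_x) ** 2 for xi in x))
a = mean_y - b * mean_x

predicted = [a + b * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

# Standard error of the estimate (n - 2 degrees of freedom, one predictor)
see = math.sqrt(sum(e * e for e in residuals) / (n - 2))

# Assumed definitions of the two plotted quantities
mean_pred, sd_pred = mean(predicted), stdev(predicted)
zpred = [(p - mean_pred) / sd_pred for p in predicted]
zresid = [e / see for e in residuals]

for zp, zr in zip(zpred, zresid):
    print(f"{zp:7.3f} {zr:7.3f}")
```

If the vertical spread of ZRESID stays roughly constant across the range of ZPRED, there is no visible sign of heteroscedasticity. A funnel shape that widens or narrows suggests the residual variance changes with the predicted value.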

We leave all the options in the menus *Save…* and *Options…* as they are and are now ready to run the test.