What is Linear Regression?

Linear regression is the most basic type of regression and a commonly used form of predictive analysis.  The overall idea of regression is to examine three things: (1) Does a set of predictor variables do a good job of predicting an outcome variable?  Does the model built from the predictors account for the variability in the dependent variable? (2) Which variables in particular are significant predictors of the dependent variable?  And in what way do they (as indicated by the magnitude and sign of the beta estimates) impact the dependent variable?  These regression estimates are used to explain the relationship between one dependent variable and one or more independent variables. (3) What is the regression equation that shows how the set of predictor variables can be used to predict the outcome?  The simplest form of the equation, with one dependent and one independent variable, is defined by the formula y = c + b*x, where y = estimated dependent score, c = constant, b = regression coefficient, and x = independent variable.
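As a minimal sketch of how c and b in y = c + b*x can be estimated, the example below uses ordinary least squares, which is the usual estimation method but is an assumption here because the text does not name one.  The function name fit_simple_regression and the data are made up for illustration.

```python
def fit_simple_regression(x, y):
    """Return (c, b) for the least-squares line y = c + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    # Intercept: forces the line through the point (mean_x, mean_y).
    c = mean_y - b * mean_x
    return c, b


# Made-up data that lie exactly on the line y = 1 + 2*x.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]

c, b = fit_simple_regression(x, y)
print(c, b)  # 1.0 2.0
```

Here b is read as the change in the estimated dependent score for a one-unit increase in x, and c as the estimated score when x is zero.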

Naming the Variables.  There are many names for a regression's dependent variable.  It may be called a criterion variable, endogenous variable, prognostic variable, or regressand.  The independent variables can be called exogenous variables, predictor variables, or regressors.

More about the uses of regression.  Three major uses for regression analysis are (1) causal analysis, (2) forecasting an effect, and (3) trend forecasting.  Unlike correlation analysis, which focuses on the strength of the relationship between two or more variables, regression analysis assumes a dependence or causal relationship between one or more independent variables and one dependent variable.

First, regression might be used to identify the strength of the effect that the independent variable(s) have on a dependent variable.  Typical questions concern the strength of the relationship between dose and effect, marketing spend and sales, or age and income.
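One common way to put a number on the strength of such a relationship is the coefficient of determination, R², which in a regression with a single predictor equals the squared Pearson correlation.  The sketch below assumes that measure; the marketing-spend and sales figures are hypothetical.

```python
import math

# Hypothetical data: marketing spend (x) and sales (y), in arbitrary units.
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(spend)
mean_x = sum(spend) / n
mean_y = sum(sales) / n

# Sums of squares and cross-products around the means.
sxx = sum((x - mean_x) ** 2 for x in spend)
syy = sum((y - mean_y) ** 2 for y in sales)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(spend, sales))

# Share of the variance in sales explained by a straight line in spend.
r2 = sxy ** 2 / (sxx * syy)
# Pearson correlation: same magnitude information, plus the sign.
r = sxy / math.sqrt(sxx * syy)

print(round(r, 2), round(r2, 2))  # roughly 0.77 and 0.6
```

An R² near 1 indicates that the predictor accounts for most of the variability in the outcome, while a value near 0 indicates that it accounts for very little.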

Second, it can be used to forecast the effects or impact of changes.  That is, regression analysis helps us understand how much the dependent variable changes with a change in one or more independent variables.  A typical question is, "How much additional Y do I get for one additional unit of X?"

Third, regression analysis predicts trends and future values, and it can be used to get point estimates.  Typical questions are, "What will the price of gold be 6 months from now?" and "What is the total effort for a task X?"
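A point estimate for a future value can be sketched by fitting a line to a time index and evaluating it at a later time.  The monthly prices below are invented for illustration, the helper names fit_line and forecast are hypothetical, and the result is a single extrapolated number with no statement of its uncertainty.

```python
def fit_line(t, values):
    """Least-squares intercept c and slope b for values = c + b*t."""
    n = len(t)
    mean_t = sum(t) / n
    mean_v = sum(values) / n
    stv = sum((ti - mean_t) * (vi - mean_v) for ti, vi in zip(t, values))
    stt = sum((ti - mean_t) ** 2 for ti in t)
    b = stv / stt
    c = mean_v - b * mean_t
    return c, b


def forecast(c, b, t_future):
    """Point estimate from the fitted line at time t_future."""
    return c + b * t_future


# Invented prices observed in months 1 to 6.
months = [1, 2, 3, 4, 5, 6]
prices = [101.0, 103.0, 104.0, 107.0, 108.0, 110.0]

c, b = fit_line(months, prices)

# Month 12 is 6 months after the last observation.
estimate = forecast(c, b, 12)
print(round(estimate, 1))  # about 120.8
```

The further the forecast lies beyond the observed months, the more it depends on the assumption that the linear trend continues unchanged.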

There are several types of regression analysis available to the researcher.

• Simple linear regression
1 dependent variable (interval or ratio), 1 independent variable (interval or ratio or dichotomous)

• Multiple linear regression
1 dependent variable (interval or ratio), 2+ independent variables (interval or ratio or dichotomous)

• Logistic regression
1 dependent variable (binary), 2+ independent variable(s) (interval or ratio or dichotomous)

• Ordinal regression
1 dependent variable (ordinal), 1+ independent variable(s) (nominal or dichotomous)

• Multinomial regression
1 dependent variable (nominal), 1+ independent variable(s) (interval or ratio or dichotomous)

• Discriminant analysis
1 dependent variable (nominal), 1+ independent variable(s) (interval or ratio)

When selecting the model for the analysis, another important consideration is model fit.  Adding independent variables to a linear regression model never decreases, and in practice almost always increases, the explained variance of the model (typically expressed as R²).  However, adding more and more variables makes the model inefficient, and overfitting can occur.  Occam's razor describes the problem well: a model should be as simple as possible, but not simpler.  Statistically, the more variables a model includes, the higher the probability that some of them will appear statistically significant purely by chance.
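The effect of extra variables on R² can be illustrated with a small simulation.  The sketch below fits a model with one real predictor, then adds a predictor that is pure noise by construction; R² does not go down.  Adjusted R², a standard correction that penalizes the number of predictors, is shown as one possible safeguard and is not something the text above prescribes.  All data are simulated and the helper names are hypothetical.

```python
import random


def solve(a, b):
    """Solve the linear system a*x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(columns, y):
    """Fit y on an intercept plus the predictor columns; return R^2."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * rj for bj, rj in zip(beta, r)) for r in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept excluded)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


random.seed(0)
n = 30
x1 = [random.gauss(0, 1) for _ in range(n)]
noise = [random.gauss(0, 1) for _ in range(n)]  # unrelated to y by construction
y = [2.0 + 1.5 * a + random.gauss(0, 1) for a in x1]

r2_small = r_squared([x1], y)
r2_big = r_squared([x1, noise], y)

print("R^2:         ", r2_small, "->", r2_big)
print("adjusted R^2:", adjusted_r_squared(r2_small, n, 1),
      "->", adjusted_r_squared(r2_big, n, 2))
```

Because the larger model can always reproduce the smaller one by setting the extra coefficient to zero, its R² cannot be lower; whether adjusted R² rises or falls depends on whether the extra variable explains more than its penalty.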

The second concern of regression analysis is underfitting, which means that the regression estimates are biased.  Underfitting occurs when a relevant independent variable is missing from the model, so that including it would change, and typically reduce, the estimated effect strength of the independent variable(s) already in the model.  Underfitting mostly happens when linear regression is used to prove a cause-effect relationship that is not there.  This might be due to the researcher's empirical pragmatism or the lack of a sound theoretical basis for the model.
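This kind of bias can be shown with a small constructed example.  In the sketch below, y is built as exactly 1 + 1*x1 + 2*x2, and x2 is correlated with x1.  Regressing y on x1 alone inflates the coefficient of x1; adding x2 brings it back down to its true value.  The data are made up, and the two-predictor solution uses the standard closed form for centered sums of squares and cross-products.

```python
def centered_sum(u, v):
    """Sum of cross-products of u and v around their means."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))


# Made-up predictors; x2 tends to rise with x1.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
# Outcome constructed without noise: true coefficients are 1 and 2.
y = [1.0 + 1.0 * a + 2.0 * b for a, b in zip(x1, x2)]

s11 = centered_sum(x1, x1)
s22 = centered_sum(x2, x2)
s12 = centered_sum(x1, x2)
s1y = centered_sum(x1, y)
s2y = centered_sum(x2, y)

# Underfitted model: y regressed on x1 only.
b1_short = s1y / s11

# Full model: y regressed on x1 and x2 (closed form for two predictors).
det = s11 * s22 - s12 ** 2
b1_full = (s22 * s1y - s12 * s2y) / det
b2_full = (s11 * s2y - s12 * s1y) / det

print("x1 coefficient, x2 omitted: ", round(b1_short, 3))
print("x1 coefficient, x2 included:", round(b1_full, 3))
print("x2 coefficient:             ", round(b2_full, 3))
```

The inflated coefficient in the short model absorbs part of the effect of the omitted variable, which is why a model that looks strong on its own can weaken once the missing predictor is added.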

To Reference this Page: Statistics Solutions. (2013). What is Linear Regression [WWW Document]. Retrieved from here.