May 17, 2012

What is Multiple Linear Regression?

Multiple linear regression is the most common form of regression analysis.  As a predictive analysis, multiple linear regression is used to describe data and to explain the relationship between one dependent variable and two or more independent variables.

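At the center of multiple linear regression analysis is the task of fitting a single line through a scatter plot.  More specifically, multiple linear regression fits a line (strictly speaking, a plane or hyperplane once there are two or more independent variables) through a multi-dimensional cloud of data points.  The simplest form has one dependent and two independent variables.  The general form of the multiple linear regression is defined as

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + \varepsilon_i$$

for i = 1…n, where $y_i$ is the dependent variable, $x_{i1}, \dots, x_{ip}$ are the p independent variables, $\beta_0$ is the intercept, $\beta_1, \dots, \beta_p$ are the regression coefficients, and $\varepsilon_i$ is the error term.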
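
A multiple linear regression with two independent variables can be estimated by ordinary least squares.  The following is a minimal sketch in Python with NumPy, not a definitive implementation: the data are synthetic, and the names x1, x2, beta_true and beta_hat, as well as the chosen coefficient values, are assumptions made purely for illustration.

```python
import numpy as np

# Synthetic data (an assumption for illustration, not real measurements).
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)

# Hypothetical "true" intercept and coefficients used to generate y.
beta_true = np.array([1.0, 2.0, -0.5])
noise = rng.normal(scale=0.1, size=n)
y = beta_true[0] + beta_true[1] * x1 + beta_true[2] * x2 + noise

# Design matrix: a column of ones for the intercept, then the predictors.
X = np.column_stack([np.ones(n), x1, x2])

# Ordinary least squares: minimize the sum of squared residuals.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

print("estimated intercept and coefficients:", beta_hat)
```

Because the noise is small relative to the signal, the estimates should land close to the values used to generate the data.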

Sometimes the dependent variable is also called the endogenous variable, prognostic variable, or regressand.  The independent variables are also called exogenous variables, predictor variables, or regressors.

However, multiple linear regression analysis consists of more than just fitting a line through a cloud of data points.  It consists of three stages: (1) analyzing the correlation and directionality of the data, (2) estimating the model, i.e., fitting the line, and (3) evaluating the validity and usefulness of the model.
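
The three stages can be sketched in a few lines of Python with NumPy.  This is only an illustration under stated assumptions: the data are synthetic, the variable names are hypothetical, and R² stands in for the much broader task of evaluating a model.

```python
import numpy as np

# Synthetic data (an assumption for illustration).
rng = np.random.default_rng(1)
n = 300
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 3.0 + 1.5 * x1 - 2.0 * x2 + rng.normal(scale=0.5, size=n)

# Stage 1: correlation and directionality.
# Rows/columns of the matrix are ordered y, x1, x2.
corr = np.corrcoef(np.vstack([y, x1, x2]))
print("corr(y, x1):", corr[0, 1])
print("corr(y, x2):", corr[0, 2])

# Stage 2: estimate the model by ordinary least squares.
X = np.column_stack([np.ones(n), x1, x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print("coefficients:", beta_hat)

# Stage 3: evaluate the fit, here only via R^2
# (share of the variance in y explained by the model).
residuals = y - X @ beta_hat
ss_res = residuals @ residuals
ss_tot = ((y - y.mean()) ** 2).sum()
r2 = 1.0 - ss_res / ss_tot
print("R^2:", r2)
```

In practice the third stage would also cover residual diagnostics and significance tests, which this sketch leaves out.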

There are three major uses for multiple linear regression analysis: (1) causal analysis, (2) forecasting an effect, and (3) trend forecasting.  Unlike correlation analysis, which focuses on the strength of the relationship between two or more variables, regression analysis assumes a dependence or causal relationship between one or more independent variables and one dependent variable.

Firstly, it might be used to identify the strength of the effect that the independent variables have on a dependent variable.  Typical questions are: what is the strength of the relationship between dose and effect, sales and marketing spend, or age and income?

Secondly, it can be used to forecast the effects or impacts of changes.  That is, multiple linear regression analysis helps us understand how much the dependent variable will change when we change the independent variables.  A typical question is: how much additional Y do I get for one additional unit of X?
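
In a fitted linear model, the answer to "how much additional Y for one additional unit of X" is the coefficient of X, provided the other independent variables are held fixed.  The sketch below illustrates this with made-up coefficients; beta_hat and predict are hypothetical names, not output from a real analysis.

```python
import numpy as np

# Hypothetical fitted values: intercept, coefficient of x1, coefficient of x2.
beta_hat = np.array([10.0, 2.5, -1.0])

def predict(x1, x2):
    """Predicted Y for given values of the two independent variables."""
    return beta_hat[0] + beta_hat[1] * x1 + beta_hat[2] * x2

baseline = predict(4.0, 3.0)   # x1 = 4, x2 = 3
plus_one = predict(5.0, 3.0)   # x1 raised by one unit, x2 unchanged

# The change in predicted Y equals the coefficient of x1.
effect = plus_one - baseline
print(effect)  # 2.5
```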

Thirdly, multiple linear regression analysis predicts trends and future values.  It can be used to get point estimates.  Typical questions are: what will the price of gold be 6 months from now?  What is the total effort for a task X?
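
A point estimate is obtained by plugging the values of the independent variables for a new case into the fitted equation.  The following sketch assumes a deliberately noise-free synthetic series with a time index t and a hypothetical 0/1 indicator x; real data would be noisy, and extrapolating beyond the observed range carries additional uncertainty that a point estimate alone does not show.

```python
import numpy as np

# Synthetic, noise-free monthly data (an assumption for illustration).
t = np.arange(12, dtype=float)  # months 0..11
x = np.array([0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0], dtype=float)  # hypothetical indicator
y = 5.0 + 2.0 * t + 3.0 * x

# Fit the model y = b0 + b1 * t + b2 * x by ordinary least squares.
X = np.column_stack([np.ones_like(t), t, x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# New case: six months after the last observed month (11 + 6 = 17), with x = 1.
x_new = np.array([1.0, 17.0, 1.0])  # leading 1.0 multiplies the intercept
point_estimate = x_new @ beta_hat

# For this noise-free data the estimate is close to 5 + 2 * 17 + 3 * 1 = 42.
print(round(point_estimate, 6))
```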

There are several linear regression analyses available to the researcher.

Simple linear regression
1 dependent variable (interval or ratio), 1 independent variable (interval or ratio)

Multiple linear regression
1 dependent variable (interval or ratio), 2+ independent variables (interval or ratio)

Logistic regression
1 dependent variable (binary), 1+ independent variable(s) (interval or ratio)

Ordinal regression
1 dependent variable (ordinal), 1+ independent variable(s) (nominal)

Multinomial regression
1 dependent variable (nominal), 1+ independent variable(s) (interval or ratio)

Discriminant analysis
1 dependent variable (nominal), 1+ independent variable(s) (interval or ratio)

When selecting the model for the multiple linear regression analysis, another important consideration is the model fit.  Adding independent variables to a multiple linear regression model never decreases the variance it explains (typically expressed as R²), so a larger model always appears to fit at least as well, even when the added variables are irrelevant.  However, adding more and more variables to the model makes it inefficient, and overfitting occurs.  Occam’s razor applies perfectly to the problem of overfitting: a model should be as simple as possible, but not simpler.  Statistically speaking, if the model includes a large number of variables, the probability increases that a significance test finds some of them to be significant by pure chance.
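
The behavior of R² when variables are added can be checked numerically.  The sketch below uses synthetic data and adds ten predictors that are pure random noise; the helper names r_squared and adjusted_r_squared are hypothetical.  Adjusted R², which penalizes the number of predictors, is shown alongside as one common way to compare models of different size.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least squares fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    ss_res = residuals @ residuals
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept excluded)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Synthetic data: y depends on x1 only (an assumption for illustration).
rng = np.random.default_rng(2)
n = 50
x1 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1])
r2_values = [r_squared(X, y)]

# Add ten irrelevant predictors, one at a time.
for _ in range(10):
    X = np.column_stack([X, rng.normal(size=n)])
    r2_values.append(r_squared(X, y))

for p, r2 in enumerate(r2_values, start=1):
    print(p, round(r2, 4), round(adjusted_r_squared(r2, n, p), 4))
```

R² cannot go down from one row to the next because each model contains the previous one as a special case, whereas adjusted R² can fall when an added variable contributes little.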

The second concern of multiple linear regression analysis is underfitting.  If the regression analysis produces biased estimates, the model is underfitted.  Underfitting shows up when including an additional independent variable in the model reduces the strength of the effect that the independent variables already in the model have on the dependent variable; the smaller model had credited them with part of the omitted variable's influence.  Most commonly, underfitting happens when linear regression tries to prove a cause-effect relationship that is not there.  This might be due to the researcher’s empirical pragmatism or the lack of a sound theoretical basis for the model.
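
The bias that arises from leaving out a relevant variable can be illustrated with a small simulation.  This is a sketch under stated assumptions: the data are synthetic, x1 and x2 are constructed to be correlated, and the coefficient values are chosen arbitrarily.

```python
import numpy as np

# Synthetic data in which x1 and x2 are correlated and both affect y.
rng = np.random.default_rng(3)
n = 5000
x2 = rng.normal(size=n)
x1 = 0.8 * x2 + rng.normal(scale=0.6, size=n)
y = 1.0 * x1 + 2.0 * x2 + rng.normal(size=n)  # hypothetical true effects: 1.0 and 2.0

ones = np.ones(n)

# Underfitted model: x2 is left out.
X_short = np.column_stack([ones, x1])
b_short, *_ = np.linalg.lstsq(X_short, y, rcond=None)

# Full model: x2 is included.
X_full = np.column_stack([ones, x1, x2])
b_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)

print("effect of x1 without x2:", b_short[1])  # inflated, absorbs part of x2's effect
print("effect of x1 with x2:   ", b_full[1])   # near the value used to generate y
```

Including x2 lowers the estimated effect of x1 toward the value used to generate the data, which is the pattern described in the paragraph above.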
