# Selection Process for Multiple Regression

Multiple linear regression assesses whether one continuous dependent variable can be predicted from a set of independent (or predictor) variables. In other words, it asks how much of the variance in a continuous dependent variable is explained by a set of predictors. Regression selection approaches help identify which predictors are worth keeping, which makes the analysis more efficient.
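As a reference point, the model and the "variance explained" quantity can be written in standard notation. The symbols used here ($b_j$ for coefficients, $e_i$ for the residual, $k$ for the number of predictors, $R^2$ for the proportion of variance explained) are conventional choices introduced for illustration and do not come from the text above.

$$
y_i = b_0 + b_1 x_{i1} + b_2 x_{i2} + \cdots + b_k x_{ik} + e_i,
\qquad
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
$$

Here $\hat{y}_i$ is the value predicted by the fitted equation and $\bar{y}$ is the mean of the dependent variable, so $R^2$ is the share of the variance in $y$ that the set of predictors accounts for.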

### Entry Method

The standard method of entry is simultaneous (a.k.a. the enter method): all independent variables are entered into the equation at the same time. This is an appropriate analysis when dealing with a small set of predictors and when the researcher does not know which independent variables will create the best prediction equation. Each predictor is assessed as though it had been entered after all the other independent variables, so it is evaluated by what it adds to the prediction of the dependent variable beyond the predictions offered by the other variables in the model.
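The idea of assessing each predictor "as though it were entered last" can be illustrated with a short sketch. It fits the full model, then refits it without each predictor in turn and records the drop in R². The data are synthetic, and the helper name `r_squared` and the variable names are assumptions made for this example, not part of any particular statistics package.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least squares fit of y on the columns of X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot

# Synthetic data (illustrative assumption): the third predictor is pure noise.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
y = 2.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(size=n)

# Simultaneous entry: all predictors go into the equation at once.
r2_full = r_squared(X, y)

# Unique contribution of each predictor: the R^2 lost when only that
# predictor is left out of the otherwise complete model.
unique = {}
for j in range(X.shape[1]):
    others = np.delete(X, j, axis=1)
    unique[j] = r2_full - r_squared(others, y)

print(f"R^2 with all predictors: {r2_full:.3f}")
for j, gain in unique.items():
    print(f"predictor {j}: unique contribution to R^2 = {gain:.3f}")
```

In this setup the noise predictor should show a unique contribution close to zero, while the two predictors that actually drive the outcome account for most of the explained variance.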

### Selection Methods

Selection, on the other hand, allows for the construction of an optimal regression equation along with investigation into specific predictor variables. The aim of selection is to reduce the set of predictor variables to those that are necessary and that account for nearly as much of the variance as the total set does. In essence, selection helps to determine the level of importance of each predictor variable. It also helps in assessing the effect of a predictor once the effects of the other predictor variables are statistically removed. The circumstances of the study, along with the nature of the research questions, guide the selection of predictor variables.

Four selection procedures are used to yield the most appropriate regression equation: forward selection, backward elimination, stepwise selection, and block-wise selection. The first three are considered statistical regression methods, because predictors are chosen by statistical criteria. Researchers often use sequential regression (hierarchical or block-wise) entry instead, which does not rely on statistical results for selecting predictors. Sequential entry gives the researcher greater control over the regression process: predictors are entered in a given order based on theory, logic, or practicality. It is appropriate when the researcher has an idea as to which predictors may affect the dependent variable.

Statistical Regression Methods of Entry:

• Forward selection begins with an empty equation. Predictors are added one at a time, beginning with the predictor that has the highest correlation with the dependent variable. At each later step, the remaining predictor that adds the most to the prediction is entered, provided its contribution is statistically significant. Once in the equation, a variable remains there.
• Backward elimination (or backward deletion) is the reverse process. All the independent variables are entered into the equation first, and then they are deleted one at a time if they do not contribute to the regression equation.
• Stepwise selection is considered a variation of the previous two methods. At each step, after a new predictor is entered, the contribution of each predictor already in the equation is re-examined. In this way it is possible to understand the contribution of the previous variables now that another variable has been added. Variables are retained or deleted based on their statistical contribution.
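The three statistical procedures above can be sketched in a few lines each. This is a minimal illustration, not the algorithm of any specific package: it uses a partial F statistic as the entry and removal criterion, and the thresholds `f_enter` and `f_remove`, the function names, and the synthetic data are all assumptions chosen for the example. Statistical software typically uses p-value thresholds instead.

```python
import numpy as np

def rss(X, y, cols):
    """Residual sum of squares for OLS of y on an intercept plus X[:, cols]."""
    A = np.column_stack([np.ones(len(y)), X[:, cols]])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def partial_f(X, y, small, big):
    """F statistic for the single predictor that is in `big` but not in `small`."""
    df_resid = len(y) - len(big) - 1
    rss_big = rss(X, y, big)
    return (rss(X, y, small) - rss_big) / (rss_big / df_resid)

def forward_selection(X, y, f_enter=4.0):
    """Start empty; add the best remaining predictor until none passes f_enter."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining:
        scores = {j: partial_f(X, y, selected, selected + [j]) for j in remaining}
        best = max(scores, key=scores.get)
        if scores[best] < f_enter:
            break
        selected.append(best)  # once entered, a variable stays in
        remaining.remove(best)
    return selected

def backward_elimination(X, y, f_remove=4.0):
    """Start with every predictor; drop the weakest until all pass f_remove."""
    selected = list(range(X.shape[1]))
    while selected:
        scores = {j: partial_f(X, y, [k for k in selected if k != j], selected)
                  for j in selected}
        worst = min(scores, key=scores.get)
        if scores[worst] >= f_remove:
            break
        selected.remove(worst)
    return selected

def stepwise_selection(X, y, f_enter=4.0, f_remove=3.9):
    """Forward steps, re-checking earlier entries after each addition."""
    selected = []
    while True:
        remaining = [j for j in range(X.shape[1]) if j not in selected]
        scores = {j: partial_f(X, y, selected, selected + [j]) for j in remaining}
        if not scores or max(scores.values()) < f_enter:
            break
        selected.append(max(scores, key=scores.get))
        # Re-examine everything already in the equation now that a variable was added.
        while len(selected) > 1:
            drop = {j: partial_f(X, y, [k for k in selected if k != j], selected)
                    for j in selected}
            worst = min(drop, key=drop.get)
            if drop[worst] >= f_remove:
                break
            selected.remove(worst)
    return selected

# Synthetic data (illustrative assumption): only predictors 0 and 2 matter.
rng = np.random.default_rng(42)
n = 200
X = rng.normal(size=(n, 5))
y = 3.0 * X[:, 0] + 1.5 * X[:, 2] + rng.normal(size=n)

fwd = forward_selection(X, y)
bwd = backward_elimination(X, y)
stp = stepwise_selection(X, y)
print("forward: ", fwd)
print("backward:", sorted(bwd))
print("stepwise:", stp)
```

On the first forward step, the predictor with the largest F statistic is also the one with the highest absolute correlation with the dependent variable, which matches the description above. Setting `f_remove` slightly below `f_enter` is a common convention that keeps a newly entered variable from being removed straight away. A noise predictor can still pass the threshold by chance, which is one reason these procedures are usually checked against theory.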

Sequential Regression Method of Entry:

• Block-wise selection is a version of forward selection that proceeds in blocks or sets. The predictors are grouped into blocks based on psychometric considerations or theoretical reasons, and a stepwise selection is applied within each block. Each block is applied separately while the other predictor variables are ignored. Variables can be removed when they do not contribute to the prediction. In general, the predictors included in a block will be inter-correlated. The order of entry also affects which variables are selected: those entered in the earlier stages have a better chance of being retained than those entered at later stages.
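The effect of entry order can be seen in a simplified sketch of hierarchical (sequential) block entry. As a simplification, each block is entered whole and the change in R² is recorded, rather than running a stepwise pass inside each block. The block names, the function `blockwise_entry`, and the synthetic data are hypothetical and exist only for this example.

```python
import numpy as np

def r_squared(X, y, cols):
    """R^2 of OLS of y on an intercept plus the columns X[:, cols]."""
    A = np.column_stack([np.ones(len(y)), X[:, cols]])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()

def blockwise_entry(X, y, blocks):
    """Enter blocks in the given order; return (name, R^2, change in R^2) per step."""
    entered, prev_r2, steps = [], 0.0, []
    for name, cols in blocks:
        entered = entered + list(cols)
        r2 = r_squared(X, y, entered)
        steps.append((name, r2, r2 - prev_r2))
        prev_r2 = r2
    return steps

# Synthetic data (illustrative assumption): two correlated predictors,
# each placed in its own block.
rng = np.random.default_rng(1)
n = 300
x0 = rng.normal(size=n)
x1 = 0.8 * x0 + 0.6 * rng.normal(size=n)
X = np.column_stack([x0, x1])
y = x0 + x1 + rng.normal(size=n)

order_a = blockwise_entry(X, y, [("block A", [0]), ("block B", [1])])
order_b = blockwise_entry(X, y, [("block B", [1]), ("block A", [0])])

for label, steps in (("A then B", order_a), ("B then A", order_b)):
    print(label)
    for name, r2, delta in steps:
        print(f"  {name}: R^2 = {r2:.3f}, change = {delta:.3f}")
```

Both orders end at the same final R², but the block entered first is credited with the variance the two correlated predictors share. This is the reason a later block has less left to explain and a smaller chance of being retained.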

Essentially, the multiple regression selection process enables the researcher to obtain a reduced set of variables from a larger set of predictors, eliminating unnecessary predictors, simplifying data, and enhancing predictive accuracy. Two criteria are used to arrive at the best set of predictors: meaningfulness to the situation and statistical significance. Entering variables into the equation in a given order allows confounding variables to be investigated and highly correlated variables to be combined into blocks.

