When conducting a multiple linear regression, there are a number of different approaches to entering predictors (i.e., independent variables) into your model. The simplest approach is to enter all of the predictors you have into your model in one step. This is commonly referred to as the “standard” method of regression. Another approach is to enter your predictors in multiple, predetermined steps. This is generally known as “hierarchical” regression and is appropriate when your predictors are divided into meaningful groups. For instance, your predictors might include a few demographic variables (such as gender and age) and personality characteristics (such as extraversion and neuroticism). In this case, it might make sense to enter your demographic variables in one step, and then enter your personality variables in another step. This would allow you to see how much variance in your outcome (dependent) variable the personality characteristics explain above and beyond the demographic variables.
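To make the hierarchical approach concrete, here is a minimal sketch using simulated data (the variable names, coefficients, and the `r_squared` helper are illustrative, not from any particular study). It fits the demographic-only model first, then adds the personality variables, and reports the change in R² — the variance explained above and beyond demographics:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Simulated predictors: two demographic and two personality variables
age = rng.normal(40, 10, n)
gender = rng.integers(0, 2, n).astype(float)
extraversion = rng.normal(0, 1, n)
neuroticism = rng.normal(0, 1, n)

# Simulated outcome influenced by all four predictors plus noise
y = (0.02 * age + 0.5 * gender + 0.8 * extraversion
     - 0.6 * neuroticism + rng.normal(0, 1, n))

def r_squared(X, y):
    """Fit ordinary least squares (intercept added here) and return R^2."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1 - ss_res / ss_tot

# Step 1: demographic variables only
r2_step1 = r_squared(np.column_stack([age, gender]), y)
# Step 2: demographics plus personality variables
r2_step2 = r_squared(np.column_stack([age, gender, extraversion, neuroticism]), y)

print(f"Step 1 R^2: {r2_step1:.3f}")
print(f"Step 2 R^2: {r2_step2:.3f}")
print(f"R^2 change: {r2_step2 - r2_step1:.3f}")
```

In practice you would test whether the R² change is statistically significant (an F test on the change), but the comparison of the two nested models is the core of the technique.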
Stepwise regression is a special case of hierarchical regression in which statistical algorithms determine what predictors end up in your model. This approach has three basic variations: forward selection, backward elimination, and stepwise. In forward selection, the model starts with no predictors and successively enters significant predictors until reaching a statistical stopping criterion. In backward elimination, the model starts with all possible predictors and successively removes non-significant predictors until reaching the stopping criterion. Finally, true stepwise regression is a combination of the previous two methods, in which predictors can be added or removed at each step to arrive at the final model.
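The forward-selection variant can be sketched in a few lines. This toy version (simulated data; function names are our own) greedily adds whichever remaining predictor raises R² the most, and stops when the best improvement falls below a threshold — a simplification of the partial-F or p-value stopping rules that statistical packages actually use:

```python
import numpy as np

def fit_r2(X, y):
    """Ordinary least squares R^2, with an intercept column added."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - (resid @ resid) / ((y - y.mean()) ** 2).sum()

def forward_select(X, y, min_improvement=0.02):
    """Greedy forward selection: repeatedly add the column that raises R^2
    the most, stopping when the best improvement drops below the threshold."""
    remaining = list(range(X.shape[1]))
    selected = []
    current_r2 = 0.0
    while remaining:
        scores = [(fit_r2(X[:, selected + [j]], y), j) for j in remaining]
        best_r2, best_j = max(scores)
        if best_r2 - current_r2 < min_improvement:
            break
        selected.append(best_j)
        remaining.remove(best_j)
        current_r2 = best_r2
    return selected, current_r2

# Simulated example: only columns 0 and 1 actually predict y
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
y = 1.5 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=300)

chosen, r2 = forward_select(X, y)
print("Selected columns:", chosen, "R^2:", round(r2, 3))
```

Notice that the algorithm has no idea what the columns measure — it only chases fit, which is exactly the limitation discussed next.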
Stepwise regression may seem convenient, but researchers and statisticians have identified numerous statistical problems with stepwise regression, including overfitting the data, biased estimates, and inflated Type I error (see Harrell, 2015 for a detailed discussion). Statistical pitfalls aside, there are other important limitations to stepwise regression. Most notably, stepwise regression relies on a computer program to pick the variables for you, without any consideration for what they measure or how they fit into the theoretical framework that guides your study. It is usually more appropriate to use theory and previous research to decide what variables are important to include in your model. Ronan Conroy, a biostatistician, once said, “Personally, I would no more let an automatic routine select my model than I would let some best fit procedure pack my suitcase.” In other words, the computer program will just pick the things that fit into the suitcase the best, regardless of what they are or if you need them for your trip.
However, there are situations in which stepwise regression may be appropriate to use. For example, if you have a very large number of potential predictors to include in your model, stepwise regression may be used to reduce the number of predictors. That being said, it is usually better to narrow down the variables in your study based on the specific problem you are investigating and the background literature and theories surrounding the topic. If your research is purely exploratory, and there is no existing theoretical foundation to guide the selection of variables, stepwise regression may be applied as an exploratory analysis.
To conclude, stepwise regression is generally not recommended, especially if your research questions are theoretically driven. If you have a very large number of potential variables to use in your model, try revisiting the literature to narrow your options down. This may ultimately lead you to a more focused study that does not rely on automatic variable selection.
Harrell, F. (2015). Regression modeling strategies: With applications to linear models, logistic and ordinal regression, and survival analysis (2nd ed.). New York, NY: Springer.