# Best Subsets Regression

Posted September 14, 2012

Best subsets regression is an exploratory model-building technique. It compares all possible models that can be built from an identified set of predictors. By default, Minitab reports the two best models with one predictor, the two best with two predictors, the two best with three predictors, and so on up to the full set of predictors entered into the analysis. For each model, the Minitab output presents R², adjusted R², Mallows' Cp, and S. These fit statistics are used in conjunction with one another to determine the best model. R² and adjusted R² are coefficients of multiple determination: they indicate how much of the variation in the criterion variable is explained by the set of predictor variables. Mallows' Cp is a measure of bias or prediction error. S is the square root of the mean square error (MSE).
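
For reference, the four statistics are commonly written as below. The notation is an assumption introduced here and is not part of the Minitab output: n is the number of observations, p is the number of parameters in the candidate model (predictors plus the intercept), SSE_p is that model's residual sum of squares, SST is the total sum of squares, and MSE_full is the mean square error of the model containing every candidate predictor.

$$
R^2 = 1 - \frac{SSE_p}{SST},
\qquad
R^2_{adj} = 1 - \frac{SSE_p/(n-p)}{SST/(n-1)}
$$

$$
C_p = \frac{SSE_p}{MSE_{full}} - (n - 2p),
\qquad
S = \sqrt{\frac{SSE_p}{n-p}} = \sqrt{MSE_p}
$$
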
The decision is not always clear, so the researcher must use all the tools available to make the most informed choice. When selecting the best subset, we are looking for the highest adjusted R². Adding a predictor never decreases R², so when comparing models with different numbers of predictors it is more reasonable to use adjusted R², which increases only if the added predictor improves the model more than would be expected by chance alone. With regard to Mallows' Cp, where p indicates the number of parameters in the model, we are looking for a value equal to or less than p. The number of parameters in each model is equal to the number of predictors plus one, where the one is the intercept parameter. So if the output lists two variables, the model has three parameters. There are a few things to note when analyzing Mallows' Cp:
- The model with the maximum number of predictors always shows Cp = p, so Mallows' Cp is not a good selection tool for the full model.
- If all models but the full model display a large Cp, then the models are lacking important predictors that must be identified before going forward.
- When several models show a Cp near p, the model with the smallest Cp should be selected to be certain the bias is small.
- Further, when several models show a Cp near p, the model with the fewest predictors should be selected.
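
The first note above follows directly from the usual definition of Cp. As a short worked step, assume the standard formula in which the full model's own mean square error serves as the variance estimate. The symbols n (observations), p (parameters in the full model), and SSE_full (its residual sum of squares) are notation introduced here for illustration:

$$
C_p = \frac{SSE_{full}}{MSE_{full}} - (n - 2p)
    = \frac{SSE_{full}}{SSE_{full}/(n-p)} - (n - 2p)
    = (n - p) - (n - 2p)
    = p
$$

Because this holds for any data set, Cp = p for the full model carries no information about how well that model fits.
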
In addition to these guidelines, we are also looking for the model with the smallest S. Taking these factors into account should allow the researcher to select the most appropriate, best-fitting regression model.
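
To make the procedure concrete, here is a minimal Python sketch of a best subsets search. It is not Minitab's implementation: the function names `ols_sse` and `best_subsets`, the `top=2` default, and the synthetic data are all assumptions made for illustration. The statistics use the common textbook formulas, with Cp computed against the MSE of the full model. The code uses only the standard library so that it runs as written.

```python
from itertools import combinations
import random


def ols_sse(rows, y):
    """Residual sum of squares for a least-squares fit with an intercept.

    Solves the normal equations by Gauss-Jordan elimination, which is
    adequate for a small, well-conditioned illustration.
    """
    A = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    m = len(A[0])
    # Augmented normal equations [A'A | A'y].
    M = [[sum(a[i] * a[j] for a in A) for j in range(m)]
         + [sum(a[i] * yi for a, yi in zip(A, y))]
         for i in range(m)]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[piv] = M[piv], M[c]
        for r in range(m):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * z for x, z in zip(M[r], M[c])]
    beta = [M[i][m] / M[i][i] for i in range(m)]
    return sum((yi - sum(b * x for b, x in zip(beta, a))) ** 2
               for a, yi in zip(A, y))


def best_subsets(X, y, names, top=2):
    """Fit every non-empty subset of predictors; keep the `top` per size."""
    n, k = len(X), len(X[0])
    ybar = sum(y) / n
    sst = sum((yi - ybar) ** 2 for yi in y)
    mse_full = ols_sse(X, y) / (n - (k + 1))  # MSE of the full model
    table = []
    for size in range(1, k + 1):
        p = size + 1  # parameters = predictors + intercept
        fits = []
        for cols in combinations(range(k), size):
            sse = ols_sse([[row[c] for c in cols] for row in X], y)
            fits.append({
                "vars": tuple(names[c] for c in cols),
                "p": p,
                "R2": 1 - sse / sst,
                "adjR2": 1 - (sse / (n - p)) / (sst / (n - 1)),
                "Cp": sse / mse_full - (n - 2 * p),
                "S": (sse / (n - p)) ** 0.5,
            })
        # For a fixed size, ranking by R2 is the same as ranking by smallest SSE.
        fits.sort(key=lambda f: f["R2"], reverse=True)
        table.extend(fits[:top])
    return table


# Synthetic data for illustration only: y depends on x1 and x2, not x3 or x4.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(50)]
y = [2.0 * x[0] - 1.0 * x[1] + rng.gauss(0, 0.5) for x in X]
names = ["x1", "x2", "x3", "x4"]

rows = best_subsets(X, y, names)
for r in rows:
    print(f"{len(r['vars'])}  R2={r['R2']:.3f}  adjR2={r['adjR2']:.3f}  "
          f"Cp={r['Cp']:7.1f}  S={r['S']:.3f}  {', '.join(r['vars'])}")
```

Reading the printed table the way the guidelines describe, one would compare adjusted R², look for Cp at or below p, and prefer a small S. The last row is the full model, where Cp equals p by construction.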