How to Select the Appropriate Statistical Analysis

Posted June 12, 2014

How do doctoral students select the appropriate statistical analysis and power analysis for their chosen methodology?

Selecting the appropriate statistical analysis and sample size is a very common problem for graduate students.  Here is the strategy we use at Statistics Solutions: select the correct test, and then select the correct sample size for that test.

First, you have to define the level of measurement of each variable to be included in the analysis.  Each variable will be nominal (categorical), ordinal (rank-ordered), interval, or ratio-level.  This needs to be done for both your independent and dependent variables.  A nominal-level variable is one whose categories just have names, such as male/female, race, Democrat/Republican/Independent, or whether participants are in the control or experimental group.  Rank-ordered data are ordered, like the finishing places in a horse race: first, second, third, etc.; the main point is that the distance between first and second need not equal the distance between second and third.  When the distance between units is the same, you have interval data.  Interval-level data have equally spaced units; a Likert-type scale from 1 to 7, with 1 equal to strongly disagree and 7 equal to strongly agree, is commonly treated as interval-level.  Ratio-level data are similar to interval-level data, except that the scale has a true zero point, as with age, time, or amounts.

Second, to select the correct statistical analysis, you have to clarify what you want to find out.  The research question or hypothesis is typically phrased in terms of finding differences, finding relationships, or predicting.  “Difference-type” questions have interval or ratio-level Y variables and categorical-level X variables, and are phrased as, “Are there differences on variable Y by variable X?” or, “Are there differences on variables Y1, Y2, and Y3 by variables X1 and X2?”  The appropriate statistical analyses for these questions are ANOVA and MANOVA, respectively.
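To make the "difference-type" question concrete, here is a minimal sketch of the F statistic behind a one-way ANOVA (one interval-level Y, one categorical X).  The function name and the sample scores are made up for illustration; in practice a statistics package would compute this and also report the p-value.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists: one list of Y scores per level of X.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical scores for three groups (e.g., control and two treatments).
scores_by_group = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_between, df_within = one_way_anova_f(scores_by_group)
```

The larger the F statistic relative to its degrees of freedom, the stronger the evidence that the group means differ.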

For relationship questions with interval, ordinal-level, or ratio-level variables, the correct statistical analysis is typically a Pearson correlation (when both variables are interval or ratio-level) or a Spearman correlation (when at least one variable is ordinal-level).  The point-biserial correlation is the statistical analysis to use when examining the relationship between a dichotomous, categorical variable and an interval or ratio-level variable.  Relationship questions with two categorical variables can be examined with a chi-square test.
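The three correlations are closely related, which a short sketch can show.  Spearman's correlation is a Pearson correlation computed on ranks, and the point-biserial correlation is a Pearson correlation in which the dichotomous variable is coded 0/1.  The helper names and the data below are illustrative assumptions, not part of any particular package.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Rank values from 1..n; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(ranks(x), ranks(y))


# Point-biserial: code the two groups 0/1 and use Pearson (hypothetical data).
group = [0, 0, 0, 1, 1, 1]
score = [2, 3, 4, 6, 7, 8]
r_point_biserial = pearson_r(group, score)
```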

For prediction questions, linear, ordinal, or multinomial regressions are typically the appropriate statistical analyses to use when the outcome variables are interval, ordinal, or categorical-level variables, respectively.  The independent variables can be interval- or ordinal-level variables or categorical-level variables.  Be careful: when a categorical-level variable has more than two levels (e.g., political affiliation), the variable has to be dummy coded (we can assist you with dummy coding the variables).
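As a rough sketch of what dummy coding involves: a categorical variable with k levels becomes k - 1 indicator (0/1) variables, with one level left out as the reference category.  The function name, the `is_` prefix, and the choice of reference level below are illustrative assumptions.

```python
def dummy_code(values, reference):
    """Turn a categorical variable into 0/1 indicator columns.

    Returns one dict per observation, with one key per non-reference
    level.  The reference level is the row where every indicator is 0.
    """
    if reference not in values:
        raise ValueError(f"reference level {reference!r} not found in data")
    levels = sorted(set(values) - {reference})
    return [
        {f"is_{level}": int(value == level) for level in levels}
        for value in values
    ]


# Hypothetical political-affiliation data with three levels.
affiliation = ["Democrat", "Republican", "Independent", "Republican"]
coded = dummy_code(affiliation, reference="Democrat")
```

With "Democrat" as the reference, each regression coefficient for an indicator is interpreted as the difference between that level and Democrats.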

Third, the sample size calculation, or power analysis, is directly related to the statistical test that is chosen.  The sample size calculation is based on the power (typically .80 is desired), the effect size (typically a medium or large effect is selected; contrary to what one might expect, the larger the effect, the smaller the sample needed), and the alpha (e.g., .05).  Given these criteria, the basic rules of thumb are 26 participants per group for a large effect size and 65 participants per group for a moderate effect size, for a t-test, chi-square, correlation, or linear regression with two predictors.
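The relationship among power, effect size, and alpha can be sketched with the standard normal-approximation formula for a two-group, two-sided t-test: n per group = 2 * ((z for alpha/2 + z for power) / d)^2.  This is only an approximation, and it assumes Cohen's conventional effect sizes (d = 0.5 for medium, d = 0.8 for large), which are not stated above.  Exact t-based calculations in power-analysis software come out slightly higher, closer to the rules of thumb quoted here.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Approximate participants per group for a two-sided, two-group t-test.

    Uses the normal approximation, so it slightly underestimates the
    sample size an exact t-based power analysis would give.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_power = NormalDist().inv_cdf(power)          # quantile for desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size_d) ** 2)


# Assumed Cohen's d conventions: 0.5 = medium, 0.8 = large.
n_medium = n_per_group(0.5)
n_large = n_per_group(0.8)
```

Note how the larger effect size yields the smaller required sample, as described above.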

For more information about the appropriate statistical analysis or sample size calculations, please go to our contact page.
