How do doctoral students select the appropriate statistical analysis and power analysis for their methodology?
Selecting the appropriate statistical analysis and sample size is a very common problem for graduate students. Here is the strategy we use at Statistics Solutions: select the correct test, and then select the correct sample size for that test.
First, you must define the level of measurement for each variable you will include in the analysis. Your variables will be categorical (nominal), ordinal (rank-ordered), interval, or ratio-level. You need to do this for both your independent and dependent variables. A nominal-level variable is one whose categories have names but no inherent order, such as sex (male/female), race, political affiliation (Democrat/Republican/Independent), or group assignment (control or experimental). Rank-ordered data are arranged in a specific order, like the finish of a horse race: first, second, third, etc. The main point is that the distance between first and second, and between second and third, can vary.
When the distance between units is the same, you have interval data. Interval-level data have equally spaced units; researchers commonly treat a Likert-type scale from 1 to 7, where 1 equals strongly disagree and 7 equals strongly agree, as interval-level. Ratio-level data are similar to interval-level data, except that they have a true zero point, as with age, time, or amounts.
The second step in selecting the appropriate statistical analysis is clarifying what you want to find out. Researchers typically phrase the research question or hypothesis in terms of finding differences, relationships, or predictions. “Difference-type” questions involve interval or ratio-level Y variables and categorical-level X variables. These questions are phrased as, “Are there differences on variable Y by variable X?” or, “Are there differences on variables Y1, Y2, and Y3 by variables X1 and X2?” The appropriate statistical analysis is ANOVA for the first question (one Y variable) and MANOVA for the second (several Y variables).
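As a rough illustration of a difference-type question with one categorical X and one interval-level Y, the sketch below computes the one-way ANOVA F statistic by hand in Python. The function name and the scores are made up for illustration; in practice a statistics package would also report the p-value.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    all_scores = [score for group in groups for score in group]
    grand_mean = mean(all_scores)

    # Between-groups sum of squares: distance of each group mean from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: spread of scores around their own group mean.
    ss_within = sum((score - mean(g)) ** 2 for g in groups for score in g)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical scores (Y) for three levels of a categorical variable (X).
control = [1, 2, 3]
treatment_a = [2, 3, 4]
treatment_b = [5, 6, 7]
f_stat, df_b, df_w = one_way_anova_f([control, treatment_a, treatment_b])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F means the group means differ by more than the within-group spread would suggest; whether it is statistically significant still depends on comparing it against the F distribution with those degrees of freedom.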
For relationship questions with interval, ordinal-level, or ratio-level variables, the correct statistical analysis is typically a Pearson correlation (when both variables are interval or ratio-level) or a Spearman correlation (when at least one variable is ordinal). To examine the relationship between a dichotomous categorical variable and an interval or ratio-level variable, researchers use the point-biserial correlation. You can examine relationship questions with two categorical variables using a chi-square test.
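To make the distinction between these correlations concrete, here is a minimal Python sketch. It assumes the standard definitions: Spearman is the Pearson correlation computed on ranks, and the point-biserial correlation is the Pearson correlation with the dichotomous variable coded 0/1. The function names and data are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1
        for i in order[start:end + 1]:
            ranks[i] = avg_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data: y rises with x, but not in a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(f"Pearson r = {pearson_r(x, y):.3f}, Spearman rho = {spearman_rho(x, y):.3f}")

# Point-biserial: a 0/1 group code correlated with an interval-level score.
group = [0, 0, 0, 1, 1, 1]
score = [2, 3, 4, 5, 6, 7]
print(f"Point-biserial r = {pearson_r(group, score):.3f}")
```

Because Spearman only uses the order of the values, it reports a perfect relationship for any strictly increasing pattern, while Pearson falls below 1 when the pattern is not linear.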
For prediction questions, linear, ordinal, or multinomial regressions are typically the appropriate analyses when the outcome variables are interval, ordinal, or categorical-level variables, respectively. The independent variables can be interval/ordinal-level variables or categorical-level variables. Be careful: when a categorical-level independent variable has more than two levels (e.g., political affiliation), you must dummy code the variable, that is, represent it with a set of 0/1 indicator variables, one fewer than the number of levels. We can assist you with dummy coding the variables.
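Dummy coding can be sketched in a few lines of Python. The helper below is hypothetical (not from any particular package): it picks one level as the reference category and creates a 0/1 column for each remaining level, so a three-level variable becomes two predictors.

```python
def dummy_code(values, reference):
    """Return one 0/1 indicator column per non-reference level."""
    levels = sorted(set(values) - {reference})
    return {level: [1 if v == level else 0 for v in values] for level in levels}


# Hypothetical political affiliation data with three levels.
affiliation = ["Democrat", "Republican", "Independent", "Democrat"]
dummies = dummy_code(affiliation, reference="Democrat")
for level, column in dummies.items():
    print(level, column)
```

Participants in the reference category score 0 on every indicator, so each regression coefficient is read as the difference between that level and the reference level.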
Third, you calculate the sample size through a power analysis based on the statistical test chosen. The calculation depends on power (typically .80 is desired), effect size (with medium or large effects usually selected; perhaps surprisingly, a larger effect requires a smaller sample), and alpha (e.g., .05). Given these criteria, the basic rules of thumb are roughly 26 participants per group for a large effect size and 65 participants per group for a medium effect size. These guidelines apply to a t-test, chi-square, correlation, or linear regression with two predictors.
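The arithmetic behind these rules of thumb can be approximated in Python. This sketch makes several assumptions not stated above: a two-sided, two-group comparison of means, Cohen's conventional effect sizes (d = 0.5 for medium, d = 0.8 for large), and the normal approximation n = 2((z_alpha + z_power) / d)^2 per group rather than an exact t-test calculation. The function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Approximate participants per group for a two-sided, two-group comparison."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the chosen alpha
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size_d) ** 2)


for label, d in [("large", 0.8), ("medium", 0.5)]:
    print(f"{label} effect (d = {d}): about {n_per_group(d)} per group")
```

The approximation lands slightly below the figures quoted above, which is expected because exact t-test calculations add a participant or two per group. It does show the main pattern: halving the effect size roughly quadruples the required sample. Dedicated power-analysis software should be used for the final numbers.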
For more information about the appropriate statistical analysis or sample size calculations, please go to our contact page.