May 22, 2012

Conduct and Interpret a Repeated Measures ANOVA

What is the Repeated Measures ANOVA?

The repeated measures ANOVA is a member of the ANOVA family.  ANOVA is short for ANalysis Of VAriance.  All ANOVAs compare two or more mean scores with each other; they are tests for the difference in mean scores.  The repeated measures ANOVA compares means across one or more variables that are based on repeated observations.  A repeated measures ANOVA model can also include zero or more independent variables.  In short, a repeated measures ANOVA has at least one dependent variable that is observed more than once.

Example:

A research team wants to test the user acceptance of a new online travel booking tool.  The team conducts a study in which they assign 30 randomly chosen people to two groups.  One group uses the new system, and the other group acts as a control group and books its travel via phone.  The team measures the user acceptance of the system as the behavioral intention to use the system in the first 4 weeks after it went live.  Since user acceptance is a latent behavioral construct, the researchers measure it with three items: ease of use, perceived usefulness, and effort to use.

The repeated measures ANOVA is an ‘analysis of dependencies’.  It is referred to as such because it tests an assumed cause-effect relationship between the independent variable(s), if any, and the dependent variable(s).  Like any statistical test, it can support such a relationship but cannot prove it.

When faced with a question similar to the one in our example, you could also try to run 4 MANOVAs, one for each of the four weeks, testing the influence of the independent variables on the observations of that week.  Running multiple separate analyses, however, does not account for individual differences in the baselines of the participants of the study.

The repeated measures ANOVA is similar to the dependent sample t-test, because it also compares the mean scores of one group to another group on different observations.  The repeated measures ANOVA requires that the cases in one observation are directly linked with the cases in all other observations.  This automatically happens when repeated measures are taken, or when analyzing similar units or comparable specimens.

Pairing observations or making repeated measurements is very common when conducting experiments or making observations with time lags.  Pairing the measured data points is typically done in order to exclude any confounding or hidden factors (cf. partial correlation).  It is also often used to account for individual differences in the baselines, such as pre-existing conditions in clinical research.  Consider for example a drug trial where the participants have individual differences that might have an impact on the outcome of the trial.  The typical drug trial splits all participants into a control group and a treatment group and measures the effect of the drug in months 1 to 18.  The repeated measures ANOVA can correct for the individual differences or baselines.  The baseline differences that might have an effect on the outcome could be typical parameters like blood pressure, age, or gender.  Thus the repeated measures ANOVA analyzes the effect of the drug while excluding the influence of different baseline levels of health when the trial began.

Since the pairing is explicitly defined and thus adds new information to the data, paired data can always be analyzed with a regular ANOVA as well, but not vice versa.  The regular ANOVA, however, will not account for the baseline differences.
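The difference between the two analyses can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: the scores are invented (four hypothetical subjects with very different baselines, each measured three times), and the function name rm_anova_f is ours, not something SPSS provides.  The sketch computes the F ratio twice from the same data, once with the error term of a regular one-way ANOVA and once with the subject-to-subject variation removed from the error term.

```python
from statistics import mean


def rm_anova_f(scores):
    """Return (F of a regular one-way ANOVA, F of a repeated measures ANOVA).

    scores: one row per subject, one column per repeated measurement.
    """
    n = len(scores)        # number of subjects
    k = len(scores[0])     # number of repeated measurements
    grand = mean(x for row in scores for x in row)
    cond_means = [mean(row[j] for row in scores) for j in range(k)]
    subj_means = [mean(row) for row in scores]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)

    # Regular ANOVA: everything that is not a condition effect is error.
    ss_within = ss_total - ss_cond
    f_regular = (ss_cond / (k - 1)) / (ss_within / (k * (n - 1)))

    # Repeated measures: baseline differences between subjects are
    # identified (ss_subj) and taken out of the error term.
    ss_error = ss_within - ss_subj
    f_repeated = (ss_cond / (k - 1)) / (ss_error / ((k - 1) * (n - 1)))
    return f_regular, f_repeated


# Hypothetical data: baselines differ widely, the change over time is small.
toy_scores = [
    [10, 12, 13],
    [20, 21, 24],
    [30, 32, 33],
    [40, 41, 44],
]
f_regular, f_repeated = rm_anova_f(toy_scores)
print(f"regular ANOVA F = {f_regular:.3f}, repeated measures F = {f_repeated:.3f}")
```

In this toy data set the large baseline differences swamp the small but consistent change over time in the regular ANOVA, while the repeated measures F ratio is much larger because those baselines no longer count as error.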

A typical guideline to determine whether the repeated measures ANOVA is the right test is to answer the following three questions:

  • Is there a direct relationship between each pair of observations, e.g., before vs. after scores on the same subject?
  • Are the observations definitely not independent random samples (i.e., they must not be randomly selected specimens of the same population)?
  • Do all observations have the same number of data points?

If the answer is yes to all three of these questions, the repeated measures ANOVA is the right test.  If not, use a regular ANOVA or the independent samples t-test.  In statistical terms the repeated measures ANOVA requires that the part of the within-group variation that stems from individual differences between subjects can be identified and excluded from the error term of the analysis.

The Repeated Measures ANOVA in SPSS

Let us return to our aptitude test question in consideration of the repeated measures ANOVA.  The question being: “Is there a difference between the five repeated aptitude tests between students who passed the exam and the students who failed the exam?” Since we ran the aptitude tests multiple times with the students these are considered repeated measurements.  The repeated measures ANOVA uses the GLM module of SPSS, like the factorial ANOVAs, MANOVAs, and MANCOVAs.

The repeated measures ANOVA can be found in SPSS in the menu Analyze/General Linear Model/Repeated Measures…

The dialog box that opens on the click is different from the GLM dialog you might know from the MANOVA.  Before specifying the model we need to group the repeated measures.

We specify the repeated measures by creating a within-subject factor.  It is called the within-subject factor of our repeated measures ANOVA because it represents the different observations of one subject (so the measures are made within one single case).  We measured the aptitude at five longitudinal data points.  Therefore we have five levels of the within-subject factor.  If we just want to test whether the data differ significantly over time, we are done after we have created and added the factor Aptitude_Tests(5).

The next dialog box allows us to specify the repeated measures ANOVA.  First we need to add the five observation points to the within-subject variables: simply select the five aptitude test points and click on the arrow pointing towards the list of within-subject variables.  In a more complex example we could also include additional dependent variables in the analysis.  We can also add treatment/grouping variables to the repeated measures ANOVA; in such a case the grouping variable would be added as a between-subject factor.

Since our example does not have an independent variable, the post hoc tests and contrasts, which compare individual differences between levels of the between-subject factor, are not needed.  We also go with the default option of the full factorial model (in the Model… dialog box).  If you were to conduct a post hoc test, SPSS would run a couple of pairwise dependent samples t-tests.  We only add some useful statistics to the repeated measures ANOVA output in the Options… dialog.

Technically we only need the Levene test for homoscedasticity when we include at least one independent variable in the model.  However, it is checked here out of habit so that we don’t forget to select it for the other GLM procedures we run.

It is also quite useful to include the descriptive statistics, because we have not yet compared the longitudinal development of the five administered aptitude tests.

The Output of the Repeated Measures ANOVA

The first table just lists the design of the within-subject factor in our repeated measures ANOVA.  The second table lists the descriptive statistics for the five tests.  We find that there is little movement within the test scores, the second test scoring lower and then the numbers picking up again.

The next table shows the results of the regression modeling the GLM procedure conducts.  Since our rather simple example of a repeated measures ANOVA does not include any regression component, we can skip this table.

One of the key assumptions of the repeated measures ANOVA is sphericity.  Sphericity is a property of the structure of the covariance matrix in repeated designs.  Because repeated designs violate the assumption of independence between measurements, the covariances need to be spherical.  One stricter form of sphericity is compound symmetry, which occurs if all the covariances are approximately equal and all the variances are approximately equal in the samples.  Mauchly’s Sphericity Test tests this assumption.  If there is no sphericity in the data, the repeated measures ANOVA can still be done when the F-test is corrected by reducing its degrees of freedom, e.g., with the Greenhouse-Geisser or the Huynh-Feldt correction.
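To make the idea of a correction factor more concrete, the following Python sketch computes the Greenhouse-Geisser epsilon from a covariance matrix of the repeated measures, using the textbook formula based on the means of the matrix entries.  The function name and both example matrices are assumptions made up for illustration; they are not taken from the aptitude test data, and the sketch is not meant to reproduce the SPSS output.  Epsilon equals 1 when sphericity holds and shrinks towards its lower bound of 1/(k - 1) as the violation gets worse.

```python
def greenhouse_geisser_epsilon(cov):
    """Greenhouse-Geisser epsilon for a k x k covariance matrix (list of rows)."""
    k = len(cov)
    grand_mean = sum(sum(row) for row in cov) / (k * k)   # mean of all entries
    diag_mean = sum(cov[i][i] for i in range(k)) / k      # mean of the variances
    row_means = [sum(row) / k for row in cov]
    sum_squares = sum(x * x for row in cov for x in row)

    numerator = (k * (diag_mean - grand_mean)) ** 2
    denominator = (k - 1) * (
        sum_squares
        - 2 * k * sum(m * m for m in row_means)
        + k * k * grand_mean ** 2
    )
    return numerator / denominator


# Hypothetical matrix with compound symmetry: equal variances, equal covariances.
compound_symmetric = [
    [2.0, 1.0, 1.0],
    [1.0, 2.0, 1.0],
    [1.0, 1.0, 2.0],
]

# Hypothetical matrix that violates sphericity: one measurement varies far more.
unequal_variances = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 10.0],
]

print(greenhouse_geisser_epsilon(compound_symmetric))
print(greenhouse_geisser_epsilon(unequal_variances))
```

The first matrix yields an epsilon of 1 (no correction needed), the second an epsilon clearly below 1, which would shrink the degrees of freedom of the F-test accordingly.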

Mauchly’s Test tests the null hypothesis that the error covariance matrix of the orthonormalized transformed dependent variables is proportional to an identity matrix.  In other words, the relationship between different observation points is similar; the differences between the observations have equal variances.  This assumption is similar to homoscedasticity (tested by the Levene Test), which assumes equal variances between groups, not observations.  In our example, the assumption of sphericity has not been met because Mauchly’s Test is significant.  This means that the uncorrected F-test of our repeated measures ANOVA is likely to be too liberal, i.e., its p-values are too small.  This can be corrected by decreasing the degrees of freedom used.  The last three columns (Epsilon) tell us the appropriate correction method to use.  A common rule of thumb is to use the Huynh-Feldt correction if epsilon is greater than 0.75 and the Greenhouse-Geisser correction otherwise.  SPSS automatically reports the corrected degrees of freedom and significance levels in the F statistics table of the repeated measures ANOVA.

The next table shows the F statistics (also called the within-subject effects).  As discussed earlier, the assumption of sphericity has not been met and thus the degrees of freedom in our repeated measures ANOVA need to be decreased.  The table shows that the differences in our repeated measures are significant at the level p < 0.001.  The table also shows that the Greenhouse-Geisser correction has decreased the degrees of freedom from 4 to 2.831.
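The arithmetic behind the corrected degrees of freedom is simple enough to sketch in Python.  The reduction from 4 to 2.831 implies an epsilon of 2.831 / 4, or roughly 0.708; that value is derived here from the two numbers quoted above, not read from the SPSS output.  The helper names, the 0.75 threshold (a commonly cited rule of thumb), and the sample size of 30 are assumptions for illustration only, since the number of students is not stated in this section.

```python
def corrected_df(epsilon, n_levels, n_subjects):
    """Multiply both degrees of freedom of the within-subject F-test by epsilon."""
    df_effect = epsilon * (n_levels - 1)
    df_error = epsilon * (n_levels - 1) * (n_subjects - 1)
    return df_effect, df_error


def choose_correction(gg_epsilon, threshold=0.75):
    """Rule of thumb: Huynh-Feldt above the threshold, else Greenhouse-Geisser."""
    return "Huynh-Feldt" if gg_epsilon > threshold else "Greenhouse-Geisser"


# Epsilon implied by the reduction of the effect degrees of freedom (4 -> 2.831).
implied_epsilon = 2.831 / 4

# n_subjects=30 is a hypothetical sample size, used only to show the error df.
df_effect, df_error = corrected_df(implied_epsilon, n_levels=5, n_subjects=30)
print(f"epsilon = {implied_epsilon:.3f}")
print(f"corrected df = ({df_effect:.3f}, {df_error:.3f})")
print(choose_correction(implied_epsilon))
```

Note that the correction leaves the F-value itself unchanged; only the degrees of freedom, and with them the p-value, are adjusted.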

Thus we can reject the null hypothesis that the repeated measures are equal and we might assume that our repeated measures are different from each other.  Since the repeated measures ANOVA only conducts a global F-test, the pairwise comparison table helps us find the significant differences in the observations.  Here we find that the first aptitude test is significantly different from the second and third; the second is only significantly different from the first, the fourth, and the fifth, etc.
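The logic of the pairwise comparisons can be sketched in Python as a series of dependent samples t-tests with a Bonferroni adjustment.  This is a simplified illustration, not a reproduction of the SPSS procedure: the function names and the toy scores are invented, and the sketch stops at the t statistics because the Python standard library has no t distribution for computing p-values.  With five test occasions there are 10 pairs, so a Bonferroni adjustment compares each p-value against 0.05 / 10 (or, equivalently, multiplies each p-value by 10).

```python
from itertools import combinations
from math import sqrt
from statistics import mean, stdev


def paired_t(a, b):
    """t statistic of a dependent samples t-test on two linked score lists."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def pairwise_paired_t(columns):
    """t statistic for every pair of repeated measures, keyed by 1-based level."""
    return {
        (i + 1, j + 1): paired_t(columns[i], columns[j])
        for i, j in combinations(range(len(columns)), 2)
    }


def bonferroni_alpha(alpha, n_levels):
    """Per-comparison alpha when all pairs of n_levels are compared."""
    n_comparisons = n_levels * (n_levels - 1) // 2
    return alpha / n_comparisons


# Hypothetical scores of four subjects on three test occasions (one list each).
toy_columns = [
    [10, 20, 30, 40],
    [12, 21, 32, 41],
    [13, 24, 33, 44],
]
for pair, t in pairwise_paired_t(toy_columns).items():
    print(pair, round(t, 3))

# Five aptitude tests give 10 pairwise comparisons.
print(bonferroni_alpha(0.05, 5))
```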

In summary, a possible write-up could be:
During the fieldwork, five repeated aptitude tests were administered to the students.  The repeated measures ANOVA shows that the scores achieved on these aptitude tests are significantly different.  A pairwise comparison identifies aptitude tests 1, 2, and 3 as being significantly different from each other; tests 2 and 3 are also significantly different from tests 4 and 5.

Syntax

GLM Apt1 Apt2 Apt3 Apt4 Apt5
  /WSFACTOR=Aptitude_Tests 5 Polynomial
  /METHOD=SSTYPE(3)
  /EMMEANS=TABLES(Aptitude_Tests) COMPARE ADJ(BONFERRONI)
  /PRINT=DESCRIPTIVE ETASQ HOMOGENEITY
  /CRITERIA=ALPHA(.05)
  /WSDESIGN=Aptitude_Tests.