# Conduct and Interpret a Wilcoxon Sign Test

**What is the Wilcoxon Sign Test?**

The Wilcoxon Sign test is a statistical comparison of the average of two dependent samples. The Wilcoxon sign test is a sibling of the t-tests. It is, in fact, a non-parametric alternative to the dependent samples t-test. Thus the Wilcoxon signed rank test is used in similar situations as the Mann-Whitney U-test. The main difference is that the Mann-Whitney U-test tests two independent samples, whereas the Wilcoxon sign test tests two dependent samples.

The Wilcoxon Sign test is a test of dependency. All dependence tests assume that the variables in the analysis can be split into independent and dependent variables. A dependence test that compares the averages of an independent and a dependent variable assumes that differences in the average of the dependent variable are caused by the independent variable. Sometimes the independent variable is also called a factor, because the factor splits the sample into two or more groups, also called factor steps or factor levels.

Dependence tests analyze whether there is a significant difference between the factor levels. The t-test family uses mean scores as the average to compare the differences, the Mann-Whitney U-test uses mean ranks as the average, and the Wilcoxon Sign test uses signed ranks.

Unlike the t-test and F-test, the Wilcoxon sign test is a non-parametric test. That means that the test does not assume a specific distribution, such as the normal distribution, for the underlying variables in the analysis. This makes the Wilcoxon sign test the analysis to conduct when analyzing variables of ordinal scale or variables that are not normally distributed.

The Wilcoxon sign test is mathematically similar to conducting a Mann-Whitney U-test (which is sometimes also called the Wilcoxon rank-sum test). It is also similar to the basic principle of the dependent samples t-test, because just like the dependent samples t-test, the Wilcoxon sign test tests the difference between paired observations.

However, the Wilcoxon signed rank test pools all differences, ranks them by their absolute size, and applies a negative sign to all the ranks where the difference between the two observations is negative. This is called the signed rank. The Wilcoxon signed rank test is a non-parametric test, in contrast to the dependent samples t-test. Whereas the dependent samples t-test tests whether the average difference between two observations is 0, the Wilcoxon test tests whether the difference between two observations has a mean signed rank of 0. Thus it is much more robust against outliers and heavy-tailed distributions. Because the Wilcoxon sign test is a non-parametric test, it does not require a special distribution of the dependent variable in the analysis. Therefore it is a suitable test to compare paired scores when the dependent variable is not normally distributed and is at least of ordinal scale.
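The signed-rank procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the SPSS implementation: the function names `signed_ranks` and `wilcoxon_w`, the numeric grade codes (higher means a better grade), and the choices to drop zero differences and give tied differences their average rank are all assumptions made for the example.

```python
def signed_ranks(before, after):
    """Return the signed ranks of the non-zero paired differences."""
    # Differences between the two dependent measurements.
    # Assumption: pairs with a zero difference are dropped.
    diffs = [a - b for b, a in zip(before, after) if a != b]

    # Order the positions by absolute difference so they can be ranked.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)

    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied absolute differences.
        while (j + 1 < len(order)
               and abs(diffs[order[j + 1]]) == abs(diffs[order[i]])):
            j += 1
        # Tied values share the average of the 1-based ranks i+1 .. j+1.
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # Attach the sign of each difference to its rank.
    return [r if d > 0 else -r for r, d in zip(ranks, diffs)]


def wilcoxon_w(before, after):
    """Return (W+, W-): the sums of the positive and negative ranks."""
    sr = signed_ranks(before, after)
    w_plus = sum(r for r in sr if r > 0)
    w_minus = -sum(r for r in sr if r < 0)
    return w_plus, w_minus


# Hypothetical grades for eight students, coded numerically.
before = [3, 2, 4, 3, 5, 2, 4, 3]
after = [4, 2, 3, 5, 5, 4, 5, 2]

print(signed_ranks(before, after))  # [2.5, -2.5, 5.5, 5.5, 2.5, -2.5]
print(wilcoxon_w(before, after))    # (16.0, 5.0)
```

The two sums always add up to n(n+1)/2 for n non-zero differences (here 6 * 7 / 2 = 21), which is a quick sanity check on any hand calculation.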

For the test of significance of the Wilcoxon signed rank test, it is assumed that with at least ten paired observations the distribution of the W-value approximates a normal distribution. Thus we can standardize the empirical W-statistic and compare it to the tabulated z-values of the standard normal distribution to obtain the significance level (p-value).
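As a rough sketch of this normal approximation: under the null hypothesis, the sum of positive ranks W+ for n non-zero differences has mean n(n+1)/4 and variance n(n+1)(2n+1)/24. The function name `wilcoxon_z` is hypothetical, and the sketch assumes no tie correction and no continuity correction, so its result may differ slightly from what statistical software reports.

```python
import math


def wilcoxon_z(w_plus, n):
    """Standardize W+ for n non-zero paired differences.

    Assumptions: no tie correction and no continuity correction.
    Returns the z-value and a two-sided p-value.
    """
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean_w) / sd_w
    # Two-sided tail probability of the standard normal distribution.
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided


# Hypothetical example: ten non-zero differences with W+ = 45.
z, p = wilcoxon_z(45, 10)
print(round(z, 3), round(p, 3))
```

With W+ equal to its expected value (27.5 for n = 10), the function returns z = 0 and p = 1, which matches the intuition that positive and negative ranks balance exactly.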

*The Wilcoxon Sign Test in SPSS*

Our research question for the Wilcoxon Sign Test is as follows:

*Does the before-after measurement of the first and the last mid-term exam differ between the students who have been taught in a blended learning course and the students who were taught in a standard classroom setting?*

We only measured the outcome of the mid-term exam on an ordinal scale (grade A to F); therefore a dependent samples t-test cannot be used. This is because the grades are only ordinal and we do not assume that their distribution approximates a normal distribution. Also, the two measurements are not independent of each other, and therefore we cannot use the Mann-Whitney U-test.

The Wilcoxon sign test can be found in *Analyze/Nonparametric Tests/Legacy Dialogs/2 Related Samples…*

In the next dialog box for the nonparametric two dependent samples tests we need to define the paired observations. We enter *‘Grade on Mid-Term Exam 1’* as Variable 1 of the first pair and *‘Grade on Mid-Term Exam 2’* as Variable 2 of the first pair. We also need to select the Test Type. The Wilcoxon Signed Rank Test is marked by default. Alternatively we could choose Sign, McNemar, or Marginal Homogeneity.

**Wilcoxon** – The Wilcoxon signed rank test has the null hypothesis that both samples are from the same population. The Wilcoxon test creates a pooled ranking of all observed differences between the two dependent measurements. It uses the standard normal distributed z-value to test for significance.

**Sign** – The sign test has the null hypothesis that both samples are from the same population. The sign test compares the two dependent observations and counts the number of negative and positive differences. It uses the standard normal distributed z-value to test for significance.
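To make the contrast with the Wilcoxon test concrete, the counting step of the sign test can be sketched as follows. The function name `sign_test` and the grade codes are hypothetical, and the z-value uses the plain normal approximation to a binomial distribution with probability 0.5, assuming no continuity correction; software output may therefore differ.

```python
import math


def sign_test(before, after):
    """Count positive and negative differences and standardize the count.

    Assumptions: ties (zero differences) are ignored, at least one pair
    differs, and no continuity correction is applied.
    """
    n_pos = sum(1 for b, a in zip(before, after) if a > b)
    n_neg = sum(1 for b, a in zip(before, after) if a < b)
    n = n_pos + n_neg
    # Under the null hypothesis, n_pos follows Binomial(n, 0.5),
    # with mean n / 2 and standard deviation sqrt(n / 4).
    z = (n_pos - n / 2) / math.sqrt(n / 4)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return n_pos, n_neg, z, p_two_sided


# Hypothetical grades for eight students, coded numerically.
before = [3, 2, 4, 3, 5, 2, 4, 3]
after = [4, 2, 3, 5, 5, 4, 5, 2]

n_pos, n_neg, z, p = sign_test(before, after)
print(n_pos, n_neg, round(z, 3))
```

Unlike the signed rank, this statistic only looks at the direction of each difference, so a change of one grade counts the same as a change of three grades.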

**McNemar** – The McNemar test has the null hypothesis that differences in both samples are equal for both directions. The test uses dichotomous (binary) variables to test whether the observed differences in a 2x2 matrix including all 4 possible combinations differ significantly from the expected count. It uses a Chi-Square test of significance.
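As an illustration of the Chi-Square statistic behind the McNemar test, the sketch below uses only the two discordant cells of the 2x2 table: `b` (first measurement 0, second measurement 1) and `c` (first measurement 1, second measurement 0). The function name `mcnemar_chi2` and the counts are hypothetical, and the sketch assumes the uncorrected statistic with one degree of freedom; statistical software may apply a continuity correction or an exact test instead.

```python
import math


def mcnemar_chi2(b, c):
    """Uncorrected McNemar statistic from the two discordant counts.

    Assumptions: b + c > 0 and no continuity correction.
    Returns the Chi-Square value (1 degree of freedom) and its p-value.
    """
    chi2 = (b - c) ** 2 / (b + c)
    # Upper tail of a Chi-Square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Hypothetical counts: 15 pairs changed from 0 to 1, 5 changed from 1 to 0.
chi2, p = mcnemar_chi2(15, 5)
print(chi2, round(p, 3))
```

The concordant cells (pairs that did not change) do not enter the statistic, because they carry no information about the direction of change.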

**Marginal Homogeneity** – The marginal homogeneity test has the null hypothesis that the differences in both samples are equal in both directions. The test is similar to the McNemar test, but it uses nominal variables with more than two levels. It tests whether the observed differences in an n*m matrix including all possible combinations differ significantly from the expected count. It uses a Chi-Square test of significance.

If the values in the sample are not already ranked, SPSS will sort the observations according to the test variable and assign ranks to each observation, correcting for tied observations. The dialog box *Exact…* allows us to specify an exact test of significance and the dialog box *Options…* defines how missing values are managed and if SPSS should output additional descriptive statistics.