Statistical Methods for HCI Research

Kruskal-Wallis and Friedman test

Introduction

The Kruskal-Wallis test is essentially a non-parametric version of one-way ANOVA. If you have more than two groups to compare and your data are ordinal or cannot be assumed to be normally distributed, Kruskal-Wallis is the way to go. Fortunately, running a Kruskal-Wallis test in R is similar to running an ANOVA.

There is also a non-parametric version of repeated-measures ANOVA, called the Friedman test. If you are comparing the data across a between-subject factor, use a Kruskal-Wallis test. If the factor is within-subject (repeated measures), use a Friedman test.

As you can see in the examples below, Kruskal-Wallis and Friedman tests only support a one-way analysis. This means that you can compare the data across only one factor, and unfortunately you cannot easily extend these tests to two-way or mixed designs as you can with ANOVA. This is one of the costs of using a non-parametric test. I will post non-parametric methods equivalent to two-way repeated-measures or mixed-design ANOVA once I figure them out.

Effect size

Unfortunately, there is no straightforward way to calculate the effect size of a Kruskal-Wallis or Friedman test. Instead, you can calculate the effect size in the post-hoc tests, which use Mann-Whitney or Wilcoxon tests. The effect size of those tests is calculated as follows:

$\Large r = \frac{Z}{\sqrt{N}}$,

where Z is the standardized test statistic of the post-hoc test and N is the total number of samples. Here are the standard values of r for small, medium, and large effect sizes.

          small size   medium size   large size
abs(r)    0.1          0.3           0.5
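The formula and thresholds above can be wrapped in two small helpers. This is only a sketch: the function names `effect_size_r` and `label_effect_size` are made up for illustration, the "negligible" label for values below 0.1 is an assumption that the table does not define, and the Z and N in the usage lines are invented numbers.

```r
# Effect size r = Z / sqrt(N), reported as an absolute value.
effect_size_r <- function(z, n_total) {
  abs(z) / sqrt(n_total)
}

# Label |r| with the thresholds from the table above.
# "negligible" (below 0.1) is an assumed label, not part of the table.
label_effect_size <- function(r) {
  labels <- c("negligible", "small", "medium", "large")
  labels[findInterval(abs(r), c(0.1, 0.3, 0.5)) + 1]
}

# Usage with made-up numbers: Z = -2.1 from a test on 24 samples in total.
r <- effect_size_r(z = -2.1, n_total = 24)
r
label_effect_size(r)
```

Taking the absolute value means the sign of Z, which only depends on the order of the two groups, does not affect the label.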

Kruskal-Wallis and Friedman tests give you a chi-squared statistic. However, its degrees of freedom are more than 1, so it is not straightforward to convert the chi-squared into an effect size. This is why we calculate the effect size for the post-hoc comparisons instead.

R code example (Kruskal-Wallis)

First, you need to prepare the data.

Value <- c(1,2,5,3,2,1,1,3,2,1,4,3,6,5,2,6,1,6,5,4,9,6,7,7,5,1,8,9,6,5)
Group <- factor(c(rep(1,10),rep(2,10),rep(3,10)))
data <- data.frame(Group, Value)

Then, run a Kruskal-Wallis test. The format is pretty much the same as ANOVA.

kruskal.test(Value ~ Group, data=data)

Now you get the result.

Kruskal-Wallis rank sum test
data:  Value by Group
Kruskal-Wallis chi-squared = 13.6754, df = 2, p-value = 0.001073

So, we have a significant effect of Group.
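To see where the chi-squared value comes from, the statistic can be rebuilt by hand from the ranks. The sketch below uses the textbook H formula with the usual tie correction; it is meant as an illustration of what the test does, not as a replacement for kruskal.test(). The variable names are chosen for this example only.

```r
Value <- c(1,2,5,3,2,1,1,3,2,1,
           4,3,6,5,2,6,1,6,5,4,
           9,6,7,7,5,1,8,9,6,5)
Group <- factor(rep(1:3, each = 10))

N <- length(Value)
ranks <- rank(Value)                      # tied values get the mean rank
rank_sums <- tapply(ranks, Group, sum)    # sum of ranks within each group
n_i <- tapply(ranks, Group, length)       # group sizes

# H statistic before the tie correction
H <- 12 / (N * (N + 1)) * sum(rank_sums^2 / n_i) - 3 * (N + 1)

# Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N),
# where t is the number of observations sharing each tied value.
ties <- as.vector(table(Value))
H_corrected <- H / (1 - sum(ties^3 - ties) / (N^3 - N))

# Compare against a chi-squared distribution with (groups - 1) df.
p_value <- pchisq(H_corrected, df = nlevels(Group) - 1, lower.tail = FALSE)

H_corrected
p_value
```

The corrected statistic and p value should agree with the kruskal.test() output shown above.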

R code example (Friedman)

Let's use the same data as in the example of a Kruskal-Wallis test. But we need to format the data so that Group represents a within-subject factor.

data2 <- cbind(data[data$Group==1,]$Value, data[data$Group==2,]$Value, data[data$Group==3,]$Value)
data2

      [,1] [,2] [,3]
 [1,]    1    4    9
 [2,]    2    3    6
 [3,]    5    6    7
 [4,]    3    5    7
 [5,]    2    2    5
 [6,]    1    6    1
 [7,]    1    1    8
 [8,]    3    6    9
 [9,]    2    5    6
[10,]    1    4    5

The rows represent the values from each participant. Now, run a Friedman test.

friedman.test(data2)

Friedman rank sum test
data:  data2
Friedman chi-squared = 15.6216, df = 2, p-value = 0.0004053

You have a significant effect of Group.
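If your data are already in long format (one row per measurement), friedman.test() also accepts a formula of the form `response ~ factor | block`, so you do not have to build the matrix yourself. The sketch below assumes that the i-th value in each group comes from participant i, which is the same pairing the matrix above uses; the `Subject` column and the name `long` are introduced for this example only.

```r
Value <- c(1,2,5,3,2,1,1,3,2,1,
           4,3,6,5,2,6,1,6,5,4,
           9,6,7,7,5,1,8,9,6,5)
Group <- factor(rep(1:3, each = 10))

# Assumption: participant i contributed the i-th value of every group.
Subject <- factor(rep(1:10, times = 3))
long <- data.frame(Subject, Group, Value)

# Group is the within-subject factor, Subject is the blocking variable.
result <- friedman.test(Value ~ Group | Subject, data = long)
result
```

This should give the same chi-squared as the matrix version, because both calls rank the three values within each participant.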

Post-hoc test

Similar to ANOVA, you need to do a post-hoc test after a Kruskal-Wallis or Friedman test if you find a significant effect. Just as we do multiple t tests with Bonferroni or Holm correction after ANOVA, we can do Mann-Whitney or Wilcoxon tests with the same corrections. If you have unpaired (between-subject, or non-repeated-measures) data, you can do pairwise comparisons with Mann-Whitney tests. First, take a look at how to do pairwise comparisons with Mann-Whitney tests with Bonferroni correction.

pairwise.wilcox.test(Value, Group, p.adj="bonferroni", exact=FALSE)

Pairwise comparisons using Wilcoxon rank sum test
data:  Value and Group
  1      2
2 0.0418 -
3 0.0058 0.0791
P value adjustment method: bonferroni

If your data contain any ties, you can only calculate an approximate p value. (You can try to calculate the exact p value by changing the option to “exact=TRUE”, but you will get warnings if you have ties.) Unfortunately, there seems to be no way around this with pairwise.wilcox.test(), but in many cases the approximate p value is close enough to the exact p value and will not cause much trouble. If you really want to be precise, you can calculate the exact p values by using functions in the coin package and then manually apply the Bonferroni correction to those p values by using the p.adjust() function. For details, see the Mann-Whitney test or the Wilcoxon test.
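The "run each pair yourself, then correct with p.adjust()" workflow can be sketched as follows. To keep the example dependent on base R only, it uses wilcox.test() with the normal approximation as a stand-in; to get exact p values you would swap that call for an exact test from the coin package. The names `pairs`, `raw_p`, and `adjusted_p` are chosen for this example.

```r
Value <- c(1,2,5,3,2,1,1,3,2,1,
           4,3,6,5,2,6,1,6,5,4,
           9,6,7,7,5,1,8,9,6,5)
Group <- factor(rep(1:3, each = 10))

# All pairs of group levels: (1,2), (1,3), (2,3), one pair per column.
pairs <- combn(levels(Group), 2)

# Unadjusted p value for each pair (approximate, because of ties).
raw_p <- apply(pairs, 2, function(pair) {
  wilcox.test(Value[Group == pair[1]], Value[Group == pair[2]],
              exact = FALSE)$p.value
})
names(raw_p) <- apply(pairs, 2, paste, collapse = " vs ")

# Bonferroni: multiply each p value by the number of comparisons, cap at 1.
adjusted_p <- p.adjust(raw_p, method = "bonferroni")
round(adjusted_p, 4)
```

With the approximate test used here, the adjusted values should match the pairwise.wilcox.test() table above; with exact p values they will differ slightly.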

Instead of Bonferroni correction, you can also use Holm correction.

pairwise.wilcox.test(Value, Group, p.adj="holm", exact=FALSE)

Pairwise comparisons using Wilcoxon rank sum test
data:  Value and Group
  1      2
2 0.0279 -
3 0.0058 0.0279
P value adjustment method: holm
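The Holm values are never larger than the Bonferroni ones because Holm is a step-down procedure: the smallest p value is multiplied by the number of comparisons m, the next smallest by m - 1, and so on, and the results are then forced to be non-decreasing. A minimal sketch of that arithmetic, starting from the unadjusted p values (variable names are chosen for this example):

```r
Value <- c(1,2,5,3,2,1,1,3,2,1,
           4,3,6,5,2,6,1,6,5,4,
           9,6,7,7,5,1,8,9,6,5)
Group <- factor(rep(1:3, each = 10))

# Unadjusted p values, in the order 1 vs 2, 1 vs 3, 2 vs 3.
raw <- pairwise.wilcox.test(Value, Group, p.adjust.method = "none",
                            exact = FALSE)$p.value
raw_p <- raw[lower.tri(raw, diag = TRUE)]

m <- length(raw_p)
o <- order(raw_p)                      # positions from smallest to largest p

# Multipliers m, m - 1, ..., 1 applied to the sorted p values, then a
# running maximum to keep the sequence non-decreasing, capped at 1.
holm_sorted <- pmin(1, cummax((m - seq_len(m) + 1) * raw_p[o]))

# Put the adjusted values back in the original order.
holm_p <- numeric(m)
holm_p[o] <- holm_sorted
round(holm_p, 4)
```

The running maximum is why two comparisons in the table above share the same adjusted value.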

If you have a within-subject factor (i.e., repeated-measures data), you need to specify paired=TRUE so that a Wilcoxon signed-rank test is used.

pairwise.wilcox.test(Value, Group, p.adj="bonferroni", exact=FALSE, paired=TRUE)

Pairwise comparisons using Wilcoxon rank sum test
data:  Value and Group
  1     2
2 0.039 -
3 0.026 0.174
P value adjustment method: bonferroni

To calculate the effect size, you have to run a Mann-Whitney or Wilcoxon test to find the Z value, which is necessary for calculating the effect size. In this example, we are going to use a Mann-Whitney test. If you need to use a Wilcoxon test (i.e., you have a within-subject factor), see the Wilcoxon test page.

library(coin)
wilcox_test(Value ~ factor(Group), data=data[data$Group==1|data$Group==2,], distribution="exact")

Exact Wilcoxon Mann-Whitney Rank Sum Test
data:  Value by factor(Group) (1, 2)
Z = -2.4975, p-value = 0.01177
alternative hypothesis: true mu is not equal to 0

Thus, the effect size is:

2.4975 / sqrt(20)
0.558458

Remember that you have to use the total sample size (10 * 2 = 20) to calculate the effect size r.
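If the coin package is not available, the Z value can also be computed by hand from the rank sum. The sketch below uses the standard normal approximation for the Mann-Whitney statistic with a tie correction and no continuity correction, which is assumed here to correspond to the Z that coin reports. The function name `mann_whitney_z` is made up for this example.

```r
Value <- c(1,2,5,3,2,1,1,3,2,1,
           4,3,6,5,2,6,1,6,5,4,
           9,6,7,7,5,1,8,9,6,5)
Group <- factor(rep(1:3, each = 10))

# Standardized rank-sum statistic for two independent samples
# (tie-corrected variance, no continuity correction).
mann_whitney_z <- function(x, y) {
  n1 <- length(x)
  n2 <- length(y)
  N <- n1 + n2
  ranks <- rank(c(x, y))                 # tied values get the mean rank
  W <- sum(ranks[seq_len(n1)])           # rank sum of the first sample
  ties <- as.vector(table(c(x, y)))      # sizes of the tied groups
  expected <- n1 * (N + 1) / 2
  variance <- n1 * n2 / 12 * ((N + 1) - sum(ties^3 - ties) / (N * (N - 1)))
  (W - expected) / sqrt(variance)
}

x <- Value[Group == "1"]
y <- Value[Group == "2"]

z <- mann_whitney_z(x, y)
r_effect <- abs(z) / sqrt(length(x) + length(y))   # N = 10 + 10 = 20
z
r_effect
```

For groups 1 and 2 this should agree with the Z and the effect size shown above; the same call with groups 1 and 3 gives the second effect size needed for the report.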

How to report

You can report the results of a Kruskal-Wallis or Friedman test as follows:

A Kruskal-Wallis test revealed a significant effect of Group on Value ($\small \chi^{2}$(2) = 13.7, p < 0.01). Post-hoc Mann-Whitney tests with Bonferroni correction showed significant differences between Group 1 and Group 2 (p < 0.05, r = 0.56) and between Group 1 and Group 3 (p < 0.01, r = 0.70).
