Mann-Whitney's U test


Introduction

Mann-Whitney's U test, also known as the Wilcoxon rank-sum test, is basically a non-parametric version of the unpaired t test.

People typically use Mann-Whitney's U test when they have an ordinal dependent variable, or when they have only a small sample size (and thus cannot assume normality). However, Mann-Whitney's U test still assumes equality of variances.

Although Mann-Whitney's U test can be considered a non-parametric version of the t test, it works on the ranks of the values, so it is read as comparing the medians of the two groups, not the means.


What does Mann-Whitney do?

Before looking at an example of Mann-Whitney's U test, let's take a look at what the test does. The key point is that it treats the data as ordinal: you can order the values, but the distance between two adjacent values is not necessarily the same everywhere on the scale. So, instead of using the values as they are, the test calculates a rank for each value. Let's think about some data from a 5-point Likert scale question, and say you have the following data.

Group A    1    3    2    4    2
Group B    3    5    5    2    4

Then, you make a rank (R) based on these values. So,

Group A    1 (R1)    3 (R6)    2 (R2)     4 (R7)    2 (R4)
Group B    3 (R5)    5 (R9)    5 (R10)    2 (R3)    4 (R8)

For now, I just ranked the ties arbitrarily. But obviously this causes a problem if we want to do a fair statistical test. One thing we can do is to give every tied value the average of the ranks those values occupy. For instance, the value 2 occupies ranks 2, 3, and 4 in this example. Instead of deciding which data point gets the higher rank, we give all three the average of those ranks, which is 3. Likewise, the two 3s share rank 5.5. With this correction, the example becomes

Group A    1 (R1)      3 (R5.5)    2 (R3)      4 (R7.5)    2 (R3)
Group B    3 (R5.5)    5 (R9.5)    5 (R9.5)    2 (R3)      4 (R7.5)

The mean ranks of Group A and Group B are 4.0 and 7.0. The null hypothesis of Mann-Whitney's U test is that the samples of both groups came from the same population. Intuitively, if the null hypothesis holds, there should be no difference in the mean ranks between the two groups, because both groups have the same chance of getting low and high ranks. Thus, if the mean ranks differ enough, you can say that you have a significant effect.
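The ranking step above can be reproduced in a few lines of R. This is a minimal sketch using the toy data from this section; the variable names are only illustrative, and `rank()` with `ties.method = "average"` is the base R way to give tied values their average rank.

```r
# Toy data from the example above (5-point Likert responses)
group_a <- c(1, 3, 2, 4, 2)
group_b <- c(3, 5, 5, 2, 4)

# Rank all ten values together; tied values receive the average of their ranks
ranks <- rank(c(group_a, group_b), ties.method = "average")

# Split the pooled ranks back into the two groups
ranks_a <- ranks[seq_along(group_a)]
ranks_b <- ranks[-seq_along(group_a)]

print(ranks_a)  # 1.0 5.5 3.0 7.5 3.0
print(ranks_b)  # 5.5 9.5 9.5 3.0 7.5

mean_rank_a <- mean(ranks_a)  # 4
mean_rank_b <- mean(ranks_b)  # 7
```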

Please remember that this is not exactly what a Mann-Whitney test does. It calculates a statistic called the U value. The U value for each group is calculated by subtracting the minimum possible rank sum for that group, which is n(n+1)/2 for a group of n samples, from its actual sum of ranks, and the smaller of the two U values is used for the test. The distribution of the standardized U value is known to be close to the normal distribution when the sample size is more than 20. Thus, if the observed standardized U value is far from the center of the normal distribution (= 0), the test will reject the null hypothesis.
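As a rough illustration, the U values for the toy data above can be computed by hand. The sketch assumes the common definition in which the smallest possible rank sum of a group with n samples, 1 + 2 + ... + n = n(n+1)/2, is subtracted from the group's rank sum; the variable names are made up for this example.

```r
group_a <- c(1, 3, 2, 4, 2)
group_b <- c(3, 5, 5, 2, 4)
n_a <- length(group_a)
n_b <- length(group_b)

# Pooled ranks with averaged ties (the default of rank())
ranks <- rank(c(group_a, group_b))
rank_sum_a <- sum(ranks[seq_len(n_a)])        # 20
rank_sum_b <- sum(ranks[n_a + seq_len(n_b)])  # 35

# Subtract the minimum possible rank sum, n(n+1)/2, from each rank sum
u_a <- rank_sum_a - n_a * (n_a + 1) / 2  # 5
u_b <- rank_sum_b - n_b * (n_b + 1) / 2  # 20

# The smaller U value is the one used for the test
u <- min(u_a, u_b)  # 5

# wilcox.test reports the U value of its first argument as "W"
# (the ties trigger a warning about the exact p value, silenced here)
w <- unname(suppressWarnings(wilcox.test(group_a, group_b))$statistic)
```

The two U values always add up to n_a * n_b (here 5 + 20 = 25), which is a handy check on the arithmetic.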


Effect size

The calculation of the effect size of Mann-Whitney's U test is fairly easy.

$r = \frac{Z}{\sqrt{N}}$,

where Z is the standardized test statistic and N is the total number of samples. Here are the standard values of r for small, medium, and large effect sizes. The sign does not contain much information, so we often just report the absolute value of r.

          small size    medium size    large size
abs(r)    0.1           0.3            0.5
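A small helper can make the calculation and the table above concrete. Both function names below are hypothetical (they are not part of base R or coin), and treating the benchmarks as hard cut-offs, with a "negligible" label below 0.1, is an assumption made only for illustration. The example call uses Z = -2.1095 and N = 20, the values that appear in the R example later on this page.

```r
# Hypothetical helper: effect size r = |Z| / sqrt(N)
effect_size_r <- function(z, n) {
  abs(z) / sqrt(n)
}

# Hypothetical helper: map r onto the benchmark labels from the table.
# Using the benchmarks as strict thresholds is an assumption.
label_effect_size <- function(r) {
  if (r >= 0.5) {
    "large"
  } else if (r >= 0.3) {
    "medium"
  } else if (r >= 0.1) {
    "small"
  } else {
    "negligible"  # label chosen for this sketch only
  }
}

r_example <- effect_size_r(-2.1095, 20)      # 0.4716985
label_example <- label_effect_size(r_example)  # "medium"
```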

R code example

Let's prepare the data. Suppose the data are the responses to a 5-point Likert scale question (each response is 1, 2, 3, 4, or 5), and you have two groups to compare.

GroupA = c(2,4,3,1,2,3,3,2,3,1)
GroupB = c(3,5,4,2,4,3,5,5,3,2)

Then, do Mann-Whitney's U test.

wilcox.test(GroupA, GroupB)

And you get the result.

Wilcoxon rank sum test with continuity correction

data:  GroupA and GroupB
W = 23, p-value = 0.03841
alternative hypothesis: true location shift is not equal to 0

Warning message:
In wilcox.test.default(GroupA, GroupB) :
  cannot compute exact p-value with ties

However, as you can see here, the exact p value cannot be calculated because of ties. Running wilcox.test is still useful because it gives you the U value (which is reported as “W” in the results), and the U value is not straightforward to recover from the Z value, particularly when the sample size is small. The Z value, in turn, is what you need for calculating the effect size. Now I will show you how to get the Z value and the exact p value with the coin package.

library(coin)

Then, do another Mann-Whitney test with wilcox_test from coin. It takes a formula, so you have to reformat the data into one vector of values and one grouping factor.

g = factor(c(rep("GroupA", length(GroupA)), rep("GroupB", length(GroupB))))
v = c(GroupA, GroupB)
wilcox_test(v ~ g, distribution="exact")

Now you get another result.

Exact Wilcoxon Mann-Whitney Rank Sum Test

data:  v by g (GroupA, GroupB)
Z = -2.1095, p-value = 0.03850
alternative hypothesis: true mu is not equal to 0
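As a sanity check, the Z value can be reproduced by hand from the U value. The sketch below assumes the usual normal approximation with a correction for ties and no continuity correction; whether this is exactly how coin computes it internally is an assumption, although the result agrees with the Z reported above. Variable names other than GroupA and GroupB are made up for this example.

```r
GroupA <- c(2, 4, 3, 1, 2, 3, 3, 2, 3, 1)
GroupB <- c(3, 5, 4, 2, 4, 3, 5, 5, 3, 2)

n1 <- length(GroupA)
n2 <- length(GroupB)
N <- n1 + n2
values <- c(GroupA, GroupB)

# U value of GroupA: rank sum minus the minimum possible rank sum
ranks <- rank(values)
U <- sum(ranks[seq_len(n1)]) - n1 * (n1 + 1) / 2  # 23, the "W" above

# Tie correction: sum of (t^3 - t) over the sizes t of each group of ties
tie_sizes <- as.numeric(table(values))
tie_term <- sum(tie_sizes^3 - tie_sizes) / (N * (N - 1))

# Mean and standard deviation of U under the null hypothesis
mu_U <- n1 * n2 / 2
sd_U <- sqrt(n1 * n2 / 12 * ((N + 1) - tie_term))

Z <- (U - mu_U) / sd_U
print(round(Z, 4))  # -2.1095
```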

Thus, we have a significant effect of Group. You can also calculate the mean rank for each group as follows.

r = rank(v)
# mean of the ranks within each group
tapply(r, g, mean)
GroupA GroupB
   7.8   13.2

And calculate the effect size.

2.1095 / sqrt(20)
[1] 0.4716985

How to report

You can report the results of Mann-Whitney's U test as follows:

The medians of Group A and Group B were 2.5 and 3.5, respectively. We ran a Mann-Whitney's U test to evaluate the difference in the responses to our 5-point Likert scale question. We found a significant effect of Group (the mean ranks of Group A and Group B were 7.8 and 13.2, respectively; U = 23, Z = -2.11, p < 0.05, r = 0.47).
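The medians and the rounded effect size quoted in the report can be obtained as follows. This is only a small sketch with illustrative variable names; the Z value is typed in from the coin output rather than extracted programmatically.

```r
GroupA <- c(2, 4, 3, 1, 2, 3, 3, 2, 3, 1)
GroupB <- c(3, 5, 4, 2, 4, 3, 5, 5, 3, 2)

median_a <- median(GroupA)  # 2.5
median_b <- median(GroupB)  # 3.5

# Effect size from the reported Z value, rounded for the write-up
z_reported <- -2.1095
n_total <- length(GroupA) + length(GroupB)
r_reported <- round(abs(z_reported) / sqrt(n_total), 2)  # 0.47
```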


References

For the effect size, please see: Field, A. Discovering statistics using SPSS. (2nd edition).