Mann-Whitney's U test


A Mann-Whitney's U test is also known as the Wilcoxon rank-sum test, and is basically a non-parametric version of the t test. You want to use a Mann-Whitney's U test when

  • Your dependent variable is ordinal; or
  • Your dependent variable is either ratio or interval, but you cannot assume that your populations follow a normal distribution.

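So, you see people use a Mann-Whitney's U test when they have ordinal dependent variables or when they have only a small sample size (and thus cannot assume normality). However, a Mann-Whitney's U test is not assumption-free: to read a significant result as a shift in location, the distributions of the two groups still need to have a similar shape and spread.

Although a Mann-Whitney's U test can be considered a non-parametric version of the t test, it works on the ranks of the values, not on the means. It is commonly described as comparing the medians of the two groups, but that reading strictly holds only when the two distributions have the same shape.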

What does Mann-Whitney do?

Before looking at an example, let's take a look at what a Mann-Whitney's U test does. The point of a Mann-Whitney's U test is that it treats the data as ordinal. So you can order the data, but the difference between any two values is not assumed to be consistent. Instead of using the values as-is, a Mann-Whitney's U test calculates the rank of each value. Let's think about some data from a 5-point Likert scale question and say you have the following data.

Group A    1    3    2    4    2
Group B    3    5    5    2    4

Then, you assign a rank (R) to each value, from the smallest to the largest. So,

Group A    1 (R1)      3 (R6)      2 (R2)      4 (R7)      2 (R4)
Group B    3 (R5)      5 (R9)      5 (R10)     2 (R3)      4 (R8)

For now, I just ranked the ties arbitrarily. But obviously this may cause a problem if we want to do a fair statistical test. One thing we can do is to take the average of the ranks of the ties and give every tied value that same average. For instance, the value 2 appears three times and occupies ranks 2, 3, and 4 in this example. Instead of deciding which data point gets a higher rank, we just use the average of the ranks that value gets. So, all three will get rank 3 in this case. Thus, with this correction, this example becomes

Group A    1 (R1)      3 (R5.5)    2 (R3)      4 (R7.5)    2 (R3)
Group B    3 (R5.5)    5 (R9.5)    5 (R9.5)    2 (R3)      4 (R7.5)

The means of the ranks of Group A and Group B are 4.0 and 7.0. The null hypothesis of a Mann-Whitney's U test is that the samples of both groups came from the same population. So intuitively, if the null hypothesis holds, there should be no difference in the mean ranks between the two groups because both groups have the same chance of getting low and high ranks. Thus, if the mean ranks differ enough, you can say that you have a significant effect.
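The tie-averaged ranks and mean ranks above can be checked with a few lines of R. This is only a small sketch on the toy data; the variable names are made up for illustration.

```r
# Toy data from the example above
group_a <- c(1, 3, 2, 4, 2)
group_b <- c(3, 5, 5, 2, 4)

# rank() is applied to the pooled values; by default tied values
# receive the average of the ranks they occupy
ranks <- rank(c(group_a, group_b))
ranks_a <- ranks[seq_along(group_a)]
ranks_b <- ranks[-seq_along(group_a)]

mean_rank_a <- mean(ranks_a)
mean_rank_b <- mean(ranks_b)

print(ranks_a)
print(ranks_b)
print(c(A = mean_rank_a, B = mean_rank_b))
```

The three 2s share rank 3 here, and the mean ranks come out as 4 and 7.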

Please remember that this is not exactly what a Mann-Whitney's U test does. It calculates a statistic called the U value. The U value for each group is calculated by subtracting the minimum possible rank sum for that group (n(n+1)/2 for a group of n values) from its actual sum of ranks, and the smaller of the two U values is used for the test. The distribution of the standardized U value is known to be close to the normal distribution when the sample size is more than 20. Thus, if the observed standardized U value is far from the center of the normal distribution (= 0), the test will reject the null hypothesis.
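As a rough illustration of that description, here is how the U values for the toy data could be worked out by hand in R. The standardization at the end uses the textbook normal approximation without a tie correction, which is an assumption made to keep the sketch short; with only five values per group it is for illustration only.

```r
# Tie-averaged ranks of the toy example
ranks_a <- c(1, 5.5, 3, 7.5, 3)
ranks_b <- c(5.5, 9.5, 9.5, 3, 7.5)
n_a <- length(ranks_a)
n_b <- length(ranks_b)

# The smallest possible rank sum of a group of size n is 1 + 2 + ... + n
u_a <- sum(ranks_a) - n_a * (n_a + 1) / 2
u_b <- sum(ranks_b) - n_b * (n_b + 1) / 2
u <- min(u_a, u_b)

# Sanity check: the two U values always add up to n_a * n_b
stopifnot(u_a + u_b == n_a * n_b)

# Standardized U (assumption: no tie correction)
z <- (u - n_a * n_b / 2) / sqrt(n_a * n_b * (n_a + n_b + 1) / 12)

print(c(U_A = u_a, U_B = u_b, U = u))
print(z)
```

Here Group A has U = 5 and Group B has U = 20, so U = 5 is the value used for the test.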

Effect size

The calculation of the effect size of a Mann-Whitney's U test is fairly easy.

$$ r = \frac{Z}{\sqrt{N}} $$

where Z is the standardized test statistic and N is the total number of the samples. Here are the standard values of r for small, medium, and large sizes. The sign does not contain much information, so we often just report the absolute value of r.

     small size    medium size    large size
r    0.1           0.3            0.5
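A small sketch of this calculation in R follows. Both helper functions are hypothetical, the thresholds 0.1, 0.3, and 0.5 are the commonly cited conventions for r, the "negligible" label is my own addition, and the Z and N values are made up purely for illustration.

```r
# Hypothetical helper: effect size r from a standardized statistic z
# and the total number of samples n (absolute value, since the sign is dropped)
effect_size_r <- function(z, n) abs(z) / sqrt(n)

# Hypothetical helper: label r using the conventional cut-offs
# (assumption: 0.1 = small, 0.3 = medium, 0.5 = large)
label_effect <- function(r) {
  as.character(cut(r,
                   breaks = c(0, 0.1, 0.3, 0.5, Inf),
                   labels = c("negligible", "small", "medium", "large"),
                   right = FALSE))
}

# Made-up numbers for illustration only
r_example <- effect_size_r(z = -2.5, n = 40)
label_example <- label_effect(r_example)

print(round(r_example, 3))
print(label_example)
```

These labels are rules of thumb, so it is usually better to report the value of r itself alongside any label.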

R code example

Let's prepare the data. Create data like the results from a 5-point Likert scale question (the response is 1, 2, 3, 4, or 5), with two groups (Group A and Group B) to compare.

GroupA = c(2,4,3,1,2,3,3,2,3,1)
GroupB = c(3,5,4,2,4,3,5,5,3,2)

Then, do Mann-Whitney's U test.

wilcox.test(GroupA, GroupB)

And you get the result.

        Wilcoxon rank sum test with continuity correction

data:  GroupA and GroupB
W = 23, p-value = 0.03841
alternative hypothesis: true location shift is not equal to 0

Warning message:
In wilcox.test.default(GroupA, GroupB) :
  cannot compute exact p-value with ties

However, as the warning shows, the exact p value cannot be calculated because of ties. Still, this step is useful because it gives you the U value (which is reported as “W” in the results), and the U value is not straightforward to recover from the Z value, particularly when the sample size is small. The Z value, in turn, is what you need for calculating the effect size. Now I will show you how to calculate the Z value and the exact p value with the coin package.

Then, do another Mann-Whitney test. For the Mann-Whitney test in coin, you have to format the data as one vector of values and one factor that marks the group of each value.

library(coin)
g = factor(c(rep("GroupA", length(GroupA)), rep("GroupB", length(GroupB))))
v = c(GroupA, GroupB)
wilcox_test(v ~ g, distribution="exact")

Now you get another result.

        Exact Wilcoxon Mann-Whitney Rank Sum Test

data:  v by g (GroupA, GroupB)
Z = -2.1095, p-value = 0.03850
alternative hypothesis: true mu is not equal to 0

Thus, we have a significant effect of Group. You can also calculate the mean rank for each group as follows.

r = rank(v)
tapply(r, g, mean)
GroupA GroupB 
   7.8   13.2 

And calculate the effect size.

2.1095 / sqrt(20)
[1] 0.4716985
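If the coin package is not available, the Z value can also be worked out in base R. The following is a sketch that assumes the usual normal approximation with a tie-corrected variance and no continuity correction; the variable names are made up for illustration. On this data it should reproduce W = 23 and Z = -2.1095, and therefore the same effect size.

```r
GroupA <- c(2, 4, 3, 1, 2, 3, 3, 2, 3, 1)
GroupB <- c(3, 5, 4, 2, 4, 3, 5, 5, 3, 2)
n1 <- length(GroupA)
n2 <- length(GroupB)
N <- n1 + n2

# U value of GroupA: rank sum minus the minimum possible rank sum
pooled <- c(GroupA, GroupB)
ranks <- rank(pooled)
W <- sum(ranks[seq_len(n1)]) - n1 * (n1 + 1) / 2

# Variance of W under the null hypothesis, reduced for tied values
tie_sizes <- as.numeric(table(pooled))
tie_term <- sum(tie_sizes^3 - tie_sizes) / (N * (N - 1))
sigma <- sqrt(n1 * n2 / 12 * ((N + 1) - tie_term))

# Standardized statistic and effect size r = |Z| / sqrt(N)
z <- (W - n1 * n2 / 2) / sigma
r_effect <- abs(z) / sqrt(N)

print(round(c(W = W, Z = z, r = r_effect), 4))
```

This only recovers the Z value; the exact p value still requires an exact test such as the one in coin.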

How to report

You can report the results of Mann-Whitney's U test as follows:

The medians of Group A and Group B were 2.5 and 3.5, respectively. We ran a Mann-Whitney's U test to evaluate the difference in the responses to our 5-point Likert scale question. We found a significant effect of Group (the mean ranks of Group A and Group B were 7.8 and 13.2, respectively; U = 23, Z = -2.11, p < 0.05, r = 0.47).
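The medians quoted in this report are not produced by the test itself. A minimal way to obtain them, assuming the same GroupA and GroupB vectors as in the R code example, is:

```r
GroupA <- c(2, 4, 3, 1, 2, 3, 3, 2, 3, 1)
GroupB <- c(3, 5, 4, 2, 4, 3, 5, 5, 3, 2)

# With an even number of values, median() averages the two middle values
median_a <- median(GroupA)
median_b <- median(GroupB)

print(c(GroupA = median_a, GroupB = median_b))
```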


For the effect size, please see: Field, A. Discovering statistics using SPSS. (2nd edition).


AtoMops, 2014/09/12 17:53
Thanks :D very nice explanation,

but there seems to be a typo in the 3rd table:

The ranks of the two 2-values are R2 and R3,

so the sum is 5 and the average rank is 2.5 not 3 (as in the table).
Albiner, 2014/09/20 04:15
No, AtoMops, there are actually three 2s giving a rank of R2, R3, R4 and averaging you get R3. So author is correct.
Guest, 2015/05/14 21:49
Thank you! We have been looking everywhere to find out how to generate the z value required to calculate the effect size.
Guest, 2015/07/09 15:50
Hi, I am on a mac and am having trouble downloading the coin package. Does it only work on Linux? If so, is there an alternative for macs?

Following comes up when I try:

> install.packages(coin)
Error in install.packages(coin) : object 'coin' not found
> install.packages("coin", repos="")
Warning: dependencies ‘modeltools’, ‘sandwich’ are not available
also installing the dependencies ‘’, ‘mvtnorm’, ‘multcomp’

Warning: unable to access index for repository
Packages which are only available in source form, and may need
compilation of C/C++/Fortran: ‘mvtnorm’ ‘coin’
Do you want to attempt to install these from sources?
y/n: y
installing the source packages ‘’, ‘mvtnorm’, ‘multcomp’, ‘coin’

trying URL ''
Content type 'application/x-gzip' length 4958405 bytes (4.7 MB)
downloaded 4.7 MB

trying URL ''
Content type 'application/x-gzip' length 333067 bytes (325 KB)
downloaded 325 KB

trying URL ''
Content type 'application/x-gzip' length 1036744 bytes (1012 KB)
downloaded 1012 KB

trying URL ''
Content type 'application/x-gzip' length 1596303 bytes (1.5 MB)
downloaded 1.5 MB

* installing *source* package ‘’ ...
** data
*** moving datasets to lazyload DB
** demo
** inst
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (
* installing *source* package ‘mvtnorm’ ...
** libs
clang -I/Library/Frameworks/R.framework/Resources/include -DNDEBUG -I/usr/local/include -I/usr/local/include/freetype2 -I/opt/X11/include -fPIC -Wall -mtune=core2 -g -O2 -c C_FORTRAN_interface.c -o C_FORTRAN_interface.o
clang -I/Library/Frameworks/R.framework/Resources/include -DNDEBUG -I/usr/local/include -I/usr/local/include/freetype2 -I/opt/X11/include -fPIC -Wall -mtune=core2 -g -O2 -c miwa.c -o miwa.o
gfortran-4.8 -fPIC -g -O2 -c mvt.f -o mvt.o
make: gfortran-4.8: No such file or directory
make: *** [mvt.o] Error 1
ERROR: compilation failed for package ‘mvtnorm’
* removing ‘/Library/Frameworks/R.framework/Versions/3.2/Resources/library/mvtnorm’
ERROR: dependencies ‘mvtnorm’, ‘sandwich’ are not available for package ‘multcomp’
* removing ‘/Library/Frameworks/R.framework/Versions/3.2/Resources/library/multcomp’
ERROR: dependencies ‘modeltools’, ‘mvtnorm’, ‘multcomp’ are not available for package ‘coin’
* removing ‘/Library/Frameworks/R.framework/Versions/3.2/Resources/library/coin’

The downloaded source packages are in
Warning messages:
1: In install.packages("coin", repos = "") :
installation of package ‘mvtnorm’ had non-zero exit status
2: In install.packages("coin", repos = "") :
installation of package ‘multcomp’ had non-zero exit status
3: In install.packages("coin", repos = "") :
installation of package ‘coin’ had non-zero exit status
Guest, 2015/07/28 05:31
If age groups/gender represents independent variable and responses have been collected for 5-6 items in the form of ranking (most preferred - rank 1 to least preferred rank - 5). Is this test an appropriate choice to check, if there is significant difference in the preferences of respondents from different age group/gender?

Curious, 2015/08/04 05:50
Can the value of EFFECT SIZE (r) for Mann-Whitney U test exceed 1?? Please explain and reply ASAP. Thanks Much!
Josh, 2015/08/13 22:36
The value of effect size is unitless, so it may be greater than 1.
Guest, 2015/11/12 15:10
"Although a Mann-Whitney's U test can be considered as a non-parametric version of t test, a Mann-Whitney's U test compares the medians of the two groups, not the means. "

This is wrong on two counts.
1. The test is not nonparametric. It estimates a parameter, which is the probability that an observation from one group will be higher than an observation from the other (read the title of Mann and Whitney's paper!)
2. It is not a test of equality of medians except with the unbelievable assumption that the two groups follow identical distributions. In fact, the test does not calculate or use the median at any point.
Guest, 2015/11/12 15:23
Actually, the Mann Whitney test has its own measure of effect, which is easy to interpret.

In your R code example, the probability of an observation from Group A being greater than an observation from Group B is 0·230. Or, inversely, the probability of an observation from Group B being higher is 0·77.

This is a very useful measure of effect size. Think about interpreting a clinical trial. The probability of a better outcome on treatment B compared with treatment A is 77%.

On the other hand, the peculiar measure of effect size that you present here has no real-life interpretation.
I propose that Mann and Whitney's original measure of effect size is far superior.

Here is the Stata output: It will look ugly because this isn't a monospaced font.

Two-sample Wilcoxon rank-sum (Mann-Whitney) test

var2 | obs rank sum expected
0 | 10 78 105
1 | 10 132 105
combined | 20 210 210

unadjusted variance 175.00
adjustment for ties -11.18
adjusted variance 163.82

Ho: var1(var2==0) = var1(var2==1)
z = -2.110
Prob > |z| = 0.0349

P{var1(var2==0) > var1(var2==1)} = 0.230

And the reference
Mann, H.B. & Whitney, D.R., 1947. On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other. Ann. Math. Statist., 18(1), pp.50–60.
Guest, 2015/12/07 10:06
There's a number of mistakes in the introduction. The Mann-Whitney U-test is not an alternative to a t-test. The Mann-Whitney U-test does not compare medians, only under certain circumstances. The Mann-Whitney U-test does not assume equal variances (it is based on ranks).
Valentina, 2016/02/18 15:49
Thank you very much!!!
Guest, 2016/09/28 16:53
where can I find the z score in SPSS? I cannot calculate effect size without it!
hcistats/mannwhitney.txt · Last modified: 2014/03/29 00:21 by Koji Yatani

