# Statistical Methods for HCI Research


# Chi-square, Fisher's exact, and McNemar's test

## Chi-square test

A Chi-square test is a common test for nominal (categorical) data. One application is a test for independence, in which the null hypothesis is that the distribution of the outcomes is the same for the two groups. For example, suppose you have two user groups (e.g., male and female, or young and elderly) and nominal data for each group, such as whether they use mobile devices or which OS they use. Your data would then look like the tables below. If the data for the two groups came from the same participants (i.e., the data are paired), you should use McNemar's test instead.

| | Own device A | Don't own device A |
|---|---|---|
| Male | 25 | 5 |
| Female | 15 | 15 |

| | Windows | Mac | Linux |
|---|---|---|---|
| Young | 16 | 11 | 3 |
| Old | 21 | 8 | 1 |

Now you want to find out whether the outcomes for the two groups differ significantly. The Chi-square test assumes that the samples are independent (unpaired); if they are not, you need to use McNemar's test. If you have only a small sample size, you should use Fisher's exact test.
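
To make the null hypothesis concrete, the sketch below computes the counts expected under independence for the device-ownership table, along with the uncorrected Pearson statistic, by hand. It assumes only base R; the variable names (`observed`, `expected`, `chisq_manual`) are illustrative.

```r
# Observed counts from the device-ownership table (rows: Male, Female)
observed <- matrix(c(25, 5, 15, 15), ncol = 2, byrow = TRUE,
                   dimnames = list(c("Male", "Female"),
                                   c("Own", "Don't own")))

# Expected count per cell under independence: row total * column total / N
n <- sum(observed)
expected <- outer(rowSums(observed), colSums(observed)) / n

# Pearson chi-squared statistic without continuity correction
chisq_manual <- sum((observed - expected)^2 / expected)

print(expected)
print(chisq_manual)
```

The hand-computed statistic should agree with the value that `chisq.test(observed, correct = FALSE)` reports for the same table.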

### Effect size

The effect size of a Chi-square test can be described by phi or Cramer's V. If your data table is 2 x 2, you calculate phi (k = 2 in the equation below); for larger tables, you calculate Cramer's V with the same equation:

$\Large \phi \text{ or } V = \sqrt{\frac{\chi^{2}}{N(k-1)}}$,

where N is the total number of samples and k is the number of rows or columns in your data table, whichever is smaller. The chi-squared value here is the one without any correction. The following values are considered small, medium, and large effect sizes.

| | small size | medium size | large size |
|---|---|---|---|
| phi or Cramer's V | 0.10 | 0.30 | 0.50 |

### R code example

Let's use the examples above. First, prepare the data.

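    # 2 x 2 table: rows are Male/Female, columns are Own/Don't own device A
    data <- matrix(c(25, 5, 15, 15), ncol=2, byrow=T)
    # 2 x 3 table: rows are Young/Old, columns are Windows/Mac/Linux
    data2 <- matrix(c(16, 11, 3, 21, 8, 1), ncol=3, byrow=T)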

And run a Chi-squared test.

    chisq.test(data)

    Pearson's Chi-squared test with Yates' continuity correction

    data:  data
    X-squared = 6.075, df = 1, p-value = 0.01371

    chisq.test(data2)

    Pearson's Chi-squared test

    data:  data2
    X-squared = 2.1494, df = 2, p-value = 0.3414

So, the first example shows a significant difference, which means that the ownership of device A differs significantly between male and female users. The effect size of the first test can be calculated with the vcd package:

    library(vcd)
    assocstats(data)

                        X^2 df  P(> X^2)
    Likelihood Ratio 7.7592  1 0.0053440
    Pearson          7.5000  1 0.0061699

    Phi-Coefficient   : 0.354
    Contingency Coeff.: 0.333
    Cramer's V        : 0.354
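
The effect size formula can also be applied directly in base R, without the vcd package. The following is a minimal sketch built around a hypothetical helper named `cramers_v` (not part of any package). It uses the uncorrected chi-squared value, as the formula requires.

```r
# Hypothetical helper: phi (2 x 2 table) or Cramer's V (larger table)
cramers_v <- function(tab) {
  # correct = FALSE gives the chi-squared value without Yates' correction;
  # warnings about small expected counts are suppressed for brevity
  chisq <- suppressWarnings(chisq.test(tab, correct = FALSE))$statistic
  n <- sum(tab)
  k <- min(nrow(tab), ncol(tab))
  unname(sqrt(chisq / (n * (k - 1))))
}

device <- matrix(c(25, 5, 15, 15), ncol = 2, byrow = TRUE)
os <- matrix(c(16, 11, 3, 21, 8, 1), ncol = 3, byrow = TRUE)

phi_device <- cramers_v(device)  # 2 x 2 table, so this is phi
v_os <- cramers_v(os)            # 2 x 3 table, so this is Cramer's V

print(round(phi_device, 3))
print(round(v_os, 3))
```

For the device-ownership table, the result should match the Phi-Coefficient reported by `assocstats()`.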

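For a 2×2 table, you can also calculate the odds ratio. The odds ratio describes how strongly the odds of the outcome differ between the two groups (here, how much higher the odds of owning device A are for male users than for female users). With the cells labelled as in the table, it is calculated as ad / bc.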

| | Own device A | Don't own device A |
|---|---|---|
| Male | a = 25 | b = 5 |
| Female | c = 15 | d = 15 |

    (25 * 15) / (5 * 15)
    [1] 5

### How to report

You can report the results of a Chi-square test like this:

Our Chi-square test with Yates' continuity correction revealed that the percentage of the ownership of device A significantly differed by gender ($\small \chi^{2}$(1, N = 60) = 6.08, p < 0.05, $\small \phi$ = 0.35, the odds ratio is 5.0).

## Fisher's exact test

You can instead use Fisher's exact test if your sample size is small. It is hard to say how small is small, but in general it is better to use Fisher's exact test than a Chi-square test when any cell of your data table contains fewer than 10 observations (as in the examples above).
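
One way to apply this rule of thumb is to inspect the cell counts before choosing a test. The sketch below uses the threshold of 10 mentioned in the text. The helper name `has_small_cell` and its `threshold` argument are hypothetical.

```r
# Hypothetical helper: TRUE if any observed cell count is below the threshold
has_small_cell <- function(tab, threshold = 10) {
  any(tab < threshold)
}

device <- matrix(c(25, 5, 15, 15), ncol = 2, byrow = TRUE)

# Pick the test according to the rule of thumb in the text
if (has_small_cell(device)) {
  result <- fisher.test(device)
} else {
  result <- chisq.test(device)
}

print(result$method)
```

Some authors apply a similar check to the expected counts rather than the observed ones; those are available as `chisq.test(device)$expected`.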

### R code example

Running a Fisher's exact test is pretty similar to Chi-square.

    fisher.test(data)

    Fisher's Exact Test for Count Data

    data:  data
    p-value = 0.0127
    alternative hypothesis: true odds ratio is not equal to 1
    95 percent confidence interval:
      1.335859 20.757326
    sample estimates:
    odds ratio
      4.859427

### How to report

Reporting the results of a Fisher's exact test is much the same as for the Chi-square test. Unlike the Chi-square test, there is no test statistic such as chi-squared, so you only need to report the p value. Some people also include the odds ratio with its confidence interval.
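
The values needed for such a report can be pulled from the object returned by `fisher.test()`. This is a minimal sketch using the device-ownership table; the variable names are illustrative.

```r
device <- matrix(c(25, 5, 15, 15), ncol = 2, byrow = TRUE)
res <- fisher.test(device)

p_value <- res$p.value               # two-sided p value
odds_ratio <- unname(res$estimate)   # conditional maximum likelihood estimate
ci <- as.numeric(res$conf.int)       # 95% confidence interval by default

cat(sprintf("p = %.3f, odds ratio = %.2f, 95%% CI [%.2f, %.2f]\n",
            p_value, odds_ratio, ci[1], ci[2]))
```

Note that the odds ratio estimated by `fisher.test()` is a conditional estimate, so it can differ slightly from the sample odds ratio ad / bc.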

## McNemar's test

McNemar's test is basically a paired version of the Chi-square test. Let's say you asked the participants whether they liked the device before and after the experiment.

| | After experiment: Yes | After experiment: No |
|---|---|---|
| Before experiment: Yes | 6 | 2 |
| Before experiment: No | 8 | 4 |

Here, what you want to test is whether the number of participants who liked the device changed significantly between before and after the experiment.

### Effect size

The effect size of the Fisher's exact test can be calculated in the same way as the one for the Chi-square test.

### R code example

Running McNemar's test is pretty similar to running a Chi-square test.

    data <- matrix(c(6, 2, 8, 4), ncol=2, byrow=T)
    mcnemar.test(data)

    McNemar's Chi-squared test with continuity correction

    data:  data
    McNemar's chi-squared = 2.5, df = 1, p-value = 0.1138

Thus, we cannot reject the null hypothesis, which means that the number of participants who liked the device did not change significantly between before and after the experiment. As you can see here, mcnemar.test() automatically applies a continuity correction. You can disable it with the correct=F option, and the results will then be the same as those from the function for Cochran's Q test.
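
The McNemar statistic depends only on the two off-diagonal cells, i.e., the participants who changed their answer (2 and 8 here). The sketch below reproduces it by hand, with and without the continuity correction. It assumes the usual textbook form of the statistic, and the variable names are illustrative.

```r
paired <- matrix(c(6, 2, 8, 4), ncol = 2, byrow = TRUE)

n_yes_no <- paired[1, 2]  # Yes before, No after
n_no_yes <- paired[2, 1]  # No before, Yes after

# McNemar statistic with and without continuity correction
stat_corrected <- (abs(n_yes_no - n_no_yes) - 1)^2 / (n_yes_no + n_no_yes)
stat_uncorrected <- (n_yes_no - n_no_yes)^2 / (n_yes_no + n_no_yes)

# p values from the chi-squared distribution with 1 degree of freedom
p_corrected <- pchisq(stat_corrected, df = 1, lower.tail = FALSE)
p_uncorrected <- pchisq(stat_uncorrected, df = 1, lower.tail = FALSE)

# Uncorrected result from the built-in function, for comparison
builtin <- mcnemar.test(paired, correct = FALSE)

print(c(stat_corrected, p_corrected))
print(c(stat_uncorrected, p_uncorrected))
print(builtin$p.value)
```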

### McNemar's test and binomial test

In SPSS, the binomial distribution is used for McNemar's test. Thus, the results look different from those you can get in R. A binomial test is very similar to McNemar's test, but its null hypothesis is that the ratio of the two categories is equal to an expected distribution. In most cases, a binomial test is used for testing whether two categories are equally likely to occur.

| | Question 2 (post-treatment): Yes | Question 2 (post-treatment): No |
|---|---|---|
| Question 1 (pre-treatment): Yes | a | b |
| Question 1 (pre-treatment): No | c | d |

More precisely, you should use a binomial test rather than McNemar's test if b + c in the 2×2 table is small. However, in R you can run McNemar's test with continuity correction, so this will not cause a big problem: the results of a binomial test and McNemar's test with continuity correction are similar.

If you want to do a binomial test as SPSS does, you need to use the binom.test() function. You need two numbers: the total count of the cases where the participants flipped their responses (i.e., b + c; in the example we are using, 2 + 8 = 10) and the count of one of these two cases (i.e., 2 or 8).

| | After experiment: Yes | After experiment: No |
|---|---|---|
| Before experiment: Yes | 6 | 2 |
| Before experiment: No | 8 | 4 |

    binom.test(2, 10, 0.5)

    Exact binomial test

    data:  2 and 10
    number of successes = 2, number of trials = 10, p-value = 0.1094
    alternative hypothesis: true probability of success is not equal to 0.5
    95 percent confidence interval:
     0.02521073 0.55609546
    sample estimates:
    probability of success
                       0.2

In this case, the p values are close to each other (0.1094 for the binomial test and 0.1138 for McNemar's test with continuity correction).
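
To compare the two approaches on your own data, the inputs of the binomial test can be derived from the table itself. The sketch below does this for the example table and prints both p values side by side; the variable names are illustrative.

```r
paired <- matrix(c(6, 2, 8, 4), ncol = 2, byrow = TRUE)

# Participants who changed their answer (the off-diagonal cells)
flipped_to_no <- paired[1, 2]
flipped_to_yes <- paired[2, 1]
n_flipped <- flipped_to_no + flipped_to_yes

# Exact binomial test: are the two kinds of flips equally likely?
p_binom <- binom.test(flipped_to_no, n_flipped, p = 0.5)$p.value

# McNemar's test with continuity correction (the default in R)
p_mcnemar <- mcnemar.test(paired)$p.value

print(round(c(binomial = p_binom, mcnemar = p_mcnemar), 4))
```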

### How to report

How to report the results of a McNemar's test is pretty much the same as the Chi-square test.