Simply speaking, Cochran's Q test is a binomial-data version of repeated-measures ANOVA or the Friedman test. So, you have multiple binomial responses (like “yes” or “no”), and you want to see whether the proportions of the responses differ across the groups (e.g., methods, software, or devices the participants use). Let's say you have data as follows.
Are you using this software? (0 = no, 1 = yes)

| | Software A | Software B | Software C |
|---|---|---|---|
| User 1 | 1 | 0 | 1 |
| User 2 | 0 | 0 | 1 |
| User 3 | 0 | 1 | 0 |
| User 4 | 0 | 1 | 1 |
| User 5 | 0 | 0 | 1 |
| User 6 | 1 | 1 | 1 |
| User 7 | 0 | 0 | 1 |
| User 8 | 0 | 1 | 1 |
| User 9 | 0 | 1 | 1 |
| User 10 | 0 | 1 | 1 |
Now, you want to compare the responses across the kinds of software with Cochran's Q test.
I haven't figured out how to directly calculate an effect size for Cochran's Q test itself. So, here, similarly to the Kruskal-Wallis test, I calculate the effect size for the post-hoc McNemar's tests.
The code for Cochran's Q test is similar to that for ANOVA. First, you need to prepare the data.
Then, load the coin package, which is necessary for Cochran's Q test. After that, you just need to specify the independent and dependent variables with a syntax similar to that for ANOVA.
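Putting the two steps together, the data preparation and the test might look like this (the variable names `Value`, `Software`, and `User` are my choice here):

```r
# Prepare the data in long format: one row per (user, software) response.
Value <- c(1,0,0,0,0,1,0,0,0,0,   # Software A
           0,0,1,1,0,1,0,1,1,1,   # Software B
           1,1,0,1,1,1,1,1,1,1)   # Software C
Software <- factor(rep(c("A", "B", "C"), each = 10))
User <- factor(rep(1:10, times = 3))
data <- data.frame(User, Software, Value)

# Cochran's Q test via the coin package: the response is explained by
# Software, with User as the blocking (within-subject) factor.
library(coin)
symmetry_test(Value ~ Software | User, data = data, teststat = "quad")
```

With complete binary data like this, the quadratic-form symmetry test reduces to Cochran's Q, so the reported chi-squared statistic should be about 8.22 with df = 2.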
Now, you get the result.
Thus, you have a significant effect of Software on the responses.
When you find a significant effect, you need to do a post-hoc test as you do for ANOVA or the Friedman test. For Cochran's Q test, you can basically run multiple McNemar's tests and adjust the p values with Bonferroni or Holm correction. Unfortunately, this requires you to do a little bit of work. First, you need to make a 2×2 table for each pair of Software.
(Later, I found that you don't need to make a 2×2 table to do McNemar's test. For more details, please see below.)
| | Software 2: yes | Software 2: no |
|---|---|---|
| Software 1: yes | 1 | 1 |
| Software 1: no | 5 | 3 |

| | Software 3: yes | Software 3: no |
|---|---|---|
| Software 1: yes | 2 | 0 |
| Software 1: no | 7 | 1 |

| | Software 3: yes | Software 3: no |
|---|---|---|
| Software 2: yes | 5 | 1 |
| Software 2: no | 4 | 0 |
And run a McNemar's test for each 2×2 table.
Then, adjust the p values. Here, we use the Bonferroni correction.
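These two steps can be sketched as follows: feed each 2×2 table to `mcnemar.test()`, then pass the three p values to `p.adjust()` (the matrix entries are taken from the tables above, rows being the yes/no of the first software and columns those of the second):

```r
# 2x2 tables from above (rows: first software yes/no; columns: second software yes/no).
t12 <- matrix(c(1, 1, 5, 3), nrow = 2, byrow = TRUE)  # Software 1 vs. Software 2
t13 <- matrix(c(2, 0, 7, 1), nrow = 2, byrow = TRUE)  # Software 1 vs. Software 3
t23 <- matrix(c(5, 1, 4, 0), nrow = 2, byrow = TRUE)  # Software 2 vs. Software 3

# McNemar's test (with continuity correction by default) on each pair.
p <- c(mcnemar.test(t12)$p.value,
       mcnemar.test(t13)$p.value,
       mcnemar.test(t23)$p.value)

# Bonferroni correction for the three comparisons.
p.adjust(p, method = "bonferroni")
```

The Software 1 vs. Software 3 comparison gives the smallest adjusted p value (about 0.07).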
Thus, there is a significant difference between Software1 and Software3. McNemar's test in SPSS uses the binomial distribution, which often makes the results different from those you get in R. For more details, see the code example for McNemar's test. Some versions of SPSS apply Cochran's Q test to each pairwise combination of the groups. This is also legitimate because Cochran's Q test becomes equivalent to McNemar's test without continuity correction when applied to data from two groups.
Finally, calculate the effect size for the difference between Software1 and Software3. Again, I am not 100% sure that this is the best way to calculate the effect size for Cochran's Q test, but I just followed the way we do for Kruskal-Wallis test.
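A minimal sketch of that calculation, treating the effect size as a phi coefficient (φ = sqrt(χ²/N), analogous to what we do for the Kruskal-Wallis test):

```r
# Continuity-corrected chi-squared from the Software 1 vs. Software 3 McNemar's test.
t13 <- matrix(c(2, 0, 7, 1), nrow = 2, byrow = TRUE)
chi <- unname(mcnemar.test(t13)$statistic)

# Effect size (phi) using the total sample size: 10 participants x 2 groups = 20.
phi <- sqrt(chi / 20)
phi  # about 0.51
```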
Please note that we need to use the total sample size (10 * 2 = 20) to calculate the effect size. Another way to run McNemar's test is to take the data directly from the dataframe.
Make sure that elements at the same index in the two vectors represent data taken from the same participant (i.e., data respecting the within-subject factor).
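For example, with the two response vectors ordered by user (so the i-th element of each vector comes from the same participant), `mcnemar.test()` builds the 2×2 contingency table internally:

```r
# Responses ordered by user: index i in both vectors is the same participant.
SoftwareA <- c(1, 0, 0, 0, 0, 1, 0, 0, 0, 0)
SoftwareC <- c(1, 1, 0, 1, 1, 1, 1, 1, 1, 1)

# Passing two factors makes mcnemar.test() cross-tabulate them itself,
# so no hand-made 2x2 table is needed.
mcnemar.test(factor(SoftwareA, levels = c(0, 1)),
             factor(SoftwareC, levels = c(0, 1)))
```

This gives the same continuity-corrected chi-squared as the hand-made Software 1 vs. Software 3 table above.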
You may have cases where you see a significant difference with Cochran's Q test but no difference in the post-hoc comparisons. This is probably because Cochran's Q test is more powerful, the corrected post-hoc comparisons are stricter, or the sample size is too small. Please note that this can happen fairly often.
You can report the results of Cochran's Q test as follows: With a Cochran's Q test, we found a significant difference in usage among the three kinds of software we surveyed (χ²(2) = 8.22, p < 0.05). A pairwise comparison using continuity-corrected McNemar's tests with Bonferroni correction revealed that significantly more participants used Software3 than Software1 (p < 0.1, φ = 0.51).