A crosstab table is probably the most common way to visualize nominal (categorical) data. It is a table representing the distribution of the responses to two variables. A crosstab table can be 2 x 2 or n x m, as in the following examples.
|  |  | Variable 2 |  |
| --- | --- | --- | --- |
|  |  | B1 | B2 |
| Variable 1 | A1 | 16 | 29 |
|  | A2 | 32 | 11 |
|  |  | Variable 2 |  |  |  |
| --- | --- | --- | --- | --- | --- |
|  |  | B1 | B2 | B3 | B4 |
| Variable 1 | A1 | 5 | 8 | 10 | 7 |
|  | A2 | 3 | 11 | 9 | 5 |
|  | A3 | 2 | 8 | 6 | 1 |
As I explain in the types of data page, there is not much you can do directly with categorical data. However, a crosstab table helps you explore your categorical data a lot, and there are several statistics you can calculate from a crosstab table. A Chi-square test (and other similar tests) is one of them. On this page, I explain other statistics you can derive from a crosstab table.
A coefficient of association is something like a correlation for categorical data. In other words, it represents how much the distribution of the data changes depending on one of the variables. Let's take a look at an example with a crosstab table.
|  |  | Device ownership |  |
| --- | --- | --- | --- |
|  |  | Device A | Device B |
| Age | Young | 20 | 10 |
|  | Old | 3 | 27 |
This crosstab table shows the distribution of the ownership of the two devices separated by the users' ages. From this table, it looks like age affects the ownership of the devices (i.e., younger users tend to own Device A and older users tend to own Device B). Thus, it seems that age and ownership are correlated. Now what we want to know is how much they are correlated. Unfortunately, we cannot calculate a correlation because the data are not ordinal, interval, or ratio. But there are three metrics we can use instead: the phi coefficient, the contingency coefficient, and Cramer's V.
Although the phi coefficient and the contingency coefficient are valid metrics, the problem with them is that you cannot use them to compare the strength of association across crosstab tables of different sizes. Thus, Cramer's V is generally the first choice for examining the association. You can calculate these values very easily in R, but you need the vcd package.
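Here is a minimal sketch of what this looks like (the matrix is built from the table above, and the object name `data` is just my choice); `assocstats()` in the vcd package reports the phi coefficient, the contingency coefficient, and Cramer's V all at once.

```r
# install.packages("vcd")  # if the vcd package is not installed yet
library(vcd)

# Build the 2 x 2 crosstab table from the device-ownership example.
data <- matrix(c(20, 10,
                  3, 27),
               nrow = 2, byrow = TRUE,
               dimnames = list(Age = c("Young", "Old"),
                               Device = c("Device A", "Device B")))

# Reports the phi coefficient, the contingency coefficient, and Cramer's V.
assocstats(data)
```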
Thus, Cramer's V is 0.58 in this example. You can calculate Cramer's V for n x m crosstab tables in the same way.
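For instance, a sketch for the 3 x 4 example table at the top of this page would look like this (only the matrix changes):

```r
library(vcd)

# The 3 x 4 example crosstab table from the beginning of this page.
data <- matrix(c(5,  8, 10, 7,
                 3, 11,  9, 5,
                 2,  8,  6, 1),
               nrow = 3, byrow = TRUE)

# Cramer's V appears in the same output as before.
assocstats(data)
```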
Agreement is another metric you can derive from a crosstab table. This metric is used when two people look at the same data and categorize them. For instance, suppose you have a bunch of quotes gained from interviews with your participants, and you and another researcher categorized them. You have several themes (categories or groups), and for one category, your categorization is as follows (“yes” means that the rater thinks the quote belongs to that category, and “no” means the rater doesn't think so).
|  |  | Rater 2 |  |
| --- | --- | --- | --- |
|  |  | Yes | No |
| Rater 1 | Yes | 35 | 5 |
|  | No | 4 | 110 |
What you want to show is how well the two of you agreed on the categorization for this category. If you don't agree much, this categorization doesn't really have much power or is ambiguous. One metric for this is the agreement percentage, which is the ratio of the number of instances both raters agreed on (i.e., both said “yes” or both said “no”) to the total number of instances. You can easily calculate this manually.
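As a quick check with the numbers from the table above:

```r
# Agreement percentage: instances where both raters agreed, over all instances.
(35 + 110) / (35 + 5 + 4 + 110)  # = 0.9415584, i.e., roughly 94%
```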
Thus, you have 94% agreement. This seems fine, but the problem is that this percentage does not remove the effects caused by randomness: you may simply be getting a good result by chance. To claim reproducibility, we want a metric which removes the effects caused by randomness, and Cohen's Kappa is such a metric. It ranges from -1 to 1. You can easily calculate Cohen's Kappa in R.
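One way to do this is with the `Kappa()` function in the vcd package (a sketch, reusing the crosstab table above):

```r
library(vcd)

# The agreement crosstab table for Rater 1 and Rater 2.
data <- matrix(c(35,   5,
                  4, 110),
               nrow = 2, byrow = TRUE)

# Prints the unweighted and weighted Kappa together with their ASEs.
Kappa(data)
```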
Thus, the Cohen's Kappa for this categorization is 0.85 (look at the value in the “Unweighted” row). Cohen's Kappa is usually smaller than the agreement percentage. ASE means Approximate Standard Error, and you can calculate an approximate 95% confidence interval as the value ± 1.96 * ASE. So in this case, the 95% confidence interval is [0.8467831 - 1.96 * 0.04955746, 0.8467831 + 1.96 * 0.04955746] = [0.75, 0.94].
You can interpret the magnitude of the agreement from the Kappa value as follows.
Kappa value | magnitude of agreement |
---|---|
< 0 | no |
0 - 0.2 | small |
0.2 - 0.4 | fair |
0.4 - 0.6 | moderate |
0.6 - 0.8 | substantial |
0.8 - 1 | almost perfect |
In a practical situation, a Kappa coefficient over 0.6 suggests that your categorization is robust. If not, it suggests that there are categories which are ambiguous or not well agreed upon by the raters, so you may have to rethink your categorization.
If you do not have a crosstab table for the data, and instead you have raw data (e.g., 0 or 1 in one column for one rater, and in another column for the other rater), it is probably easier to use the psy package. Here is a quick example of how to use the psy package.
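A minimal sketch with made-up 0/1 ratings (replace `rater1` and `rater2` with your own columns); `ckappa()` in the psy package returns the crosstab table and Cohen's Kappa.

```r
# install.packages("psy")  # if the psy package is not installed yet
library(psy)

# Hypothetical raw ratings: one 0/1 judgment per quote for each rater.
rater1 <- c(1, 0, 1, 1, 0, 1, 0, 0, 1, 1)
rater2 <- c(1, 0, 0, 1, 0, 1, 1, 0, 1, 0)

ratings <- data.frame(rater1, rater2)
ckappa(ratings)  # $table is the crosstab table, $kappa is Cohen's Kappa
```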
You can then calculate the confidence interval as well, but it is a slightly more complicated process because we use a bootstrap method. We will use the boot package.
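Here is one way to do it (a sketch, assuming the `ratings` data frame defined above; `kappa_boot` is just a helper name I chose, and because the ratings above are made up and the bootstrap involves random resampling, the exact numbers will vary):

```r
library(boot)
library(psy)

# Statistic to bootstrap: Cohen's Kappa computed on a resampled set of rows.
kappa_boot <- function(data, indices) {
  ckappa(data[indices, ])$kappa
}

# 1000 bootstrap resamples, then a bias-corrected and accelerated (BCa) 95% CI.
results <- boot(data = ratings, statistic = kappa_boot, R = 1000)
boot.ci(results, type = "bca")
```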
So the Kappa is 0.40 and its CI is [-0.06, 0.78]. I won't go into the details of this code, but it should work with a simple copy-and-paste. Please note that the CI estimated by the bootstrap method is usually different from the CI calculated from the ASE. The bootstrap method is known to be more accurate, so I would recommend using it if possible.
Finally, you can report your Cohen's Kappa with its confidence interval as follows: The measured Cohen's Kappa for our results was 0.85 (95% CI: [0.75, 0.94]), indicating a strong agreement.
The above Cohen's Kappa is for nominal (categorical) data, which means that your dependent values do not have any specific order. But you may often have ordered values (such as subjective ratings) and want to know how much two raters agree on their ratings. Let's think about a hypothetical case in which two raters rate the quality of pictures taken by someone with three ratings: Good, OK, and Bad. Then, you get the following results.
|  |  | Rater 2 |  |  |
| --- | --- | --- | --- | --- |
|  |  | Good | OK | Bad |
| Rater 1 | Good | 40 | 3 | 2 |
|  | OK | 5 | 31 | 9 |
|  | Bad | 2 | 7 | 21 |
This is almost the same situation as the 2×2 crosstab table above, so we could just use Cohen's Kappa for testing the agreement (which is legitimate). The limitation of this naive Cohen's Kappa is that it treats all disagreements equally. But in this example, it is more natural to think that a disagreement between “Good” and “Bad” should have more weight than a disagreement between “Good” and “OK”. To account for this, we can calculate the weighted Cohen's Kappa.
There are a number of ways to calculate such weights, but the most common one is the squared weight: the weights are based on the squared differences between the row and column indices of the crosstab table. We are going to use the psy package again. In this example code, I use 0, 1, and 2 to represent the ratings (so 0 for “Bad”, 1 for “OK”, and 2 for “Good”).
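A sketch with made-up rating vectors as stand-ins for the raw ratings; `wkappa()` in the psy package computes the weighted Kappa, and its `weights` argument selects the squared weighting scheme.

```r
library(psy)

# Hypothetical ordinal ratings from the two raters (0 = Bad, 1 = OK, 2 = Good).
rater1 <- c(2, 2, 1, 0, 2, 1, 1, 0, 2, 0, 1, 2)
rater2 <- c(2, 1, 1, 0, 2, 2, 1, 1, 2, 0, 0, 2)

ratings <- data.frame(rater1, rater2)
wkappa(ratings, weights = "squared")  # weighted Cohen's Kappa, squared weights
```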
And we are going to estimate the confidence interval with the boot package.
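The bootstrap code is almost the same as before (again a sketch, assuming the `ratings` data frame above; `wkappa_boot` is a name I chose, and the exact numbers will vary with your data and the random resampling):

```r
library(boot)
library(psy)

# Statistic to bootstrap: the weighted Kappa on a resampled set of rows.
wkappa_boot <- function(data, indices) {
  wkappa(data[indices, ], weights = "squared")$kappa
}

# 1000 bootstrap resamples, then a BCa 95% confidence interval.
results <- boot(data = ratings, statistic = wkappa_boot, R = 1000)
boot.ci(results, type = "bca")
```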
Thus, the weighted Cohen's Kappa is 0.64, and its 95% confidence interval is [0.15, 0.88]. You can report your weighted Cohen's Kappa with its confidence interval as follows: The measured weighted Cohen's Kappa (squared weights) for the ratings by the two raters was 0.64 (95% CI: [0.15, 0.88]), indicating a moderate agreement.
For the Cohen's Kappa,