A crosstab table is probably the most common way to visualize nominal (categorical) data. It is a table representing the distribution of the responses to two variables. A crosstab table can be 2 x 2 or n x m, as in the following examples.
|  |  | Variable 2 |  |
| --- | --- | --- | --- |
|  |  | B1 | B2 |
| Variable 1 | A1 | 16 | 29 |
|  | A2 | 32 | 11 |
|  |  | Variable 2 |  |  |  |
| --- | --- | --- | --- | --- | --- |
|  |  | B1 | B2 | B3 | B4 |
| Variable 1 | A1 | 5 | 8 | 10 | 7 |
|  | A2 | 3 | 11 | 9 | 5 |
|  | A3 | 2 | 8 | 6 | 1 |
As I explain in the types of data page, there is not much you can do directly with categorical data. However, a crosstab table helps you explore your categorical data a lot, and there are several statistics you can calculate from a crosstab table. A Chi-square test (and other similar tests) is one of them. On this page, I explain other statistics you can derive from a crosstab table.
A coefficient of association is something like a correlation for categorical data. In other words, it represents how much the distribution of the data changes depending on one of the variables. Let's take a look at an example with a crosstab table.
|  |  | Device ownership |  |
| --- | --- | --- | --- |
|  |  | Device A | Device B |
| Age | Young | 20 | 10 |
|  | Old | 3 | 27 |
This crosstab table shows the distribution of the ownership of the two devices separated by the users' ages. From this table, it looks like age affects the ownership of the devices (i.e., younger users tend to own Device A and older users tend to own Device B). Thus, it seems that age and ownership are correlated. Now what we want to know is how much they are correlated. Unfortunately, we cannot calculate a correlation because the data are not ordinal, interval, or ratio. But there are three metrics we can use instead: the phi coefficient, the contingency coefficient, and Cramer's V.
Although the phi coefficient and the contingency coefficient are valid metrics, the problem with them is that you cannot use them to compare the strength of association across crosstab tables of different sizes. Thus, Cramer's V is generally the first choice for examining the association. You can calculate these values very easily in R, but you need the vcd package.
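Here is a minimal sketch of what this looks like (the matrix is built from the table above, and the object name `data` is just my choice); `assocstats()` in the vcd package reports the phi coefficient, the contingency coefficient, and Cramer's V all at once.

```r
# install.packages("vcd")  # if the vcd package is not installed yet
library(vcd)

# Build the 2 x 2 crosstab table from the device-ownership example.
data <- matrix(c(20, 10,
                  3, 27),
               nrow = 2, byrow = TRUE,
               dimnames = list(Age = c("Young", "Old"),
                               Device = c("Device A", "Device B")))

# Reports the phi coefficient, the contingency coefficient, and Cramer's V.
assocstats(data)
```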
Thus, Cramer's V is 0.58 in this example. You can calculate Cramer's V for n x m crosstab tables in the same way.
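For instance, a sketch for the 3 x 4 example table at the top of this page would look like this (only the matrix changes):

```r
library(vcd)

# The 3 x 4 example crosstab table from the beginning of this page.
data <- matrix(c(5,  8, 10, 7,
                 3, 11,  9, 5,
                 2,  8,  6, 1),
               nrow = 3, byrow = TRUE)

# Cramer's V appears in the same output as before.
assocstats(data)
```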
Agreement is another metric you can derive from a crosstab table. This metric is used when two people look at the same data and categorize them. For instance, suppose you have a bunch of quotes gained from interviews with your participants, and you and another researcher categorized them. You have several themes (categories or groups), and for one category, your categorization is as follows (“yes” means that the rater thinks the quote belongs to that category, and “no” means the rater doesn't think so).
|  |  | Rater 2 |  |
| --- | --- | --- | --- |
|  |  | Yes | No |
| Rater 1 | Yes | 35 | 5 |
|  | No | 4 | 110 |
What you want to show is how well the two of you agreed on the categorization for this category. If you don't agree much, this categorization doesn't really have much power or is ambiguous. One metric for this is the agreement percentage, which is the ratio of the number of instances both raters agreed on (i.e., both said “yes” or both said “no”) to the total number of instances. You can easily calculate this manually.
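As a quick check with the numbers from the table above:

```r
# Agreement percentage: instances where both raters agreed, over all instances.
(35 + 110) / (35 + 5 + 4 + 110)  # = 0.9415584, i.e., roughly 94%
```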
Thus, you have 94% agreement. This seems fine, but the problem is that this percentage does not remove the effects caused by randomness: you may simply be getting a good result by chance. To claim reproducibility, we want a metric which removes the effects caused by randomness, and Cohen's Kappa is such a metric. It ranges from -1 to 1. You can easily calculate Cohen's Kappa in R.
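One way to do this is with the `Kappa()` function in the vcd package (a sketch, reusing the crosstab table above):

```r
library(vcd)

# The agreement crosstab table for Rater 1 and Rater 2.
data <- matrix(c(35,   5,
                  4, 110),
               nrow = 2, byrow = TRUE)

# Prints the unweighted and weighted Kappa together with their ASEs.
Kappa(data)
```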
Thus, the Cohen's Kappa for this categorization is 0.85 (look at the value in the “Unweighted” row). Cohen's Kappa is usually smaller than the agreement percentage. ASE means Approximate Standard Error, and you can calculate an approximate 95% confidence interval as the value ± 1.96 * ASE. So in this case, the 95% confidence interval is [0.8467831 - 1.96 * 0.04955746, 0.8467831 + 1.96 * 0.04955746] = [0.75, 0.94].
You can interpret the magnitude of the agreement from the Kappa value as follows.
Kappa value | magnitude of agreement |
---|---|
< 0 | no |
0 - 0.2 | small |
0.2 - 0.4 | fair |
0.4 - 0.6 | moderate |
0.6 - 0.8 | substantial |
0.8 - 1 | almost perfect |
In a practical situation, a Kappa coefficient over 0.6 suggests that your categorization is robust. If not, it suggests that there are categories which are ambiguous or not well agreed upon by the raters, so you may have to rethink your categorization.
If you do not have a crosstab table for the data, and instead you have raw data (e.g., 0 or 1 in one column for one rater, and in another column for the other rater), it is probably easier to use the psy package. Here is a quick example of how to use the psy package.
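A minimal sketch with made-up 0/1 ratings (replace `rater1` and `rater2` with your own columns); `ckappa()` in the psy package returns the crosstab table and Cohen's Kappa.

```r
# install.packages("psy")  # if the psy package is not installed yet
library(psy)

# Hypothetical raw ratings: one 0/1 judgment per quote for each rater.
rater1 <- c(1, 0, 1, 1, 0, 1, 0, 0, 1, 1)
rater2 <- c(1, 0, 0, 1, 0, 1, 1, 0, 1, 0)

ratings <- data.frame(rater1, rater2)
ckappa(ratings)  # $table is the crosstab table, $kappa is Cohen's Kappa
```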
You can then calculate the confidence interval as well, but it is a slightly more complicated process because we use a bootstrap method. We will use the boot package.
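Here is one way to do it (a sketch, assuming the `ratings` data frame defined above; `kappa_boot` is just a helper name I chose, and because the ratings above are made up and the bootstrap involves random resampling, the exact numbers will vary):

```r
library(boot)
library(psy)

# Statistic to bootstrap: Cohen's Kappa computed on a resampled set of rows.
kappa_boot <- function(data, indices) {
  ckappa(data[indices, ])$kappa
}

# 1000 bootstrap resamples, then a bias-corrected and accelerated (BCa) 95% CI.
results <- boot(data = ratings, statistic = kappa_boot, R = 1000)
boot.ci(results, type = "bca")
```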
So the Kappa is 0.40 and its CI is [-0.06, 0.78]. I won't go into the details of this code, but it should work with a simple copy-and-paste. Please note that the CI estimated by the bootstrap method is usually different from the CI calculated from the ASE. The bootstrap method is known to be more accurate, so I would recommend using it if possible.
Finally, you can report your Cohen's Kappa with its confidence interval as follows: The measured Cohen's Kappa for our results was 0.85 (95% CI: [0.75, 0.94]), indicating a strong agreement.
The above Cohen's Kappa is for nominal (categorical) data, which means that your dependent values do not have any specific order. But you may often have ordered values (such as subjective ratings) and want to know how much two raters agree on their ratings. Let's think about a hypothetical case in which two raters rate the quality of pictures taken by someone with three ratings: Good, OK, and Bad. Then, you get the following results.
|  |  | Rater 2 |  |  |
| --- | --- | --- | --- | --- |
|  |  | Good | OK | Bad |
| Rater 1 | Good | 40 | 3 | 2 |
|  | OK | 5 | 31 | 9 |
|  | Bad | 2 | 7 | 21 |
This is almost the same situation as the 2×2 crosstab table above, so we could just use Cohen's Kappa for testing the agreement (which is legitimate). The limitation of this naive Cohen's Kappa is that it treats all disagreements equally. But in this example, it is more natural to think that a disagreement between “Good” and “Bad” should have more weight than a disagreement between “Good” and “OK”. To account for this, we can calculate the weighted Cohen's Kappa.
There are a number of ways to calculate such weights, but the most common one is the squared weight: the weights are based on the squared differences between the row and column indices of the crosstab table. We are going to use the psy package again. In this example code, I use 0, 1, and 2 to represent the ratings (so 0 for “Bad”, 1 for “OK”, and 2 for “Good”).
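A sketch with made-up rating vectors as stand-ins for the raw ratings; `wkappa()` in the psy package computes the weighted Kappa, and its `weights` argument selects the squared weighting scheme.

```r
library(psy)

# Hypothetical ordinal ratings from the two raters (0 = Bad, 1 = OK, 2 = Good).
rater1 <- c(2, 2, 1, 0, 2, 1, 1, 0, 2, 0, 1, 2)
rater2 <- c(2, 1, 1, 0, 2, 2, 1, 1, 2, 0, 0, 2)

ratings <- data.frame(rater1, rater2)
wkappa(ratings, weights = "squared")  # weighted Cohen's Kappa, squared weights
```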
And we are going to estimate the confidence interval with the boot package.
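The bootstrap code is almost the same as before (again a sketch, assuming the `ratings` data frame above; `wkappa_boot` is a name I chose, and the exact numbers will vary with your data and the random resampling):

```r
library(boot)
library(psy)

# Statistic to bootstrap: the weighted Kappa on a resampled set of rows.
wkappa_boot <- function(data, indices) {
  wkappa(data[indices, ], weights = "squared")$kappa
}

# 1000 bootstrap resamples, then a BCa 95% confidence interval.
results <- boot(data = ratings, statistic = wkappa_boot, R = 1000)
boot.ci(results, type = "bca")
```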
Thus, the weighted Cohen's Kappa is 0.64, and its 95% confidence interval is [0.15, 0.88]. You can report your weighted Cohen's Kappa with its confidence interval as follows: The measured weighted Cohen's Kappa (squared weights) for the ratings by the two raters was 0.64 (95% CI: [0.15, 0.88]), indicating a moderate agreement.
For the Cohen's Kappa,