hcistats:anova

Analysis of Variance (ANOVA) for comparing the means

Introduction

Analysis of Variance (ANOVA) is one statistical test commonly used in HCI research. Although it says “analysis of variance,” and ANOVA is a method to compare multiple linear models, a very common way to use an ANOVA test is to test the difference in means among more than two groups. So, the intuition is a t test which accommodates more than two groups to compare. For example, when you want to compare user performance of three interaction techniques, ANOVA is one of the methods you want to consider.

But, unfortunately, ANOVA is not as simple as this intuition, and understanding ANOVA is fairly hard (well, at least I think so. I don't think I fully understand it yet.). Particularly, doing a two-way repeated-measure ANOVA test is not easy in R, and you may want to use other statistics software, such as SPSS. It is not like it is impossible, but it is very complicated, and you may have to have some knowledge about linear models (and manipulations for these in R).

As you can see in the table of content, there are many things you need to learn in ANOVA. Nonetheless, you can do an ANOVA test with your statistical software without necessarily knowing all its details, and this is probably why you can see so many different ways to do ANOVA in HCI papers. In this page, I do my best to explain my understandings of ANOVA, and provide the procedures of ANOVA recommended by other literature. But if you have time, I recommend you to read a statistics book to understand ANOVA (and to double-check what I am saying here) as well as this page.

Why not a t test?

One simple question you may have is “why cannot we use multiple t tests instead of ANOVA?” This is a very understandable question, and you could do so with some corrections (and this is what we often do in a post-hoc comparison). However, if you have more than two groups to compare, using ANOVA is a de-facto standard.

There is one clear reason why we cannot use multiple t test instead of ANOVA for saying whether a factor which has more than two groups (e.g., three different techniques) has a significant effect on the dependent variable (e.g., performance time). Let's say, you have Technique A, Technique B, and Technique C, and you have measured performance time of these three techniques. What you want to know here is whether these techniques have a significant effect on performance time, that is whether there exists at least one significant difference in performance time against the techniques. In other words, your null hypothesis is that there is no significant difference in the means among three techniques.

If you try to do multiple t tests, you have to do three tests: One test between Technique A and Technique B, one between B and C, and one between A and C. If you test the t test null hypothesis (i.e., there is no difference in the means between between Technique A and Technique B) with 95% confidence (i.e., p=0.05) and you cannot reject it, your confidence of that rejection is 95%. Remember that your null hypothesis you really need to test is there is no significant difference in the means among three techniques. You need to do two more tests for B and C, and A and C. If all three tests reject the null hypothesis, you can say that there is no significant difference in the means among three techniques. But this also means that your confidence level becomes 95% * 95% * 95% = 85.7% because you do the AND operation on your tests. Obviously, a test with 85.7% confidence is not useful for us. Thus, you cannot simply do multiple t tests if you have more than two groups to compare.

By the way, what if we compare two groups with ANOVA? The results will be the same as what you can get with an unpaired/paired t test. So it is ok to use ANOVA for two groups, but using t tests is more common.

Repeated Measure

Repeated measure is a similar concept of paired in t test. If you gain the values for the multiple conditions from the same subject (i.e., you make measurements repeated from the same participants), repeated measure ANOVA comes in to play. One common case in which you need to use repeated measure ANOVA is that you did a within-participant design experiment. (i.e., all of your participants experienced all the conditions). So, think about your experimental design well and choose the right ANOVA for your experiment.

Assumptions

Similarly to other parametric tests, ANOVA makes several assumptions.

• Normality: The distributions of the populations you take samples from are normal.
• Homogeneity of variances / Sphericity: The basic idea of this requirement is that the variance in each group should be similar enough (although there is an additional thing for the sphericity, I didn't discuss it in this wiki page). This is referred as homogeneity of variances for between-subject ANOVA, and sphericity for within-subject ANOVA.
• Data type: The dependent variable must be interval or ratio (e.g., time, and error rates).

For the normality assumption, you can see another page. If you think you can assume the normality, you need to make sure that the second assumption holds. We are going to look into how you can test the homogeneity of variances or sphericity.

Another thing you need to be careful about is whether the data are balanced or unbalanced. Balanced means that the sample sizes are equal across all the groups to compare. Although ANOVA can accommodate mildly unbalanced data, you have to have extra cautions particularly when you do two-way ANOVA.

Homogeneity of variances

There are a couple of ways to check the homogeneity of variances. Here, I show three kinds of tests with R codes. You can find the R code for making the data used in these tests in the section of the R code for one-way ANOVA.

Bartlett's test

bartlett.test(Value ~ Group,data) Bartlett test of homogeneity of variances data: Value by Group Bartlett's K-squared = 1.0264, df = 2, p-value = 0.5986

Fligner's test

fligner.test(Value ~ Group,data) Fligner-Killeen test of homogeneity of variances data: Value by Group Fligner-Killeen:med chi-squared = 0.9972, df = 2, p-value = 0.6074

Levene's test

In SPSS, Levene's test is used by default for the homogeneity of variance test. You can do it in R too, but need to include car package.

library(car) levene.test(Value ~ Group) Levene's Test for Homogeneity of Variance Df F value Pr(>F) group 2 0.42 0.6624 21

So, in this example, p is over 0.05, so we can assume that the variances are equal. But what should we do if p < 0.05? One solution is to do a data transformation. There are several ways to do so, and which transformation to use depends on the relationship between the means and variances. Some details are available in a separate page.

Another thing you can do is to switch to non-parametric methods, such as Kruskal-Wallis. This is a better way to take if you are not sure what kind of data transformation you should perform.

Sphericity

A rough idea of sphericity is the homogeneity of variances for repeated-measure ANOVA. A standard test for Sphericity is Mauchly's test. Unfortunately, doing a test for checking the sphericity is not easy in R (This is automatically done in SPSS including the corrections for the cases where the sphericity assumption is violated). Here is the way to do repeated-measure ANOVA with the Sphericity test. There are two ways to do a sphericity test.

Sphericity test with the car package

The following procedure using car package is based on the blog entry here.

First, prepare the data to analyze.

Group <- c("A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","C","C","C","C","C","C","C","C") Value <- c(1,2,4,1,1,2,2,3,3,4,4,2,3,4,4,3,4,5,3,5,5,3,4,6) Participant <- c("1","2","3","4","5","6","7","8","1","2","3","4","5","6","7","8","1","2","3","4","5","6","7","8") data <- data.frame(Participant, Group, Value)

You need to make a matrix such that the rows are the within-subject factor (Participant) and the columns are the groups to compare (Group).

matrix <- with(data, cbind(Value[Group=="A"], Value[Group=="B"], Value[Group=="C"]))

Then, build a multivariate linear model with the matrix you've just created.

model <- lm(matrix ~ 1)

Now, you need to define the design of the study, which basically means that you need to make a list of the independent variable.

design <- factor(c("A", "B", "C"))

Good. You are almost there. :) You need to include car package.

library(car)

The car package has Anova() function, which includes Mauchly's test.

options(contrasts=c("contr.sum", "contr.poly")) aov <- Anova(model, idata=data.frame(design), idesign=~design, type="III") summary(aov, multivariate=F)

Finally, you get the results!

Univariate Type III Repeated-Measures ANOVA Assuming Sphericity SS num Df Error SS den Df F Pr(>F) (Intercept) 253.50 1 5.1667 7 343.45 3.304e-07 *** design 22.75 2 14.5833 14 10.92 0.001388 ** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Mauchly Tests for Sphericity Test statistic p-value design 0.63791 0.25958 Greenhouse-Geisser and Huynh-Feldt Corrections for Departure from Sphericity GG eps Pr(>F[GG]) design 0.73417 0.00452 ** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 HF eps Pr(>F[HF]) design 0.88099 0.002349 ** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Sphericity test with the ez package

Another way to do a sphericity test is to use ez package. We use the same data in this example too.

Group <- c("A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","C","C","C","C","C","C","C","C") Value <- c(1,2,4,1,1,2,2,3,3,4,4,2,3,4,4,3,4,5,3,5,5,3,4,6) Participant <- c("1","2","3","4","5","6","7","8","1","2","3","4","5","6","7","8","1","2","3","4","5","6","7","8") data <- data.frame(Participant, Group, Value)

Now, include the ez package.

library(ez) options(contrasts=c("contr.sum", "contr.poly")) ezANOVA(data=data, dv=.(Value), wid=.(Participant), within=.(Group), type=3)

You have to specify several things in ezANOVA() function. In this example, we specify the dependent variable (dv), the indices indicating different cases (wid), and the within-subject factor (within). I choose Type III SS by saying “type=3.” For more details of how to use ezANOVA() function, see the repeated-measure two-way ANOVA example in this page or the manual (execute ?ezANOVA in the prompt). After executing the command, you get the results.

Note: model has only an intercept; equivalent type-III tests substituted. $ANOVA Effect DFn DFd SSn SSd F p p<.05 ges 1 (Intercept) 1 7 253.50 5.166667 343.4516 3.303536e-07 * 0.9277219 2 Group 2 14 22.75 14.583333 10.9200 1.387779e-03 * 0.5352941$Mauchly's Test for Sphericity Effect W p p<.05 2 Group 0.6379102 0.2595844 $Sphericity Corrections Effect GGe p[GG] p[GG]<.05 HFe p[HF] p[HF]<.05 2 Group 0.734166 0.004520309 * 0.880987 0.002348975 * How to interpret the results of a sphericity test The both procedures explained above give us the same results. Here, I take the results of the procedure using car package to explain how we can interpret the results. The test statistics of Mauchly's test is W, which is 0.64 in this example. The degree of freedom for the Mauchly's sphericity test can be calculated as n(n+1)/2 - 1 where n is the degree of freedom of the factor. For instance, the degree of freedom of Group is 2. Thus, The degree of freedom for Group in the Mauchly's sphericity test is 2 * 3 / 2 - 1 = 2. The result of “Mauchly Tests for Sphericity” shows p > 0.05. So, we can assume the sphericity and don't need for making any correction on the data. And we have a significant effect of design (which is Group) by looking at the results of ANOVA (see the results under Univariate Type III Repeated-Measures ANOVA Assuming Sphericity). What if we cannot assume the sphericity? Then, we need to use Greenhouse-Geisser and Huynh-Feldt Corrections. They have a value called epsilon. The epsilon means the departure from the sphericity, in other words, how far the data is from the ideal sphericity. The epsilon is a number between 0 and 1, if the epsilon is equal to 1, the data have the sphericity. I omit the difference between Greenhouse-Geisser and Huynh-Feldt (you actually see another value called lower-bound in SPSS, which means the possible lowest value for the epsilon), but generally the epsilon of Greenhouse-Geisser is used. If the epsilon of Greenhouse-Geisser is greater than 0.75, you should use the epsilon of Huynh-Feldt. This is because Greenhouse-Geisse tends to make the analysis too strict when the epsilon is large. It is also considered that you should use Huynh-Feldt when you have a small sample size (like 10). You can make a correction by multiplying the degree of freedom by the epsilon. Let's assume that we need to make a correction. The degree of freedom of design is 2. The epsilon of Greenhouse-Geisser is 0.734. Thus, the new degree of freedom after the correction is 1.468. You also do the same thing for the second degree of freedom (i.e., 14 * 0.734 = 10.276). So now you say you have F(1.47, 10.28)=10.92 instead of F(2, 14)=10.92. The p value with the adjusted F value is available right next to the epsilon. In this example, the p value is 0.00452, and you have a significant effect. You don't have to do anything for the effect size. If the degree of freedom of your factor is 1, you don't need any correction. Thus, R won't show any information about the correction for that factor. The theoretical lower bound of the epsilon is 1 / [degree of freedom]. So, if your factor's degree of freedom is 5, the theoretical lower bound of epsilon is 0.2. Thus, the adjusted degree of freedom cannot be below 1 in any case. So, it is a bit complicated to do repeated-measure ANOVA in R. But it is not impossible. :) Another way to do a Mauchly's test In some cases, the two methods described above cause some errors. Particularly you may see an error like this when you use Anova(). Error in linear.hypothesis.mlm(mod, hyp.matrix, SSPE = SSPE, idata = idata, : The error SSP matrix is apparently of deficient rank.... If you have this error, ezANOVA() won't work either because it uses Anova() internally. In this case, you can run a Mauchly's test in a more manual way. For more details, see the section for two-way repeated-measure ANOVA. Effect size Another thing we need to talk about is the effect size. There are a few standard metrics to present the effect size for ANOVA, but the most common metrics are eta squared ($\small \eta^{2}$) and partial eta squared. The eta squared is a square of the correlation ratio, and represents how much the factor we are talking about affects the results of the total sum of square. We can calculate eta squared as follows: $\Large \eta^{2} = \frac{SS_{effect}}{SS_{total}}$, where SS_effect is the sum of square for the factor and SS_total is the total sum of square. However, when you have many factors (or independent variables), the eta squared often becomes too small. To address this, the partial eta squared is used. $\Large \eta_{partial}^{2} = \frac{SS_{effect}}{SS_{effect} + SS_{error}}$, where SS_errror means the sum of square for the error term. You can use either of them (SPSS uses partial eta squared by default), but because there is no way to convert eta squared to partial eta squared or vice versa, you need to clarify which you used in your report. And here are the values which are considered as a small, medium, and large effect size. small sizemedium sizelarge size eta squared0.010.060.14 There is no standard “small,” “medium,” and “large” effect size for the partial eta squared, which may be a disadvantage of using the partial eta squared. Another disappointment of the partial eta-squared is that there is no straightforward way to calculate its confidence interval in R (if you know, please let me know!). We can do it for the eta squared by using the MBESS package, which is explained in the example codes. One-way ANOVA One-way ANOVA means you have one independent variable to use for the comparison of the dependent variables. As we have seen in the examples above, we use one-way ANOVA for testing Value against Group. Thus, what you are looking for is the relationship between Value and Group, and this can be expressed as Value ~ Group in R. What one-way ANOVA does is to see whether Group is really important to predict Value. R code example (One-way ANOVA) One-way ANOVA First, prepare the data. Here, we have three groups. These values are measured in a between-subject manner. Thus, each of Value came from different participants. Group <- factor(c("A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","C","C","C","C","C","C","C","C")) Value <- c(1,2,4,1,1,2,2,3,3,4,4,2,3,4,4,3,4,5,3,5,5,3,4,6) data <- data.frame(Group, Value) Before running a one-way ANOVA test, you need to make sure that the homogeneity of variances assumption holds. There are a couple of ways to do that, but we use Barlett's test here. bartlett.test(Value ~ Group,data) And, you see the result of Barlett's test. Bartlett test of homogeneity of variances data: Value by Group Bartlett's K-squared = 1.0264, df = 2, p-value = 0.5986 Because the p value is over 0.05, the null hypothesis, which is there is no difference in the variances, still holds. Run a one-way ANOVA test. Your independent variable is Group, and dependent variable is Value. In R, you can express this as Value ~ Group. aov <- aov(Value ~ Group, data) summary(aov) Now, you get the result. Df Sum Sq Mean Sq F value Pr(>F) Group 2 22.75 11.3750 12.095 0.0003202 *** Residuals 21 19.75 0.9405 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ Because the F value for Group is large enough, the p value is below 0.05. Thus, you have found a significant effect of Group in the means of Value. For the effect size, both eta squared and partial eta squared become the same in this case. It is 22.75 / (22.75 + 19.75) = 0.535. If you need to calculate the confidence interval of the effect size, it is safer to use the eta squared (at least there is an easy way to calculate its confidence interval in R). You can calculate it by using the MBESS package. library(MBESS) Then, you just need to call the ci.pvaf() function. You need the F value, first and second degree of freedom, and sample size. ci.pvaf(F.value=12.095, df.1=2, df.2=21, N=24) [1] "The 0.95 confidence limits for the proportion of variance of the dependent variable accounted for by knowing group status are given as:"$Lower.Limit.Proportion.of.Variance.Accounted.for [1] 0.1751871 $Upper.Limit.Proportion.of.Variance.Accounted.for [1] 0.6870077 One-way repeated-measure ANOVA This procedure assumes the sphericity implicitly. I recommend you to do a test for the sphericity before doing repeated-measure ANOVA, and the procedure for that is available in the Sphericity section. The results of the sphericity test for the data we use here is also available there as an example of the test. First, prepare the data. Here, we have three groups. Group <- c("A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","C","C","C","C","C","C","C","C") Value <- c(1,2,4,1,1,2,2,3,3,4,4,2,3,4,4,3,4,5,3,5,5,3,4,6) Participant <- c("1","2","3","4","5","6","7","8","1","2","3","4","5","6","7","8","1","2","3","4","5","6","7","8") data <- data.frame(Participant, Group, Value) For repeated-measure ANOVA, you need to specify which the repeated measure factor is (in this example, Participant). In R, you can do so in Error(). aov <- aov(Value ~ factor(Group) + Error(factor(Participant)/factor(Group)), data) summary(aov) Then, you can get the result. Error: factor(Participant) Df Sum Sq Mean Sq F value Pr(>F) Residuals 7 5.1667 0.7381 Error: Within Df Sum Sq Mean Sq F value Pr(>F) factor(Group) 2 22.750 11.3750 10.92 0.001388 ** Residuals 14 14.583 1.0417 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 As you can see here, R considers Participants as one random factor (or an error term), and it is used as the repeated-measure factor. This is like saying “we know that Participants is causing some effects on our observations, so let's remove these effects.” If you are using SPSS, you probably are told that you need to use the repeated-measure model for doing repeated-measure ANOVA (and there are reasons for doing so), but you can still do it with the univariate model. All you have to do is put your repeated-measure factor into “Random Factor(s),” and you will get the same results (be careful that the way to display the results is different from when you use the repeated-measure model). By comparing the results of one-way ANOVA (the data are the same), you notice some differences in the results. This is why your experimental design is important. It does affect your analysis. You may have noticed the total of the sum of square for Participants (5.17) and for errors (Residuals, 14.58) in one-way repeated-measure ANOVA is equal to the sum of square for errors in one-way ANOVA (19.75). What repeated-measure ANOVA does is to separate the effect caused by the participants (or individual differences) from the errors whose causes we really don't know. Thus, if you calculate the partial eta squared, it becomes larger (22.75 / (22.75 + 14.58) = 0.61) than in one-way ANOVA. This is how repeated-measure ANOVA takes the within-subject factor into account. However, if you calculate the eta squared, it is 22.75 / (22.75 + 14.583 + 5.1667) = 0.535, which is the same as when we do one-way ANOVA . This is because the eta squared considers all the mean squares. For its confidence interval, you can use the MBESS package. library(MBESS) ci.pvaf(F.value=10.92, df.1=2, df.2=14, N=24) [1] "The 0.95 confidence limits for the proportion of variance of the dependent variable accounted for by knowing group status are given as:"$Lower.Limit.Proportion.of.Variance.Accounted.for [1] 0.1214267 $Upper.Limit.Proportion.of.Variance.Accounted.for [1] 0.6806342 How to report the results of one-way ANOVA Reporting the results of one-way ANOVA is fairly simple. In the example above, you can report something like: Bartlett's test did not show a violation of homogeneity of variances ($\small \chi^{2}(2)$ = 1.03, p = 0.60). With one-way ANOVA, we found a significant effect of Group on Value (F(2,21)=12.01, p<0.01, partial $\small \eta^{2}$=0.54). With Levene's test, you report the F value (so it is like F(df1, df2) = [some values]) instead of the chi-squared for Bartlett's test. The style is pretty much the same for the case of repeated-measure ANOVA. Make sure you have to do a test for sphericity: Mauchly's test did not show a violation of sphericity against Group (W(2) = 0.64, p = 0.26). With one-way repeated-measure ANOVA, we found a significant effect of Group on Value (F(2,14)=10.92, p<0.01, partial $\small \eta^{2}$=0.61). If you have done any correction on the degrees of freedom, you need to explain those details (e.g., which correction you used) as well: Mauchly's test showed a violation of sphericity against Group ([the W value you got], p < 0.05). We ran one-way repeated-measure ANOVA with Greenhouse-Geisser correction ($\small \epsilon$=0.74). It revealed a significant effect of Group on Value (F(1.47, 10.28)=10.92, p< 0.01, partial $\small \eta^{2}$=0.61). If you need to report the confidence interval of the effect size, you can do something like this: With one-way ANOVA, we found a significant effect of Group on Value (F(2,21)=12.01, p<0.01, $\small \eta^{2}$=0.54 with CI = [0.18, 0.69]). Note that we use the eta squared, not the partial eta squared. It is often common that the results of a homogeneity or sphericity test are omitted. Follow what your research community does. Remember that you have to run a post-hoc test if you find a significant difference. Two-way ANOVA Two-way ANOVA means you have two kinds of independent variables to use for your comparison. So it just sound like you have one more independent variable in one-way ANOVA. But it is not quite true, and there are some complicated stuff. One of these stuff is the type of the sum of squares. Types of Sum of Squares The sum of squares, more precisely the sum of the squared deviations, is one of the key metrics in ANOVA. It is kind of similar to variance, and represents how spread the data are from the mean. What ANOVA does is to calculate the sum of squares (SSs) for each factor (independent variable) as well as one for the dependent variable. As you may have noticed in the examples of one-way ANOVA, ANOVA creates a linear model with the independent and dependent variables, like [dependent] = a + b [independent]. This is kind of a rough explanation, but I think it is easy to understand. So, in two-way ANOVA, what we are going to do is to build a model like [dependent] = a + b [independent 1] + c [independent 2], and see how well this model describes the dependent variable. Unfortunately, this is not the end of the story. First, there may be some effects caused by having both two independent variables. This is called an interaction whereas the effect caused by one factor is called a main effect. So in many cases of two-way ANOVA, the model to build is like [dependent] = a + b [independent 1] + c [independent 2] + d [independent 1] * [independent 2]. Another bad news is that this model causes another problem on how to calculate the sum of squares. There are three kinds of types of sum of squares commonly used: Type I, Type II, and Type III. There are actually three more (Type IV, Type V and Type VI), but they are rarely used. • Type I SS: SSs are calculated by the model specified. For instance, if the model is specified as Y = aA +bB , it calculate the SS for A with Y = aA, and then calculates the SS for B with Y = aA + bB . This means that the order of the factors in the model matters (unless A and B are completely orthogonal). • Type II SS: SSs are calculated in a similar way to the case of Type I SS, but the SS for each factor is calculated given all other factors in the model except higher order factors which contain that factor. For example, if the model is like Y = aA + bB + cA*B , the SS for A is calculated given B, but not A*B. • Type III SS: The SS for each factor is calculated with considering the effects from all the other factors. One thing you should know is that a Type II and aType III SS do not depend on the order of the factors in the model whereas a Type I SS does. We really don't want to have effects caused by the model in most cases, so want to avoid using Type I. Unfortunately, R uses Type I SSs by default (aov function). SPSS, instead, uses Type III SSs by default. If you have unbalanced data (the sample sizes are fairly different across your groups), it is recommended to use Type II SSs. As you may guess, there are long discussions about which type should be used, and each type has pros and cons. Well, this sounds complicated, right? But here is a good news. If your data are balanced (the same sample size for each group to compare), all of those SSs will be the same. So, if this is the case, you don't have to worry about it much. And if you don't have any interaction term, Type II SSs and Type III SSs will be the same regardless of balanced or unbalanced data. But if your data are unbalanced and your model includes any interaction term, Type II SSs and Type III SSs will be different. Thus, you have to be careful about types of SSs particularly when Type II SSs and Type III SSs becomes different. R code example (Two-way ANOVA) Two-way ANOVA for balanced data First, prepare the data. Device <- factor(c(rep("Pen",24), rep("Touch",24))) Technique <- factor(rep(c(rep("A",8), rep("B",8), rep("C",8)), 2)) Time <- c(1.2,1.4,1.8,2.0,1.1,1.5,1.5,1.7, + 2.1,2.5,2.2,2.2,2.9,2.3,2.3,2.6, + 3.5,3.4,3.3,3.2,2.9,2.8,3.8,3.4, + 2.4,1.8,2.5,2.1,2.2,1.9,1.7,2.3, + 2.8,3.1,3.2,4.0,2.9,3.6,3.2,3.7, + 4.5,4.8,4.7,4.1,4.1,4.2,4.6,4.9) Here we have two independent variables, Device and Technique. To do a two-way ANOVA test with type III SSs, we need to include car library. library(car) Before doing ANOVA, we need to check the homogenity. Here, we use Levene's test. levene.test(Time ~ Device * Technique) Levene's Test for Homogeneity of Variance Df F value Pr(>F) group 5 0.3471 0.8812 42 Thus, F(5, 42) = 0.35, p = 0.88, and we can assume the homogenity. Then, run a two-way ANOVA. Anova in car library requires us to explicitly build a linear model. We can use lm() for this. But before doing ANOVA, we need to run a magic code. (I won't explain what this magic code means for now). options(contrasts=c("contr.sum", "contr.poly")) Anova(lm(Time ~ Device * Technique), type="III") Now you get the result. Anova Table (Type III tests) Response: Time Sum Sq Df F value Pr(>F) (Intercept) 390.45 1 3762.2962 < 2.2e-16 *** Device 9.81 1 94.5291 2.581e-12 *** Technique 34.24 2 164.9547 < 2.2e-16 *** Device:Technique 0.75 2 3.6275 0.03522 * Residuals 4.36 42 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 We have significant main effects of Device and Technique, and we also have a significant interaction of Device and Technique. The effect size (partial eta-squared) can be calculated from the values of “Sum Sq” of the factor and Residuals. For example, the partial eta-squared for Technique is 34.24 / (34.24 + 4.36) = 0.887. If you have to report the confidence interval of the effect size, you have to calculate the eta squared. For example, the eta-squared for Technique is 34.24 / (34.24 + 9.81 + 0.75 + 4.36) = 0.697. You can find how to calculate the confidence interval for the eta squared RCodeOneWay in the example code for one-way ANOVA. Two-way repeated-measure ANOVA for balanced data with car package We use the same data for two-way ANOVA, but assume the within-subject design. First, we reform the data. Device <- factor(c(rep("Pen",24), rep("Touch",24))) Technique <- factor(rep(c(rep("A",8), rep("B",8), rep("C",8)), 2)) Time <- c(1.2,1.4,1.8,2.0,1.1,1.5,1.5,1.7, + 2.1,2.5,2.2,2.2,2.9,2.3,2.3,2.6, + 3.5,3.4,3.3,3.2,2.9,2.8,3.8,3.4, + 2.4,1.8,2.5,2.1,2.2,1.9,1.7,2.3, + 2.8,3.1,3.2,4.0,2.9,3.6,3.2,3.7, + 4.5,4.8,4.7,4.1,4.1,4.2,4.6,4.9) data <- data.frame(Device, Technique, Time) data2 <- with(data, cbind(Time[Device=="Pen"&Technique=="A"],Time[Device=="Pen"&Technique=="B"], + Time[Device=="Pen"&Technique=="C"],Time[Device=="Touch"&Technique=="A"], + Time[Device=="Touch"&Technique=="B"],Time[Device=="Touch"&Technique=="C"])) data2 Now, your data look like this. [,1] [,2] [,3] [,4] [,5] [,6] [1,] 1.2 2.1 3.5 2.4 2.8 4.5 [2,] 1.4 2.5 3.4 1.8 3.1 4.8 [3,] 1.8 2.2 3.3 2.5 3.2 4.7 [4,] 2.0 2.2 3.2 2.1 4.0 4.1 [5,] 1.1 2.9 2.9 2.2 2.9 4.1 [6,] 1.5 2.3 2.8 1.9 3.6 4.2 [7,] 1.5 2.3 3.8 1.7 3.2 4.6 [8,] 1.7 2.6 3.4 2.3 3.7 4.9 Each row represents each participant, and each column represents a different condition. We now make the variables for representing these conditions. D <- factor(c("P","P","P","T","T","T")) T <- factor(c("A","B","C","A","B","C")) factor <- data.frame(D,T) factor D and T represent Device and Technique, respectively. And factor should look like this. D T 1 P A 2 P B 3 P C 4 T A 5 T B 6 T C The rows of factor should correspond to the column of data2. For example, the first column in data2 is Pen + Technique A, which is P A in factor. Now, run ANOVA. Make sure to run a magic code using options() function (I won't explain what this magic code means for now). options(contrasts=c("contr.sum", "contr.poly")) model <- lm(data2 ~ 1) library(car) aov <- Anova(model, idata=factor, idesign=~D*T, type="III") summary(aov, multivariate=FALSE) idata means the factors you want to compare, and idesign means the factors which are affected by the within-subject design. In this example, each participant experienced all Devices and Techniques, so we include both D and T. We use Type III SSs. Now you have the results. Univariate Type III Repeated-Measures ANOVA Assuming Sphericity SS num Df Error SS den Df F Pr(>F) (Intercept) 390.45 1 0.81146 7 3368.1969 1.183e-10 *** D 9.81 1 0.25146 7 273.0928 7.252e-07 *** T 34.24 2 1.76542 14 135.7557 6.816e-10 *** D:T 0.75 2 1.53042 14 3.4438 0.06077 . --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Mauchly Tests for Sphericity Test statistic p-value T 0.78490 0.48355 D:T 0.48155 0.11166 Greenhouse-Geisser and Huynh-Feldt Corrections for Departure from Sphericity GG eps Pr(>F[GG]) T 0.82298 1.831e-08 *** D:T 0.65856 0.08884 . --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 HF eps Pr(>F[HF]) T 1.04292 6.816e-10 *** D:T 0.75112 0.08013 . --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Warning: In summary.Anova.mlm(aov, multivariate = F) : HF eps > 1 treated as 1 Mauchly Tests do not show any significant effect, so we don't need any correction. According to the section of Univariate Type III Repeated-Measures ANOVA Assuming Sphericity, both Device and Technique (D and T) have significant effects and the interaction has a marginal significant effect. The effect size (partial eta squared) for each factor can be easily calculated from the values of “SS” and “Error SS”. For instance, the partial eta squared of Technique is 34.24 / (34.24 + 1.76542) = 0.951. Similarly the partial eta squared of Device is 9.81 / (9.81 + 0.25146) = 0.975. You don't have to do any adjustment on the sum of square even if you have to do Greenhouse-Geisser or Huynh-Feldt correction. Calculating the eta squared is a bit trickier. You have to pick up all the values in “SS” and “Error SS” except ones for the intercept. So in this example, the effect size of Technique is 34.24 / (34.24 + 9.81 + 0.75 + 0.25146 + 1.76542 + 1.53042 + 0.81146) = 0.697. You can find how to calculate the confidence interval for the eta squared in the example code for one-way ANOVA. You may also have noticed that there is no correction for Device. This is because the degree of freedom of Device is 1 (there are only two options for Device). This is kind of similar to a paired t test. If you do a paired t test, the test looks at the difference between the two groups rather than separate distributions of two groups. Thus, the equality of variances of the two groups doesn't matter. The same thing is happening here. Because Device is a within-subject factor, our interest is the distribution after we take the difference between the two groups, and thus we only have one distribution to test. But for Technique, we still have more than one distribution even after we take the differences (because we had three techniques), and Technique may need a correction. Two-way repeated-measure ANOVA for balanced data with ez package There is another way to do two-way ANOVA, in which we use ezANOVA() function. Although this function uses Anova() function internally, it is easier to use, so you may prefer this method. Participant <- factor(rep(c("1", "2", "3", "4", "5", "6", "7", "8"), 6)) Device <- factor(c(rep("Pen",24), rep("Touch",24))) Technique <- factor(rep(c(rep("A",8), rep("B",8), rep("C",8)), 2)) Time <- c(1.2,1.4,1.8,2.0,1.1,1.5,1.5,1.7, + 2.1,2.5,2.2,2.2,2.9,2.3,2.3,2.6, + 3.5,3.4,3.3,3.2,2.9,2.8,3.8,3.4, + 2.4,1.8,2.5,2.1,2.2,1.9,1.7,2.3, + 2.8,3.1,3.2,4.0,2.9,3.6,3.2,3.7, + 4.5,4.8,4.7,4.1,4.1,4.2,4.6,4.9) data <- data.frame(Device, Technique, Participant, Time) library(ez) options(contrasts=c("contr.sum", "contr.poly")) ezANOVA(data=data, dv=.(Time), wid=.(Participant), within=.(Technique, Device), type=3) With ezANOVA() function, you have to specify several things. data is the dataframe you are using. dv means the dependent variable, which is Time in this case. wid means the indices that represent different cases or participants. In this example, Participant represent eight different cases. within means the within-subject factors, which are Technique and Device. If you have a between-subject factor, you can also use the between option. You have to specify the factors and variables like .([factor_name]). When you execute this command, you get the following result.$ANOVA Effect DFn DFd SSn SSd F p p<.05 ges 1 (Intercept) 1 7 390.4502083 0.8114583 3368.196919 1.183256e-10 * 0.9889599 2 Technique 2 14 34.2379167 1.7654167 135.755723 6.816031e-10 * 0.8870693 3 Device 1 7 9.8102083 0.2514583 273.092792 7.252441e-07 * 0.6923733 4 Technique:Device 2 14 0.7529167 1.5304167 3.443779 6.076907e-02 0.1472938 $Mauchly's Test for Sphericity Effect W p p<.05 2 Technique 0.7849017 0.4835549 4 Technique:Device 0.4815467 0.1116645$Sphericity Corrections Effect GGe p[GG] p[GG]<.05 HFe p[HF] p[HF]<.05 2 Technique 0.8229787 1.831468e-08 * 1.0429184 6.816031e-10 * 4 Technique:Device 0.6585649 8.883789e-02 0.7511202 8.012761e-02

Although the format is different, this shows the same information as the results of Anova() function.

Two-way repeated-measure ANOVA for balanced data (running manually)

In some cases, the two methods explained above won't work. With Anova() function, you may have an error like this.

Error in linear.hypothesis.mlm(mod, hyp.matrix, SSPE = SSPE, idata = idata, : The error SSP matrix is apparently of deficient rank....

This means that the degree of freedom for one the factors you have becomes larger than the number of different cases in terms of the within-subject factors (i.e., the number of the rows in data2 in the example above). In this case, Anova() function cannot calculate a matrix necessary for the test.

In this case, we have to do a test in a more manual manner as follows. Please remember that the following procedure uses Type III SSs. The first part of th procedure is pretty much the same as the way with car package. I don't explain details of what each command means in this procedure for now. I may add them later, but if you are interested, please look at the manual by executing ?anova.mlm.

Participant <- factor(rep(c("1", "2", "3", "4", "5", "6", "7", "8"), 6)) Device <- factor(c(rep("Pen",24), rep("Touch",24))) Technique <- factor(rep(c(rep("A",8), rep("B",8), rep("C",8)), 2)) Time <- c(1.2,1.4,1.8,2.0,1.1,1.5,1.5,1.7, + 2.1,2.5,2.2,2.2,2.9,2.3,2.3,2.6, + 3.5,3.4,3.3,3.2,2.9,2.8,3.8,3.4, + 2.4,1.8,2.5,2.1,2.2,1.9,1.7,2.3, + 2.8,3.1,3.2,4.0,2.9,3.6,3.2,3.7, + 4.5,4.8,4.7,4.1,4.1,4.2,4.6,4.9) data <- data.frame(Device, Technique, Participant, Time) data2 <- with(data, cbind(Time[Device=="Pen"&Technique=="A"],Time[Device=="Pen"&Technique=="B"], + Time[Device=="Pen"&Technique=="C"],Time[Device=="Touch"&Technique=="A"], + Time[Device=="Touch"&Technique=="B"],Time[Device=="Touch"&Technique=="C"])) D <- factor(c("P","P","P","T","T","T")) T <- factor(c("A","B","C","A","B","C")) factor <- data.frame(D,T) options(contrasts=c("contr.sum", "contr.poly")) model <- lm(data2 ~ 1)

Now, we create another model for the test.

> model0 <- update(model, ~0)

Then, we use anova() function. Please note that the function name is anova with all small letters. With anova() function, you have to execute the command with the models for each factor. Remember that our model for a two-way ANOVA test is y ~ T + D + T:D where T is Technique, and D is Device. So, we have to do a test for T, D, and T:D separately. Before doing an ANOVA test, let's do Mauchly's test for sphericity.

mauchly.test(model, M = ~T, X = ~1, idata=factor) Mauchly's test of sphericity Contrasts orthogonal to ~1 Contrasts spanned by ~T data: SSD matrix from lm(formula = data2 ~ 1) W = 0.7849, p-value = 0.4836

The degree of freedom for this Mauchly's sphericity test can be calculated as n(n+1)/2 - 1 where n is the degree of freedom of the factor. For instance, the degree of freedom of Technique is 2. Thus, The degree of freedom for Technique in the Mauchly's sphericity test is 2 * 3 / 2 - 1 = 2.

In this case, p > 0.05. So we don't need any correction. Move on to the ANOVA test.

anova(model, model0, M=~T, X=~1, idata=factor, test="Spherical") Analysis of Variance Table Model 1: data2 ~ 1 Model 2: data2 ~ 1 - 1 Contrasts orthogonal to ~1 Contrasts spanned by ~T Greenhouse-Geisser epsilon: 0.823 Huynh-Feldt epsilon: 1.043 Res.Df Df Gen.var. F num Df den Df Pr(>F) G-G Pr H-F Pr 1 7 0.03225 2 8 1 0.20055 135.76 2 14 6.816e-10 1.8315e-08 6.816e-10

Thus, F(2,14) = 135.76, p < 0.001. If you need any correction, you can see the p values under “G-G Pr ” or “ H-F Pr”. You can do Mauchly's test and ANOVA for Device in the same way (just replacing T with D in the commands above). But you don't need to run Mauchly's test for Device. Because the degree of freedom of Device is 2, you can automatically assume the sphericity.

So, how about an interaction term? In this case, you have to change the command a bit. Please note that the changes on the options for M and X.

mauchly.test(model, M = ~T:D, X = ~T+D, idata=factor) Mauchly's test of sphericity Contrasts orthogonal to ~T + D Contrasts spanned by ~T:D data: SSD matrix from lm(formula = data2 ~ 1) W = 0.4815, p-value = 0.1117

Again, p > 0.05, so we don't need any correction. The degree of freedom for this interaction term is ([degree of freedom of Technique] - 1) * ([degree of freedom of Device] - 1) = 2 * 1 = 2. Thus the degree of freedom for this term in a Mauchly's test is also 2 (because 2 * 3 / 2 - 1). For ANOVA,

anova(model, model0, M=~T:D, X=~T+D, idata=factor, test="Spherical") Analysis of Variance Table Model 1: data2 ~ 1 Model 2: data2 ~ 1 - 1 Contrasts orthogonal to ~T + D Contrasts spanned by ~T:D Greenhouse-Geisser epsilon: 0.6586 Huynh-Feldt epsilon: 0.7511 Res.Df Df Gen.var. F num Df den Df Pr(>F) G-G Pr H-F Pr 1 7 0.021898 2 8 1 0.038654 3.4438 2 14 0.060769 0.088838 0.080128

Thus, F(2,14) = 3.44, p = 0.06.

If the degree of freedom of a factor is larger than the number of the subjects repeatedly measured (e.g., participants), Mauchly's test returns NaN (Not a Number). This means it cannot run the test because you need more subjects repeatedly measured (participants). In SPSS, the results for this factor become just blank instead of having any error message.

This NaN can happen when you test an interaction term. Suppose you test five different techniques and four different devices with ten participants. The degree of freedome of the interaction term of the techniques and devices is (5 - 1) * (4 - 1) = 12, which is larger than 10 (number of the participants). In this case, you can do ANOVA, but cannot Mauchly's test properly.

I haven't figured out whether we should make any correction when Mauchly's test is impossible to run. I will post once I figure it out. My current guess is that we just cannot do any kind of correction.

How to report the results of two-way ANOVA

Reporting the results of two-way ANOVA is pretty much the same as reporting the ones of one-way ANOVA. You just need to explain the results of each factor and the interaction terms:

With two-way ANOVA, we found significant main effects of Device (F(1,42)=94.5, p<0.01, partial <html><img src=“/mimetex/mimetex.cgi?\small \eta^{2}”></html>=0.69) and Technique (F(2,42)=165, p<0.01, partial <html><img src=“/mimetex/mimetex.cgi?\small \eta^{2}”></html>=0.89) on the performance time. We also found a significant interaction of Device and Technique (F(2,42)=3.63, p<0.05, partial <html><img src=“/mimetex/mimetex.cgi?\small \eta^{2}”></html>=0.15).

You may also want to report the results of the test for homogeneity of variances or sphericty before this.

And remember that you have to run a post-hoc test if you find a significant difference.

Mixed-design ANOVA

Mixed-design means that you have both within-subject and between-subject factors. Let's think about this with the same examples for two-way ANOVA. You have two input devices (Pen and Touch), and three techniques for each device (A, B, and C). Let's say you recruited different people for the devices: 8 people for Pen and other 8 people for Touch. But within each device, every participant was exposed to the three techniques. In this case, Device is a between-subject factor, and Technique is a within-subject factor.

There are some other examples which usually are considered as between-subject factors: Gender (male or female), age group (younger or older group), handedness (right-handed or left-handed), and country (US, Canada, Japan,…). Before conducting a test, you have to be clear on which variable is within- or between-subject.

<rcoee> > Participant <- factor(cbind(rep(c("1", "2", "3", "4", "5", "6", "7", "8"), 3), + rep(c("9", "10", "11", "12", "13", "14", "15", "16"), 3))) > Device <- factor(c(rep("Pen",24), rep("Touch",24))) > Technique <- factor(rep(c(rep("A",8), rep("B",8), rep("C",8)), 2)) > Time <- c(1.2,1.4,1.8,2.0,1.1,1.5,1.5,1.7, + 2.1,2.5,2.2,2.2,2.9,2.3,2.3,2.6, + 3.5,3.4,3.3,3.2,2.9,2.8,3.8,3.4, + 2.4,1.8,2.5,2.1,2.2,1.9,1.7,2.3, + 2.8,3.1,3.2,4.0,2.9,3.6,3.2,3.7, + 4.5,4.8,4.7,4.1,4.1,4.2,4.6,4.9) data <- data.frame(Device, Technique, Participant, Time) data </rcode>

data should look like this.

Device Technique Participant Time 1 Pen A 1 1.2 2 Pen A 2 1.4 3 Pen A 3 1.8 4 Pen A 4 2.0 5 Pen A 5 1.1 6 Pen A 6 1.5 7 Pen A 7 1.5 8 Pen A 8 1.7 9 Pen B 1 2.1 10 Pen B 2 2.5 11 Pen B 3 2.2 12 Pen B 4 2.2 13 Pen B 5 2.9 14 Pen B 6 2.3 15 Pen B 7 2.3 16 Pen B 8 2.6 17 Pen C 1 3.5 18 Pen C 2 3.4 19 Pen C 3 3.3 20 Pen C 4 3.2 21 Pen C 5 2.9 22 Pen C 6 2.8 23 Pen C 7 3.8 24 Pen C 8 3.4 25 Touch A 9 2.4 26 Touch A 10 1.8 27 Touch A 11 2.5 28 Touch A 12 2.1 29 Touch A 13 2.2 30 Touch A 14 1.9 31 Touch A 15 1.7 32 Touch A 16 2.3 33 Touch B 9 2.8 34 Touch B 10 3.1 35 Touch B 11 3.2 36 Touch B 12 4.0 37 Touch B 13 2.9 38 Touch B 14 3.6 39 Touch B 15 3.2 40 Touch B 16 3.7 41 Touch C 9 4.5 42 Touch C 10 4.8 43 Touch C 11 4.7 44 Touch C 12 4.1 45 Touch C 13 4.1 46 Touch C 14 4.2 47 Touch C 15 4.6 48 Touch C 16 4.9

As you can see, we have different participants for Pen and Touch. Now, do a mixed-design ANOVA test.

library(ez) options(contrasts=c("contr.sum", "contr.poly")) ezANOVA(data=data, dv=.(Time), wid=.(Participant), within=.(Technique), between=.(Device), type=3)

Here, we specify the within-subject factor and between-subject factor by saying within=.(Technique), between=.(Device). And you get the results.

$ANOVA Effect DFn DFd SSn SSd F p p<.05 ges 1 Device 1 14 9.8102083 1.062917 129.21325 1.868876e-08 * 0.6923733 2 Technique 2 28 34.2379167 3.295833 145.43540 1.620339e-15 * 0.8870693 3 Device:Technique 2 28 0.7529167 3.295833 3.19823 5.610737e-02 0.1472938$Mauchly's Test for Sphericity Effect W p p<.05 2 Technique 0.9301473 0.6245776 3 Device:Technique 0.9301473 0.6245776 \$Sphericity Corrections Effect GGe p[GG] p[GG]<.05 HFe p[HF] p[HF]<.05 2 Technique 0.9347082 1.218572e-14 * 1.073371 1.620339e-15 * 3 Device:Technique 0.9347082 6.018604e-02 1.073371 5.610737e-02

You can compare the results of the repeated-measure two-way ANOVA example. You will notice that some values are different (e.g., the denominator degree of freedom (DFd) for Technique). These differences are caused by the experimental design.

You can also use aov() function, but remember that aov() function uses Type I SSs.

aov <- aov(Time ~ Technique * Device + Error(Participant/Device), data=data) summary(aov) Error: Participant Df Sum Sq Mean Sq F value Pr(>F) Residuals 7 0.81146 0.11592 Error: Participant:Device Df Sum Sq Mean Sq F value Pr(>F) Device 1 9.8102 9.8102 273.09 7.252e-07 *** Residuals 7 0.2515 0.0359 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Error: Within Df Sum Sq Mean Sq F value Pr(>F) Technique 2 34.238 17.1190 145.4354 1.620e-15 *** Technique:Device 2 0.753 0.3765 3.1982 0.05611 . Residuals 28 3.296 0.1177 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

ANOVA for unbalanced data

We have seen examples with balanced data, but what should we do if you have unbalanced data? Although ANOVA can accommodate mildly unbalanced data and you can use the same codes presented in this page, you have to be more careful than when you have balanced data. The main reason is that the results may become different depending on the types of sum of squares.

If you do one-way ANOVA, it does not really matter whether the data are balanced or unbalanced (unless the sample size for each group is extremely different). But if you do two-way or mixed-design ANOVA, you need to be careful about which type of sum of squares your test is using.

For unbalanced data, using Type II SSs seems to be recommended in general. But please double-check with stats experts or other references.

n-way ANOVA?

We have looked at one-way and two-way ANOVA. So, why not three-way ANOVA or say n-way ANOVA more generally? Yes, we can do that mathematically. The problem is that the analysis becomes really hard. For instance, if you do a three-way ANOVA test, you have the three main effects, three interactions of two factors, and one interaction of three factors. And let's say if you have a significant effect of some of them and have to do a post-hoc test? Your post-hoc test would have a huge list of comparisons. So, I recommend you to design your experiment so that you don't have to use a more complicated test than two-way ANOVA.

Post-hoc test

If you have found any significant effect, you are not done yet. You may want to know exactly where you have differences. For this purpose, you need to run a post-hoc test, which is also a long topic. For more details, please jump to the post-hoc test page.