t test

Introduction

A t test is a standard statistical test for comparing the means of two groups. You can run a t test in Microsoft Excel (and probably OpenOffice) as well as in statistical software. I believe that doing a t test is quite straightforward, and it is a good method to start with if you are not comfortable with statistics.

Like other statistical tests, a t test makes some assumptions. The first important assumption is that the population your sample data come from is normally distributed. Notice that a t test cares about the distribution of the population, not the distribution of your samples. If your sample size is large enough, this assumption usually does not cause serious problems.

The next step is to decide which t test you want to use: paired or unpaired.

Paired vs. unpaired

There is one important distinction you need to understand in a t test: paired versus unpaired. Unpaired means that you simply compare the two groups: you build a model for each group (calculate its mean and variance) and see whether there is a difference. Paired means that you look at the differences within each pair of data points (e.g., the two measurements from the same participant). A paired test first calculates the difference between the paired data points of the two groups, and then runs a one-sample t test on those differences.

So, how should you decide which one to use? It depends on your experimental design. If you use a within-subject design, you should use a paired t test. Because each participant contributed a data point to each of the two groups (e.g., two interaction techniques), your data have an implicit pairing based on your participants. For example, some participants may be slower than others with both techniques, yet all of them may be faster with one technique than with the other. A paired t test accounts for such individual differences by taking the differences within each participant. Otherwise (e.g., in a between-subject design), there is no legitimate way to pair the data points, so you need to use an unpaired t test.
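To make the "one-sample t test on the differences" idea concrete, here is a minimal R sketch showing that a paired t test and a one-sample t test on the per-participant differences give the same result. It reuses the example data from the R code examples on this page; the vector names x and y are introduced here for illustration.

```r
x <- c(1, 1, 2, 3, 1, 3, 2, 4, 1, 2)   # Group 0
y <- c(6, 5, 1, 3, 5, 1, 2, 3, 4, 4)   # Group 1, same participants in the same order

paired <- t.test(x, y, paired = TRUE)   # paired t test
one_sample <- t.test(x - y, mu = 0)     # one-sample t test on the differences

# Both tests report the same t value, degrees of freedom, and p value
c(paired = unname(paired$statistic), one_sample = unname(one_sample$statistic))
```

Note that the pairing relies on x and y listing the participants in the same order.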

Effect size

For t tests, you probably also want to report the effect size. A general explanation of effect sizes is available on a separate page; here I explain how to calculate the effect size for a t test.

There are two kinds of effect size metrics for a t test: Cohen's d and Pearson's r. Both are commonly used and you can pick either of them (d can be converted to r and vice versa, so it does not really matter which one you use). But remember that you cannot use Pearson's r (as calculated below) for a paired t test, so you have to use Cohen's d in that case.
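As a sketch of the conversion for two independent groups, the R functions below (d_to_r and r_to_d are hypothetical names) assume that d uses the pooled standard deviation and that r is derived from a standard (equal-variance) t test with df = n1 + n2 - 2. Under those assumptions t = d / sqrt(1/n1 + 1/n2), which gives the relationship used here. The often-quoted r = d / sqrt(d^2 + 4) is the large-sample version of the same formula for two groups of equal size.

```r
# Hypothetical helpers; they assume d uses the pooled SD and r comes from a
# standard (equal-variance) t test with df = n1 + n2 - 2.
d_to_r <- function(d, n1, n2) {
  df <- n1 + n2 - 2
  d / sqrt(d^2 + df * (1 / n1 + 1 / n2))
}

r_to_d <- function(r, n1, n2) {
  df <- n1 + n2 - 2
  r * sqrt(df * (1 / n1 + 1 / n2) / (1 - r^2))
}

# d from the unpaired example on this page (two groups of 10)
r <- d_to_r(0.9844952, n1 = 10, n2 = 10)
r                              # about 0.46
r_to_d(r, n1 = 10, n2 = 10)    # back to about 0.98
```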

What counts as a small or large effect size depends on the field, but here are some standard thresholds.

Metric	Small	Medium	Large
Cohen's d	0.2	0.5	0.8
Pearson's r	0.1	0.3	0.5

Effect size for a paired t test

Cohen's d for a paired t test can be calculated as follows:

$d=\frac{|M|}{SD}$,

where M is the mean of the differences and SD is the standard deviation of the differences.
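A minimal R sketch of this formula (cohens_d_paired is a hypothetical helper name), applied to the paired example data used on this page:

```r
# Hypothetical helper: Cohen's d for paired data,
# |mean of differences| / SD of differences
cohens_d_paired <- function(x, y) {
  diffs <- x - y
  abs(mean(diffs)) / sd(diffs)
}

x <- c(1, 1, 2, 3, 1, 3, 2, 4, 1, 2)   # Group 0
y <- c(6, 5, 1, 3, 5, 1, 2, 3, 4, 4)   # Group 1, same participants in the same order
cohens_d_paired(x, y)                  # about 0.56
```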

Effect size for an unpaired t test

Cohen's d for an unpaired t test can be calculated as follows:

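$d=\frac{|\mu_{1} - \mu_{2}|}{ \sqrt{\frac{(n_{1} - 1)s_{1}^{2} + (n_{2} - 1)s_{2}^{2}}{n_{1} + n_{2} - 2}}}$,

where $\mu_{1}$ and $\mu_{2}$ are the means of the two groups, $s_{1}^{2}$ and $s_{2}^{2}$ are their variances, and $n_{1}$ and $n_{2}$ are their sample sizes.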
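A minimal R sketch of this formula (cohens_d_unpaired and its denom argument are hypothetical names). The denom option also lets you try the n1 + n2 denominator that some of the literature uses, to see how much it changes d.

```r
# Hypothetical helper: Cohen's d for two independent groups (pooled SD).
# denom = "n-2" divides the pooled sum of squares by n1 + n2 - 2 (the formula
# above); denom = "n" divides by n1 + n2 instead.
cohens_d_unpaired <- function(x, y, denom = c("n-2", "n")) {
  denom <- match.arg(denom)
  n1 <- length(x)
  n2 <- length(y)
  pooled_ss <- (n1 - 1) * var(x) + (n2 - 1) * var(y)
  divisor <- if (denom == "n-2") n1 + n2 - 2 else n1 + n2
  abs(mean(x) - mean(y)) / sqrt(pooled_ss / divisor)
}

x <- c(1, 1, 2, 3, 1, 3, 2, 4, 1, 2)   # Group 0
y <- c(6, 5, 1, 3, 5, 1, 2, 3, 4, 4)   # Group 1
cohens_d_unpaired(x, y)                # about 0.98
cohens_d_unpaired(x, y, denom = "n")   # about 1.04 (smaller pooled SD, larger d)
```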

In some of the literature I read, the denominator of the pooled variance (inside the square root) is n_1 + n_2. I think this also makes sense, but I am not 100% sure which one is right. Because both seem to be used and the MBESS package (a package for calculating effect sizes) uses n_1 + n_2 - 2, I chose the formula above. For an unpaired t test, you can also use Pearson's r, which is slightly simpler to calculate than d:

$r=\sqrt{\frac{t^{2}}{t^{2} + df}}$ .

where t is the t value of the test, and df is the degrees of freedom.
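A minimal R sketch of this formula (r_from_t is a hypothetical helper name), taking t and df directly from a standard t test on the unpaired example data:

```r
# Hypothetical helper: Pearson's r from a t value and its degrees of freedom
r_from_t <- function(t, df) {
  sqrt(t^2 / (t^2 + df))
}

x <- c(1, 1, 2, 3, 1, 3, 2, 4, 1, 2)   # Group 0
y <- c(6, 5, 1, 3, 5, 1, 2, 3, 4, 4)   # Group 1
res <- t.test(x, y, var.equal = TRUE)  # standard (equal-variance) t test
r <- r_from_t(unname(res$statistic), unname(res$parameter))
r   # about 0.46
```

Because t is squared, the sign of the t value does not matter.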

A paired t test

You should use a paired t test if you use a within-subject design. A paired t test takes the difference between the two data points of each pair and tests whether the mean of these differences is different from zero: it divides the mean of the differences by its standard error and compares the resulting t value against the t distribution. Because it uses the differences between the groups, a paired t test does not assume that the population variances of the two groups are equal. But it still assumes normality (of the differences). The null hypothesis is that there is no difference in the means between the two groups. If the p value is less than 0.05, you reject the null hypothesis and say that you found a significant difference.
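The procedure can be sketched by hand with base R functions on the paired example data (x, y, diffs, and the other variable names are introduced here for illustration). The two-sided p value is the probability, under the t distribution with n - 1 degrees of freedom, of a t value at least as extreme as the observed one.

```r
x <- c(1, 1, 2, 3, 1, 3, 2, 4, 1, 2)   # Group 0
y <- c(6, 5, 1, 3, 5, 1, 2, 3, 4, 4)   # Group 1, same participants in the same order

diffs <- x - y                          # one difference per participant
n <- length(diffs)
se <- sd(diffs) / sqrt(n)               # standard error of the mean difference
t_value <- mean(diffs) / se             # t statistic
dof <- n - 1                            # degrees of freedom
p_value <- 2 * pt(-abs(t_value), dof)   # two-sided p value from the t distribution
round(c(t = t_value, df = dof, p = p_value), 4)
```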

R code example

First, prepare the data.

value <- c(1,1,2,3,1,3,2,4,1,2,6,5,1,3,5,1,2,3,4,4)
group <- c(0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1)
data <- data.frame(group, value)

Then, run a paired t test.

> t.test(data[data["group"]==0,2], data[data['group']==1,2], paired=TRUE)

Paired t-test

data:  data[data["group"] == 0, 2] and data[data["group"] == 1, 2]
t = -1.7685, df = 9, p-value = 0.1108
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -3.1907752  0.3907752
sample estimates:
mean of the differences
                   -1.4

In this example, we do not have a significant effect of Group because p = 0.11. The results also show other information. For instance, the mean of the differences is -1.4 (the means of Group 0 and Group 1 are 2.0 and 3.4, respectively). The results also show the 95 percent confidence interval of the mean of the differences, which ranges from -3.19 to 0.39; roughly, these are the values of the true mean difference that are compatible with the data at the 0.05 level. The null hypothesis is that the difference in means is equal to 0. Thus, if zero is included in the 95% confidence interval of the mean of the differences, we cannot reject the null hypothesis. This corresponds with the p value we have (p > 0.05).
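The confidence interval in the output can be reproduced by hand: the mean of the differences plus or minus the 97.5th percentile of the t distribution (with n - 1 degrees of freedom) times the standard error. A minimal R sketch on the paired example data (variable names are illustrative):

```r
x <- c(1, 1, 2, 3, 1, 3, 2, 4, 1, 2)   # Group 0
y <- c(6, 5, 1, 3, 5, 1, 2, 3, 4, 4)   # Group 1, same participants in the same order

diffs <- x - y
n <- length(diffs)
se <- sd(diffs) / sqrt(n)                  # standard error of the mean difference
crit <- qt(0.975, df = n - 1)              # 97.5th percentile of t with n - 1 df
ci <- mean(diffs) + c(-1, 1) * crit * se   # lower and upper limits
ci                                         # about -3.19 and 0.39; zero lies inside
```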

You also need to calculate the effect size. We need to use Cohen's d for a paired t test (remember that we cannot use the calculation of Pearson's r mentioned above for a paired t test). For this, we need the mean and the standard deviation of the differences. The results of the paired t test already show the mean of the differences (-1.4, so |M| = 1.4), so you only need to calculate the standard deviation of the differences by hand.

> sd(data[data$group=="0",2] - data[data$group=="1",2])
[1] 2.503331

Thus, Cohen's d is:

> 1.4 / 2.503331
[1] 0.5592548

If you need to report the 95% confidence interval for the effect size, you can use the ci.sm() function in the MBESS package.

> library(MBESS)
> ci.sm(Mean=1.4, SD=2.503331, N=10, conf.level=0.95)
[1] "The 0.95 confidence limits for the standardized mean are given as:"
$Lower.Conf.Limit.Standardized.Mean
[1] -0.1238246

$Standardized.Mean
[1] 0.5592548

$Upper.Conf.Limit.Standardized.Mean
[1] 1.216236

Thus, Cohen's d = 0.56 with 95% CI = [-0.12, 1.22].

An unpaired t test

If you are going to use an unpaired t test, you need to consider another assumption: homogeneity of variances, i.e., the population variances of the two groups are equal. This is important for an unpaired t test. However, there is a version of the t test that accommodates unequal variances, called a Welch's t test. Unless you can make sure that the population variances of the two groups are equal, you can simply use a Welch's t test without thinking too much. This is fair because a Welch's t test is generally more conservative than a standard t test (i.e., your p value with a Welch's t test tends to be higher than with a standard t test).

A t test tests a hypothesis called the null hypothesis: that there is no difference in the means between the two groups. If the p value is less than 0.05, you reject the null hypothesis and say that you found a significant difference.

As you can see in the following R example, the beauty of a t test is its simplicity. This is one reason why I like the t test: you are less likely to mess up an analysis with it than with other kinds of statistical tests. I recommend designing your experiment so that you can use this nice test.

R code example

First, create data with two groups (0 and 1). Let's say Group 0 represents some performance of Technique 1, and Group 1 represents some performance of Technique 2.

value <- c(1,1,2,3,1,3,2,4,1,2,6,5,1,3,5,1,2,3,4,4)
group <- c(0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1)
data <- data.frame(group, value)

Then, run a Welch's t test (which does not assume equal variances).

> t.test(data[data["group"]==0,2], data[data['group']==1,2], var.equal=FALSE)

You will get the results.

Welch Two Sample t-test

data:  data[data["group"] == 0, 2] and data[data["group"] == 1, 2]
t = -2.2014, df = 14.963, p-value = 0.04382
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.75581022 -0.04418978
sample estimates:
mean of x mean of y
      2.0       3.4

The p value is 0.04 < 0.05, which means that there is a statistically significant difference. Let's see the results with a standard t test (which assumes equal variances) for comparison.

> t.test(data[data["group"]==0,2], data[data["group"]==1,2], var.equal=TRUE)

Two Sample t-test

data:  data[data["group"] == 0, 2] and data[data["group"] == 1, 2]
t = -2.2014, df = 18, p-value = 0.04099
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.73610126 -0.06389874
sample estimates:
mean of x mean of y
      2.0       3.4

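As you can see here, the p value with a standard t test (0.04099) is slightly smaller than the one with a Welch's t test (0.04382). Another point you should look at is the degrees of freedom: 18 in the standard t test and 14.963 in the Welch's t test. A Welch's t test accommodates unequal variances by using the variance of each group separately and adjusting the degrees of freedom. Because the two groups have the same size here, both tests give the same t value, and the difference in p values comes only from the degrees of freedom.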
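The degrees of freedom of a Welch's t test come from the Welch-Satterthwaite approximation. A minimal R sketch reproducing the t value and df of the Welch example above from the group variances (variable names are illustrative):

```r
x <- c(1, 1, 2, 3, 1, 3, 2, 4, 1, 2)   # Group 0
y <- c(6, 5, 1, 3, 5, 1, 2, 3, 4, 4)   # Group 1

v1 <- var(x) / length(x)   # squared standard error of the mean, Group 0
v2 <- var(y) / length(y)   # squared standard error of the mean, Group 1

# Welch's t uses the unpooled standard error
welch_t <- (mean(x) - mean(y)) / sqrt(v1 + v2)

# Welch-Satterthwaite approximation of the degrees of freedom
welch_df <- (v1 + v2)^2 / (v1^2 / (length(x) - 1) + v2^2 / (length(y) - 1))

c(t = welch_t, df = welch_df)   # about -2.2014 and 14.963
```

When one group's variance dominates, welch_df moves toward that group's n - 1, which widens the t distribution and raises the p value.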

We also need to calculate the effect size. Here, we can use the smd() function in the MBESS package.

> library(MBESS)
> abs(smd(data[data$group=="0",2], data[data$group=="1",2]))
[1] 0.9844952

Alternatively, you can calculate d manually, but you first have to calculate the variance of each group.

> by(data$value, data$group, var)
data$group: 0
[1] 1.111111
-----------------------
data$group: 1
[1] 2.933333
> (3.4 - 2) / sqrt(((10 - 1) * 1.111111 + (10 - 1) * 2.933333) / 18)
[1] 0.9844952

If you need the 95% confidence interval for the effect size, you can use the ci.smd() function with the t value (2.2014 in this example).

> ci.smd(ncp=2.2014, n.1=10, n.2=10, conf.level=0.95)
$Lower.Conf.Limit.smd
[1] 0.03961892

$smd
[1] 0.984496

$Upper.Conf.Limit.smd
[1] 1.905349

Thus, Cohen's d = 0.98 with 95% CI = [0.04, 1.91].
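The MBESS intervals are based on the noncentral t distribution. As a base-R sketch of that idea (ci_d_from_t is a hypothetical helper, and the search interval of t plus or minus 5 is an assumption that suits moderate t values like the ones on this page), you can find the noncentrality parameters for which the observed t sits at the 97.5th and 2.5th percentiles, then rescale them to d. The results should be close to the MBESS output for both the paired and the unpaired examples.

```r
# Hypothetical helper: CI for a standardized effect by inverting the
# noncentral t distribution. 'scale' converts a noncentrality parameter to d:
# sqrt(1/n1 + 1/n2) for two independent groups, 1/sqrt(n) for paired data.
ci_d_from_t <- function(t_obs, df, scale, conf = 0.95) {
  alpha <- 1 - conf
  # Noncentrality parameter at which P(T <= t_obs) equals 'prob'
  find_ncp <- function(prob) {
    uniroot(function(ncp) pt(t_obs, df, ncp) - prob,
            interval = c(t_obs - 5, t_obs + 5), tol = 1e-10)$root
  }
  lower_ncp <- find_ncp(1 - alpha / 2)   # observed t at the 97.5th percentile
  upper_ncp <- find_ncp(alpha / 2)       # observed t at the 2.5th percentile
  c(lower = lower_ncp * scale, upper = upper_ncp * scale)
}

# Unpaired example: t = 2.2014, df = 18, two groups of 10
ci_unpaired <- ci_d_from_t(2.2014, df = 18, scale = sqrt(1 / 10 + 1 / 10))
# Paired example: t = 1.7685, df = 9, 10 participants
ci_paired <- ci_d_from_t(1.7685, df = 9, scale = 1 / sqrt(10))
ci_unpaired   # about 0.04 and 1.91
ci_paired     # about -0.12 and 1.22
```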

How to report

You can report the results in your paper like:

With a Welch's t test, we found a significant effect of technique (t(15) = 2.20, p < 0.05, Cohen's d = 0.98), with Technique 2 outperforming Technique 1.

You can report the absolute value of the t value. In this example, the mean of Group 1 is larger than that of Group 0, and because t.test() subtracts the second group from the first, the t value is negative.
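If you report many tests, a small helper can format the numbers consistently. This is a minimal R sketch (format_t_report is a hypothetical name) that rounds the degrees of freedom, takes the absolute t value, and uses the "p < 0.05" style shown above; the Cohen's d value is passed in from a separate calculation.

```r
# Hypothetical helper: format a t.test() result in the reporting style above
format_t_report <- function(res, d) {
  p_text <- if (res$p.value < 0.05) "p < 0.05" else sprintf("p = %.2f", res$p.value)
  sprintf("t(%.0f) = %.2f, %s, Cohen's d = %.2f",
          unname(res$parameter), abs(unname(res$statistic)), p_text, d)
}

x <- c(1, 1, 2, 3, 1, 3, 2, 4, 1, 2)   # Group 0 (Technique 1)
y <- c(6, 5, 1, 3, 5, 1, 2, 3, 4, 4)   # Group 1 (Technique 2)
welch <- t.test(x, y, var.equal = FALSE)
report <- format_t_report(welch, d = 0.9844952)
report   # "t(15) = 2.20, p < 0.05, Cohen's d = 0.98"
```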

Koji Yatani's Course Webpage
