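t test

Background

The t test is the most basic statistical method for comparing the means of two groups. Because it is so basic, you can run it even in spreadsheet software such as Excel. Its mechanism is also simple, so if you are not familiar with statistical methods, I think it is a good place to start.

Like other statistical methods, the t test has several assumptions. First, the population from which the data come must be normally distributed. The key point is that this concerns the distribution of the population, not the distribution of the sample. In many cases, you do not need to worry much about this assumption if your sample size is large enough, because the sampling distribution of the mean approaches a normal distribution as the sample grows (the central limit theorem).

The next thing to consider is whether your data are paired or unpaired.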

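Paired or unpaired

There is an important point to understand about the t test: whether the data are paired or unpaired. "Unpaired" means that you simply compare the two groups as they are, so you model each group separately and check whether there is a difference between them. "Paired", on the other hand, means that the observations in the two groups have a one-to-one correspondence. In this case, you calculate the difference within each corresponding pair and check whether the mean of those differences is plausibly 0.

So which t test should you use? It depends on your experimental design. If you have a within-subject factor, you use a paired t test. In this case, both groups contain data from the same participants, so you can compare the two groups while taking individual differences into account. For example, some participants may complete tasks more slowly than others (e.g., because they are not used to computers). If you simply compare the two groups, such individual differences can hide the difference between them. Once the individual differences are removed, however, one group may turn out to be better than the other. Thus, use a paired t test when you have a within-subject factor, and an unpaired t test otherwise.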
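To make the individual-difference argument concrete, here is a minimal Python sketch with hypothetical completion times (made-up numbers, not data from a real study) for five participants who used both techniques. Technique B is slightly faster for every participant, but the participants themselves differ by tens of seconds. The helper names unpaired_t and paired_t are illustrative; they compute only the t statistics, not p values.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical task completion times (seconds) for 5 participants who
# used both techniques; participant i is position i in both lists.
technique_a = [12, 21, 33, 41, 52]
technique_b = [10, 20, 30, 40, 50]


def unpaired_t(x, y):
    """Student's t with a pooled variance (ignores who produced which value)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var * (1 / nx + 1 / ny))


def paired_t(x, y):
    """t for the mean of the within-participant differences."""
    diffs = [a - b for a, b in zip(x, y)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


print(f"unpaired t = {unpaired_t(technique_a, technique_b):.2f}")  # unpaired t = 0.18
print(f"paired t   = {paired_t(technique_a, technique_b):.2f}")    # paired t   = 4.81
```

The unpaired t value is tiny because the spread between participants dominates the pooled variance, whereas the paired t value is large because each participant's own baseline cancels out in the differences.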

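Effect size

For a t test, I recommend reporting the effect size as well. I leave the general explanation of effect sizes to a separate page; here I explain how to calculate the effect size for a t test.

There are two well-known effect sizes for a t test: Cohen's d and Pearson's r. Both are widely used and can be converted into each other, so you can use either one. However, because Pearson's r (as calculated below) cannot be used for a paired t test, Cohen's d seems to be used more often.

What counts as a small or large effect differs between research fields, but the commonly cited guidelines are as follows: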

Effect size	Small	Medium	Large
Cohen's d	0.2	0.5	0.8
Pearson's r	0.1	0.3	0.5
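The benchmarks in the table can be applied mechanically with a small helper. This is a minimal Python sketch; the function name label_effect and the "below small" label are hypothetical choices, and the cutoffs are only the conventional ones in the table, not field-specific benchmarks.

```python
# Conventional thresholds from the table (small, medium, large).
THRESHOLDS = {"d": (0.2, 0.5, 0.8), "r": (0.1, 0.3, 0.5)}


def label_effect(value, kind="d"):
    """Return the largest conventional label whose threshold |value| reaches."""
    small, medium, large = THRESHOLDS[kind]
    size = abs(value)  # the sign only reflects the order of the groups
    if size >= large:
        return "large"
    if size >= medium:
        return "medium"
    if size >= small:
        return "small"
    return "below small"


print(label_effect(0.56))       # medium
print(label_effect(0.98))       # large
print(label_effect(0.46, "r"))  # medium
```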

Effect size for a paired t test

For a paired t test, Cohen's d can be calculated as follows:

$d=\frac{|M|}{SD}$ .

Here, M is the mean of the differences and SD is the standard deviation of the differences.
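As a check on the formula, the following minimal Python sketch computes M, SD, and d for the data used in the paired R code example below, with pairs matched by position. It uses only the standard library; statistics.stdev uses the n - 1 denominator, like R's sd().

```python
from statistics import mean, stdev

# Same values as the R example: position i of group0 is paired with position i of group1.
group0 = [1, 1, 2, 3, 1, 3, 2, 4, 1, 2]
group1 = [6, 5, 1, 3, 5, 1, 2, 3, 4, 4]

diffs = [a - b for a, b in zip(group0, group1)]
m = mean(diffs)     # M: mean of the differences
sd = stdev(diffs)   # SD: standard deviation of the differences (n - 1 denominator)
d = abs(m) / sd     # Cohen's d for a paired t test

print(round(m, 4), round(sd, 6), round(d, 7))  # -1.4 2.503331 0.5592548
```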

Effect size for an unpaired t test

For an unpaired t test, Cohen's d can be calculated as follows:

$d=\frac{|\mu_{1} - \mu_{2}|}{ \sqrt{\frac{(n_{1} - 1)s_{1}^{2} + (n_{2} - 1)s_{2}^{2}}{n_{1} + n_{2} - 2}}}$ .

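Here, $\mu_{1}$ and $\mu_{2}$ are the means of the two groups, $s_{1}^{2}$ and $s_{2}^{2}$ are their sample variances, and $n_{1}$ and $n_{2}$ are their sample sizes. Some textbooks use $n_{1} + n_{2}$ instead of $n_{1} + n_{2} - 2$ in the denominator inside the square root, and both seem justifiable. The MBESS package in R appears to use the formula above: in the unpaired example below, its smd() function gives the same value as a manual calculation with this formula. For an unpaired t test, you can also calculate Pearson's r: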

$r=\sqrt{\frac{t^{2}}{t^{2} + df}}$ .

Here, t is the t value from the test and df is its degrees of freedom.
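A minimal Python sketch of both effect sizes for the unpaired example data used below, standard library only. It uses the t value and df of the standard (pooled-variance) t test, df = n1 + n2 - 2; plugging in the df of Welch's t test instead would give a different r. The last computation illustrates that d and r are interconvertible, using |t| = d * sqrt(n1 * n2 / (n1 + n2)), which follows from the d formula and the definition of the pooled-variance t statistic.

```python
from math import sqrt
from statistics import mean, variance

# Same values as the unpaired R example (Group 0 and Group 1).
group0 = [1, 1, 2, 3, 1, 3, 2, 4, 1, 2]
group1 = [6, 5, 1, 3, 5, 1, 2, 3, 4, 4]
n1, n2 = len(group0), len(group1)

# Cohen's d with the pooled SD (n1 + n2 - 2 inside the square root).
pooled_sd = sqrt(((n1 - 1) * variance(group0) + (n2 - 1) * variance(group1)) / (n1 + n2 - 2))
d = abs(mean(group0) - mean(group1)) / pooled_sd

# Pearson's r from the t value and df of the standard t test.
t = (mean(group0) - mean(group1)) / (pooled_sd * sqrt(1 / n1 + 1 / n2))
df = n1 + n2 - 2
r = sqrt(t ** 2 / (t ** 2 + df))

# The same r obtained directly from d.
r_from_d = d / sqrt(d ** 2 + (n1 + n2) * df / (n1 * n2))

print(round(d, 4), round(t, 4), round(r, 4), round(r_from_d, 4))  # 0.9845 -2.2014 0.4606 0.4606
```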

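A paired t test

A paired t test is used when you have a within-subject factor. Roughly speaking, it takes the difference within each pair and tests whether the mean of those differences is 0: the mean difference is divided by its standard error, and the resulting t value is compared with a t distribution with n - 1 degrees of freedom, where n is the number of pairs. Because the test works on the differences, a paired t test does not need the assumption that the two populations have equal variances. It does, however, require normality (of the differences). The null hypothesis is that there is no difference between the two groups (i.e., the mean of the differences is 0).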
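The procedure just described fits in a few lines. This minimal Python sketch (standard library only; the function name paired_t_statistic is hypothetical) computes the t value and degrees of freedom. Turning t into a p value needs the CDF of the t distribution, which the standard library does not provide, so that step is left out.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(x, y):
    """Return (t, df) for a paired t test of H0: mean(x - y) == 0.

    Pairs are matched by position, as in R's t.test(x, y, paired = TRUE).
    """
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    se = stdev(diffs) / sqrt(n)  # standard error of the mean difference
    return mean(diffs) / se, n - 1


t, df = paired_t_statistic([1, 1, 2, 3, 1, 3, 2, 4, 1, 2],
                           [6, 5, 1, 3, 5, 1, 2, 3, 4, 4])
print(round(t, 4), df)  # -1.7685 9
```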

R code example

First, prepare the data.

> value <- c(1,1,2,3,1,3,2,4,1,2,6,5,1,3,5,1,2,3,4,4)
> group <- c(0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1)
> data <- data.frame(group, value)

Then, run a paired t test. R pairs the two vectors by position, so the i-th value of Group 0 and the i-th value of Group 1 must come from the same participant.

> t.test(data[data["group"]==0,2], data[data['group']==1,2], paired = TRUE)

Paired t-test

data: data[data["group"] == 0, 2] and data[data["group"] == 1, 2]
t = -1.7685, df = 9, p-value = 0.1108
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -3.1907752  0.3907752
sample estimates:
mean of the differences
                   -1.4

In this example, Group does not have a significant effect because p = 0.11 > 0.05. The output also shows other information. For instance, the mean of the differences (Group 0 minus Group 1) is -1.4; the means of Group 0 and Group 1 are 2.0 and 3.4, respectively. The output also gives the 95% confidence interval of the mean of the differences, [-3.19, 0.39]. The null hypothesis is that the true mean of the differences is 0. Because 0 lies inside this interval, we cannot reject the null hypothesis at the 0.05 level, which agrees with the p value.
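The interval can be reproduced by hand as the mean difference plus or minus the critical t value times the standard error. In this minimal Python sketch, the critical value 2.262157 for df = 9 is hard-coded from a t table (it is what R's qt(0.975, 9) returns), because the standard library has no t quantile function.

```python
from math import sqrt
from statistics import mean, stdev

diffs = [a - b for a, b in zip([1, 1, 2, 3, 1, 3, 2, 4, 1, 2],
                               [6, 5, 1, 3, 5, 1, 2, 3, 4, 4])]
n = len(diffs)
se = stdev(diffs) / sqrt(n)  # standard error of the mean difference

# Assumption: 97.5th percentile of the t distribution with df = 9, copied from a t table.
T_CRIT_DF9 = 2.262157

lower = mean(diffs) - T_CRIT_DF9 * se
upper = mean(diffs) + T_CRIT_DF9 * se
print(round(lower, 4), round(upper, 4))  # -3.1908 0.3908
print(lower <= 0 <= upper)               # True: 0 is inside the interval
```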

You also need to calculate the effect size. For a paired t test, we use Cohen's d (remember that the calculation of Pearson's r mentioned above cannot be used for a paired t test). This needs the mean and the standard deviation of the differences. The output of the paired t test already gives the mean of the differences (-1.4, so |M| = 1.4), but you need to calculate the standard deviation of the differences yourself.

> sd(data[data$group == 0, 2] - data[data$group == 1, 2])
[1] 2.503331

Thus, Cohen's d is:

> 1.4 / 2.503331
[1] 0.5592548

If you need to report the 95% confidence interval for the effect size, you can use the ci.sm() function in the MBESS package.

> library(MBESS)
> ci.sm(Mean = 1.4, SD = 2.503331, N = 10, conf.level = 0.95)
[1] "The 0.95 confidence limits for the standardized mean are given as:"
$Lower.Conf.Limit.Standardized.Mean
[1] -0.1238246

$Standardized.Mean
[1] 0.5592548

$Upper.Conf.Limit.Standardized.Mean
[1] 1.216236

Thus, Cohen's d = 0.56 with 95% CI = [-0.12, 1.22].

An unpaired t test

If you are going to use an unpaired t test, you need to consider another assumption: homogeneity of variances, i.e., that the two populations have equal variances. This assumption matters for the standard unpaired t test. However, there is a variant that accommodates unequal variances, called Welch's t test. Unless you can be sure that the two populations have equal variances, you can simply use Welch's t test without thinking too much. This is a fair choice because Welch's t test tends to be more conservative than the standard t test: when the two groups have the same size, the t value is identical and Welch's degrees of freedom are never larger, so its p value is never smaller. With unequal group sizes, the comparison can go either way.

A t test checks a null hypothesis. For an unpaired t test, the null hypothesis is that the two population means are equal. If the p value is less than 0.05, you reject the null hypothesis and report a statistically significant difference.

As the following R example shows, the beauty of the t test is its simplicity. This is one reason why I like it: it is harder to get the analysis wrong than with other kinds of statistical tests. I recommend designing your experiment so that you can use this nice test.

R code example

First, create data with two groups (0 and 1). Suppose that Group 0 holds a performance measure for Technique 1 and Group 1 holds the same measure for Technique 2.

> value <- c(1,1,2,3,1,3,2,4,1,2,6,5,1,3,5,1,2,3,4,4)
> group <- c(0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1)
> data <- data.frame(group, value)

Then, run Welch's t test (which does not assume equal variances).

> t.test(data[data["group"]==0,2], data[data['group']==1,2], var.equal = FALSE)

You will get the following results:

Welch Two Sample t-test

data: data[data["group"] == 0, 2] and data[data["group"] == 1, 2]
t = -2.2014, df = 14.963, p-value = 0.04382
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.75581022 -0.04418978
sample estimates:
mean of x mean of y
      2.0       3.4

The p value is 0.04 < 0.05, which means that there is a statistically significant difference. For comparison, let's look at the results of a standard t test (which assumes equal variances).

> t.test(data[data["group"]==0,2], data[data["group"]==1,2], var.equal = TRUE)

Two Sample t-test

data: data[data["group"] == 0, 2] and data[data["group"] == 1, 2]
t = -2.2014, df = 18, p-value = 0.04099
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.73610126 -0.06389874
sample estimates:
mean of x mean of y
      2.0       3.4

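As you can see, the p value of the standard t test (0.04099) is slightly smaller than that of Welch's t test (0.04382). Another thing to look at is the degrees of freedom: 18 for the standard t test and 14.963 for Welch's t test. Welch's t test accommodates unequal variances by using each group's own variance in the standard error and by adjusting the degrees of freedom (the Welch-Satterthwaite approximation). Because both groups here have 10 observations, the two standard errors coincide, so the t value is the same (-2.2014) and only the degrees of freedom differ.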

We also need to calculate the effect size. Here, we can use the smd() function in the MBESS package.

> library(MBESS)
> abs(smd(data[data$group == 0, 2], data[data$group == 1, 2]))
[1] 0.9844952

Alternatively, you can calculate d manually, but you have to calculate the variance of each group first.

> by(data$value, data$group, var)
data$group: 0
[1] 1.111111
-----------------------
data$group: 1
[1] 2.933333
> (3.4 - 2) / sqrt(((10 - 1) * 1.111111 + (10 - 1) * 2.933333) / 18)
[1] 0.9844952

If you need the 95% confidence interval for the effect size, you can use the ci.smd() function with the absolute t value (2.2014 in this example) as ncp.

> ci.smd(ncp = 2.2014, n.1 = 10, n.2 = 10, conf.level = 0.95)
$Lower.Conf.Limit.smd
[1] 0.03961892

$smd
[1] 0.984496

$Upper.Conf.Limit.smd
[1] 1.905349

Thus, Cohen's d = 0.98 with 95% CI = [0.04, 1.91].
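The $smd value that ci.smd() reports can be reproduced from the t value alone, since for two independent groups d = |t| * sqrt(1/n1 + 1/n2). Passing the Welch t value is harmless here because, with equal group sizes, it equals the standard t value. A minimal Python sketch (d_from_t is a hypothetical name); the confidence limits themselves come from the noncentral t distribution and are not reproduced.

```python
from math import sqrt


def d_from_t(t, n1, n2):
    """Cohen's d for two independent groups from a pooled-variance t value."""
    return abs(t) * sqrt(1 / n1 + 1 / n2)


print(round(d_from_t(-2.2014, 10, 10), 6))  # 0.984496
```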

How to report

You can report the results in your paper like:

With a Welch's t test, we found a significant effect for techniques (t(15) = 2.20, p < 0.05, Cohen's d=0.98) with Technique 2 outperforming Technique 1.

You can report the absolute value of t. The t value is negative in this example only because the mean of Group 1 is larger than that of Group 0 (t.test() subtracts the mean of the second group from that of the first).
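If you report many comparisons, a small formatter keeps the numbers consistent with the example sentence. This is a minimal Python sketch with a hypothetical function name (format_t_report): it takes |t|, rounds df to an integer as the example does, and writes "p < 0.05" when p is below alpha and p with two decimals otherwise. Adapt it to your venue's style guide.

```python
def format_t_report(t, df, p, d, alpha=0.05):
    """Build a 't(df) = ..., p ..., Cohen's d = ...' string like the example sentence."""
    p_text = f"p < {alpha}" if p < alpha else f"p = {p:.2f}"
    return f"t({round(df)}) = {abs(t):.2f}, {p_text}, Cohen's d = {d:.2f}"


print(format_t_report(-2.2014, 14.963, 0.04382, 0.9844952))
# t(15) = 2.20, p < 0.05, Cohen's d = 0.98
```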

Table of Contents

t test

Background

Paired or unpaired

Effect size

Effect size for a paired t test

Effect size for an unpaired t test

A paired t test

R code example

An unpaired t test

R code example

How to report