1 Getting started

Make sure to load the ggplot2 library so we can do plotting.

library(ggplot2)

2 Hypothesis testing

This example can be found in slide 20/33 of Lecture 4.

From previous research we know that the average life span of a certain species of parrot in their native habitat is 28.0 years. We assume that the life span follows a normal distribution. A researcher studies a sample of 20 parrots in a new habitat and finds an average life span of 30.2 years with a standard deviation of 5.0 years. Can he say with 95% certainty that they live longer in the new habitat?

Given the problem statement, we start assigning variables:

n = 20       # sample size
s = 5.0      # standard deviation
x_bar = 30.2 # sample average
  1. We have two hypotheses:

  • H0:μ0=28
  • Ha:μa>28

mu_0 = 28

  2. Because we want 95% certainty, we use α=0.05 with a one-sided test for the upper tail.
  3. Find the rejection region:

# we use lower.tail=TRUE since we are looking for the 95% quantile, starting from the lower tail
# qt is the quantile function for the Student t distribution
t_1_minus_alpha = qt(0.95, df=n-1, lower.tail=TRUE)
t_1_minus_alpha
## [1] 1.729133

  4. Calculate the test statistic:

t = (x_bar - mu_0)/(s/sqrt(n))
t
## [1] 1.96774

  5. Because t>t(1−α) (1.9677398 > 1.7291328) we reject H0, the null hypothesis.
  6. Calculate the p-value:

# pt is the cumulative distribution function of the t distribution
p_value = 1 - pt(t, df=n-1)
p_value
## [1] 0.03193441

We can also calculate this by setting lower.tail=FALSE and not subtracting the value from one.

p_value = pt(t, df=n-1, lower.tail=FALSE)
p_value
## [1] 0.03193441
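The rejection region can also be expressed on the original measurement scale. A small sketch, reusing mu_0, t_1_minus_alpha, s and n from above (x_bar_crit is just an illustrative name):

x_bar_crit = mu_0 + t_1_minus_alpha * s/sqrt(n)
x_bar_crit # reject H0 when the sample mean exceeds this value (about 29.9); here 30.2 does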

2.1 Exercise

Plot the t distribution and add vertical lines to indicate where your t-values are and where the rejection regions start.

Hint: to add lines with text, you can add statements such as the following when creating a plot:

+ geom_vline(xintercept = t, linetype="dashed") # add a vertical dashed line at x=t
+ annotate("label", x=t, y=0.3, label="put your text here") # add a label at the stated x and y coordinates
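One possible way to combine these pieces is sketched below, reusing n, t and t_1_minus_alpha from above (the x range, label positions and label text are just suggestions):

x = seq(-3, 3, 0.01)                            # values at which to evaluate the t density
dist = data.frame(x = x, y = dt(x, df = n - 1)) # t distribution with n-1 degrees of freedom

ggplot(dist, aes(x, y)) +
  geom_line() +
  geom_vline(xintercept = t, linetype = "dashed") +                # observed t statistic
  annotate("label", x = t, y = 0.3, label = "t") +
  geom_vline(xintercept = t_1_minus_alpha, linetype = "dashed") +  # start of the rejection region
  annotate("label", x = t_1_minus_alpha, y = 0.2, label = "t[1-alpha]", parse = TRUE)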

2.2 Exercise

Now you should perform a 2-tailed test on your own. Can you conclude that the lifetimes are different?

We have two hypotheses:

  • H0:μ0=28
  • Ha:μa≠28

You should be able to create the following:

The value for t(1−α/2) should be:

t_1_minus_alpha_div_2
## [1] 2.093024
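One way to obtain this critical value is sketched below (reusing n = 20 from above; the quantile is 1 − α/2 because α is split over both tails):

t_1_minus_alpha_div_2 = qt(1 - 0.05/2, df = n - 1)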

You should be able to create a plot like the following. You don’t have to create the filled red regions, you can just add the vertical lines.

3 t-tests

As mentioned in the lectures, a t-test can be used to compare a sample mean against a hypothesized population mean, or to compare the means of two samples.

3.1 Example

This example can be found in Lecture 5.

Do marijuana smokers score fewer points on a short-term memory test? A study took two groups of people, randomly selected from populations of smokers and non-smokers. Their scores on the test are the following:

non_smoke = c(18,22,21,17,20,17,23,20,22,21)
smoke = c(16,20,14,21,20,18,13,15,17,21)

We now calculate the sample means for both the smoker and non-smoker scores:

x_bar1 = mean(non_smoke)
x_bar2 = mean(smoke)
print(x_bar1)
## [1] 20.1
print(x_bar2)
## [1] 17.5

and do the same for the standard deviations:

s1 = sd(non_smoke)
s2 = sd(smoke)
print(s1)
## [1] 2.13177
print(s2)
## [1] 2.953341

From this we see that both the means and the standard deviations are different, and that the average score for the non-smokers is higher than that for the smokers.

We can use a Q-Q plot to examine if the data for the smokers is normally distributed:

qqnorm(smoke)
qqline(smoke)

We do the same now to see if the data for the non-smokers is normally distributed:

qqnorm(non_smoke)
qqline(non_smoke)

3.1.1 One sample t-test

3.1.1.1 t-test with only smokers

The syntax of the t-test is t.test; type ?t.test in the RStudio console to see all options. In the help window, you should see something like this:

t.test(x, y = NULL,
       alternative = c("two.sided", "less", "greater"),
       mu = 0, paired = FALSE, var.equal = FALSE,
       conf.level = 0.95)

Whenever you see something like alternative = c("two.sided", "less", "greater") that means that you should choose one of the values and the correct syntax for the function would be something like alternative = "two.sided".

If you try something like alternative = c("two.sided", "less") it won’t work and you’ll see an error like Error in match.arg(alternative) : 'arg' must be of length 1

One question that we can ask with t.test is whether the marijuana smokers score less than the national average. For this t-test, we examine the smokers only. We know that in the general population people on average score 20 on the test. For our values we use:

  • x is smokers
  • our alternative is "less"
  • the null hypothesis is that mu=20.
  • We leave the other arguments, such as paired and var.equal, at their default values

Our null hypothesis is that smokers also score 20 on the test. Our alternative hypothesis is that they score less.

Now we need to do a t-test to see whether we can reject the null hypothesis that the smokers also score 20 (or even more).

res = t.test(smoke, alternative = "less", mu=20, conf.level = 0.95)
res
## 
##  One Sample t-test
## 
## data:  smoke
## t = -2.6769, df = 9, p-value = 0.01267
## alternative hypothesis: true mean is less than 20
## 95 percent confidence interval:
##    -Inf 19.212
## sample estimates:
## mean of x 
##      17.5

We can get the value for the t-statistic like this:

res$statistic
##         t 
## -2.676865

You can see which other variables are available by typing res$ and then pressing the Tab key. This will bring up a menu showing what is available.
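If you prefer a non-interactive overview, a quick sketch:

names(res) # list all components stored in the t-test result object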

Now try a one-sided test on your own where the alternative is "greater", even though your sample average is smaller than your hypothesized average. You should see the following:

## 
##  One Sample t-test
## 
## data:  smoke
## t = -2.6769, df = 9, p-value = 0.9873
## alternative hypothesis: true mean is greater than 20
## 95 percent confidence interval:
##  15.788    Inf
## sample estimates:
## mean of x 
##      17.5
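For reference, one call that would produce this output (a sketch of a possible solution):

t.test(smoke, alternative = "greater", mu = 20, conf.level = 0.95)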

The p-value is 0.9873 because the sidedness of the test does not match the data: the sample average is actually less than the hypothesized mean, while this test calculates the p-value for the upper side.

Note the negative t value. The alternative hypothesis is that the true mean is greater than 20, therefore R calculates the p-value as the probability of t > -2.6769, and this probability is large!

Now plot the t-distribution with alpha = 0.05 and 9 degrees of freedom (since df = n-1). For the upper-tail t-test, your t value is negative, since it lies on the left side of the distribution. The upper-tail t-test calculates the probability of values greater than this t-value.

x = seq(-3,3,0.01)
n = 10
dist = data.frame(x=x, 
                  y = dt(x, df=n-1)) # y is the t distribution density evaluated at each x

t_1_minus_alpha = qt(1 - (0.05), df=n-1, lower.tail=TRUE)

right_tail_x = seq(t_1_minus_alpha, 3, 0.01)

ggplot(dist, aes(x,y)) + 
  geom_line() + 
  geom_vline(xintercept = res$statistic, linetype="dashed") + 
  annotate("label", x = res$statistic, y=0.25, label="t") + 
  geom_vline(xintercept = t_1_minus_alpha, linetype="dashed") + 
  annotate("label", x = t_1_minus_alpha, y=0.25, label="t[1-alpha]", parse=TRUE) + 
  geom_polygon(data = data.frame(x = c(right_tail_x, rev(right_tail_x)), 
                                 y = c(dt(right_tail_x, df=n-1), 
                                       rep(0, length(right_tail_x)))), 
               fill="red", alpha=0.5)

If you have a sample mean that is lower than your hypothesized mean, you should never do the upper-tail test; only do the lower-tail or two-sided test. The reverse is true if you have a sample mean that is larger than your hypothesized mean. The two-sided test is therefore a safe choice, since you can’t go wrong this way, but it is not as powerful as the one-sided test. As you saw in the parrot example, the one-sided test gave a significant result, but the two-sided test could not reject the null hypothesis.

3.1.2 Two-sided t-test

Now on your own, try to do a two-sided test using t.test. You should see the following:

## 
##  One Sample t-test
## 
## data:  smoke
## t = -2.6769, df = 9, p-value = 0.02534
## alternative hypothesis: true mean is not equal to 20
## 95 percent confidence interval:
##  15.38731 19.61269
## sample estimates:
## mean of x 
##      17.5
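A call that would produce this output could look like this (a sketch):

t.test(smoke, alternative = "two.sided", mu = 20, conf.level = 0.95)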

3.1.2.1 Two sample t-test - equal or unequal variances

We will cover this in a later lecture, but for now we give you the commands that you can use. Instead of comparing this set of smokers to the general population, we want to do a more detailed test (see Lecture 5, sections 2.1 & 2.2).

The null hypothesis now concerns the difference in the means, so mu=0 means no difference between the means. We still test whether the alternative is "greater", "less" or "two.sided"; here the alternative refers to the sign of the difference in the means.

The test calculates the difference in means as the mean of the first sample (x̄1, the x argument) minus the mean of the second sample (x̄2, the y argument).

You need to pay attention to the order of x and y, and to which alternative you choose.

In this example we are dealing with equal or unequal variances, which means that the parameter var.equal is now important when using the t.test function.

var.equal specifies whether both samples have the same variance. You should only set it to TRUE if you think the samples come from distributions with the same variance. If you don’t know, it is better to set it to FALSE. If the variances really are equal, your test is then a bit less efficient, but if the variances are different and you set var.equal to TRUE, you will get unreliable results.

Try this yourself with the t-test: run it once with equal and once with unequal variances and compare the p-values; you will see that they are slightly different.

For this, we do not need to specify a value of mu, but we do need to specify the second set of data values (i.e. y).

Again, note the syntax for the t.test function:

t.test(x, y = NULL,
       alternative = c("two.sided", "less", "greater"),
       mu = 0, paired = FALSE, var.equal = FALSE,
       conf.level = 0.95)

With var.equal=TRUE you should get:

t.test(smoke, non_smoke, alternative = "two.sided", conf.level = 0.95, var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  smoke and non_smoke
## t = -2.2573, df = 18, p-value = 0.03665
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -5.0198634 -0.1801366
## sample estimates:
## mean of x mean of y 
##      17.5      20.1

On your own, try with var.equal=FALSE and alternative = "two.sided"; you should get:

## 
##  Welch Two Sample t-test
## 
## data:  smoke and non_smoke
## t = -2.2573, df = 16.376, p-value = 0.03798
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -5.0371795 -0.1628205
## sample estimates:
## mean of x mean of y 
##      17.5      20.1
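For reference, the call for this case could look like this (a sketch):

t.test(smoke, non_smoke, alternative = "two.sided", conf.level = 0.95, var.equal = FALSE)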

Let’s try a one-sided test; based on the means, we can choose which alternative to use:

mean(smoke)
## [1] 17.5
mean(non_smoke)
## [1] 20.1

Here we try the one-sided test twice, once with unequal and once with equal variances:

t.test(non_smoke, smoke, alternative = "greater", var.equal = FALSE, conf.level = 0.95)
## 
##  Welch Two Sample t-test
## 
## data:  non_smoke and smoke
## t = 2.2573, df = 16.376, p-value = 0.01899
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  0.5919017       Inf
## sample estimates:
## mean of x mean of y 
##      20.1      17.5
t.test(non_smoke, smoke, alternative = "greater", var.equal = TRUE, conf.level = 0.95)
## 
##  Two Sample t-test
## 
## data:  non_smoke and smoke
## t = 2.2573, df = 18, p-value = 0.01833
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  0.6026879       Inf
## sample estimates:
## mean of x mean of y 
##      20.1      17.5

3.1.2.2 Two sample t-test - paired

In this example, the paired parameter for the t.test function is important.

The previous test with the smokers may not necessarily be good, as our result could be affected by chance differences between the two groups. What people often do is have the same people in both conditions: they first take the test before they smoke, and then take the test again after they have smoked. If the values are measured on the same individuals, then we have to choose the paired alternative (this will be explained more in Lecture 5, section 2.3).

# test values both before and after smoking:
pre_test = c(77, 56, 64, 60, 57, 53, 72, 62, 65, 66)
post_test = c(88, 74, 83, 68, 58, 50, 67, 64, 74, 60)

t.test(pre_test, post_test, alternative = "less", var.equal = FALSE, paired = TRUE, conf.level = 0.95)
## 
##  Paired t-test
## 
## data:  pre_test and post_test
## t = -1.8904, df = 9, p-value = 0.04564
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##        -Inf -0.1635821
## sample estimates:
## mean of the differences 
##                    -5.4
t.test(pre_test, post_test, alternative = "less", var.equal = TRUE, conf.level = 0.95)
## 
##  Two Sample t-test
## 
## data:  pre_test and post_test
## t = -1.2484, df = 18, p-value = 0.1139
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##      -Inf 2.100925
## sample estimates:
## mean of x mean of y 
##      63.2      68.6
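As a side note, a paired t-test is equivalent to a one-sample t-test on the within-person differences. A small sketch to illustrate (it should reproduce the paired result above):

# test whether the mean of (pre_test - post_test) is less than 0
t.test(pre_test - post_test, alternative = "less", mu = 0)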

3.2 Power of the test

You can perform power calculations for t tests via the power.t.test function. If you type ?power.t.test into the console you should see the following usage documentation:

power.t.test(n = NULL, delta = NULL, sd = 1, sig.level = 0.05,
             power = NULL,
             type = c("two.sample", "one.sample", "paired"),
             alternative = c("two.sided", "one.sided"),
             strict = FALSE, tol = .Machine$double.eps^0.25)

To explain the parameters:

  • delta is the true difference between the mean under the null hypothesis and the mean under the alternative hypothesis
  • n is the sample size
  • sd is the estimated standard deviation of the population
  • sig.level is the significance level that you choose, e.g. 0.05 (Type I error probability)
  • power is the power of the test (1 - Type II error probability)
  • alternative is either "two.sided" or "one.sided"
  • type - here we will only use "one.sample"

We can now explore the different alternative hypotheses using this function. Make sure to always fill in sig.level and sd. Of n, delta and power, you have to fill in two, and the function will calculate the remaining one. If you fill in the sample size (n) and delta, it will give you the power of your test; if you fill in the power you want and delta, it will give you the sample size needed to achieve that power.
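For example, a minimal sketch with made-up numbers, where n, delta, sd and sig.level are given and power is left out, so power.t.test solves for it:

# hypothetical values, just to show the calling pattern
power.t.test(n = 15, delta = 3, sd = 4, sig.level = 0.05,
             type = "one.sample", alternative = "one.sided")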

3.3 Exercise

From previous research we know that the average life span of a certain species of parrot in their native habitat is 28.0 years, with a standard deviation of 5 years.

A researcher studies a sample of 20 parrots in a new habitat. Assuming the standard deviation does not change, what is the chance that we can detect an increase in lifetime of 2 years compared to the old habitat at the 95% confidence level?

On your own, explore different hypotheses where the alternative mean ranges from 29 to 33, then plot the power of the test as a function of each alternative hypothesis.
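A possible approach is sketched below, assuming the same parrot values as above (mu_0 = 28, sd = 5, n = 20, alpha = 0.05); the step size for the candidate means is just a choice:

x0 = 28
sigma = 5
alpha = 0.05
n = 20
mu_alt = seq(29, 33, by = 0.5) # candidate alternative means

# power of the one-sided, one-sample test for each alternative mean
pow = sapply(mu_alt, function(m)
  power.t.test(delta = m - x0, sd = sigma, sig.level = alpha, n = n,
               type = "one.sample", alternative = "one.sided")$power)

ggplot(data.frame(mu_alt, power = pow), aes(x = mu_alt, y = power)) +
  geom_point() +
  ggtitle("Power vs. alternative mean")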

You should see the power increase towards 1 as the alternative mean moves further above 28.

Now, assuming that mu_a is 30, we can show the power as a function of sample size:

x0 = 28      # mean life span in the native habitat
sigma = 5    # standard deviation
alpha = 0.05 # significance level
xa = 30      # alternative mean
diff = xa - x0
n = c(2:100) # range of sample sizes to explore
res = power.t.test(delta = diff, sd = sigma, sig.level = alpha, n = n, 
                   type = "one.sample", alt = "one.sided")

beta1 = 1 - res$power # Type II error probability (beta) for each sample size

# create a data frame with the power of the t-test for each sample size
df = data.frame(n, power = res$power)

ggplot(df, aes(x=n, y=power)) + 
  geom_point() + 
  ggtitle("Power vs. Sample Size")

3.4 Exercise

Now, assuming a power of 0.9 and mu_a = 30, find the sample size needed to achieve this power of the test.

You should get:

## 
##      One-sample t test power calculation 
## 
##               n = 54.90553
##           delta = 2
##              sd = 5
##       sig.level = 0.05
##           power = 0.9
##     alternative = one.sided
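A call that would produce this output could look like the following sketch (n is left out so that power.t.test solves for it):

power.t.test(delta = 2, sd = 5, sig.level = 0.05, power = 0.9,
             type = "one.sample", alternative = "one.sided")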