Hypothesis Testing

Getting started

Make sure to load the tidyverse library so we can do plotting later on.

library(tidyverse)

Example

This example can be found in slide 27 of Lecture 4.

  • From previous research we know that the average life span of a certain species of parrot in their native habitat is 28.0 years.
  • We assume that the life span follows a normal distribution.
  • A researcher studies a sample of 20 parrots in a new habitat and finds an average life span of 30.2 years with a standard deviation of 5.0 years.
  • Can he say with 95% certainty that they live longer in the new habitat?

Given the problem statement, we start assigning variables:

n = 20       # sample size
s = 5.0      # standard deviation
x_bar = 30.2 # sample average

We have two hypotheses:

  • \(H_{0}: \mu_{0} = 28\), parrots have the same average lifespan as in their native habitat
  • \(H_{a}: \mu_{a} > 28\), parrots have a longer average lifespan than in their native habitat
mu_0 = 28

Because we want 95% certainty, we use \(\alpha = 0.05\) with a one sided test for the upper tail

Now we find the rejection region:

  • Statistic to use: t-statistics (sample < 30, from a normal distribution)
  • Calculate \(t_{(1-\alpha)}\) from table : \(t_{(1-\alpha)}\) = 1.73 or via R:
# we use lower.tail=TRUE since we are looking for the 95% quantile, starting from the lower tail
# qt is the quantile function for the Student t Distribution
t_1_minus_alpha = qt(0.95, df=n-1, lower.tail=TRUE)
t_1_minus_alpha
## [1] 1.729133

Visualizing the rejection region

To visualize this, we can plot the t distribution and add a line indicating the location of \(t_{(1-\alpha)}\).

First we create a data frame, where x contains a sequence of values along the x axis, and y contains the probability density of the t-test distribution.

x = seq(-3,3,0.01)
y = dt(x,df=n-1)    # y is equal to the t-test distribution sampled at values of x

# create data frame to hold x and y values
dist = data.frame(x = x, 
                  y = y) 

We want to make a plot that contains three different elements:

  • the t-test distribution (calculated above using the dt() function)
  • a vertical line at the value of t_1_minus_alpha
  • a text label showing \(t_{1-\alpha}\) at the value of t_1_minus_alpha

Below we’ll show how to make the plot step by step. Each of the examples below builds on the previous examples, where we add one of the elements in each step.

First plot the distribution:

ggplot(dist, aes(x,y)) + 
  geom_line()

Same plot as above, but add a vertical dashed line (geom_vline) at the location of t_1_minus_alpha

ggplot(dist, aes(x,y)) + 
  geom_line() +             # this plots the distribution as a line
  geom_vline(xintercept = t_1_minus_alpha,   # plot a vertical line at t_1_minus_alpha
             linetype="dashed")              # make this a dashed line

Same plot as the previous one, but now add a label at the x value of t_1_minus_alpha, with a y value of 0.35. You can adjust the y value to move the label up and down on the plot.

ggplot(dist, aes(x,y)) + 
  geom_line() +             # this plots the distribution
  geom_vline(xintercept = t_1_minus_alpha,   # plot a vertical line at t_1_minus_alpha
             linetype="dashed") +            # make this a dashed line
    annotate(geom = "label",          # add a label for t_1_minus_alpha.  
                                      # Don't change geom = "label", it needs to be exactly this
             x = t_1_minus_alpha,     # x position for the label
             y = 0.35,                # y position for the label
             label = "t[(1-alpha)]",    # this is the text that will be displayed
             parse = TRUE)

Calculate the t-statistic:

Use the formula \(t = \frac{\bar{x} - \mu_{0}}{s/\sqrt{n}}\)

t = (x_bar - mu_0)/(s/sqrt(n))
t
## [1] 1.96774

Below we plot what this looks like. Note that the plotting code is the same as above, except that at the very end we add one more geom_vline statement along with another annotate statement so that we can visualize where the value of t is located.

ggplot(dist, aes(x,y)) + 
  # this plots the distribution
  geom_line() +                     
  # plot a vertical line at t_1_minus_alpha
  geom_vline(xintercept = t_1_minus_alpha,   
             linetype="dashed") +            # make this a dashed line
  # add a label for t_1_minus_alpha.  
  annotate(geom = "label",          # Don't change geom = "label", it needs to be exactly this
           x = t_1_minus_alpha,     # x position for the label
           y = 0.35,                # y position for the label
           label = "t[(1-alpha)]",    # this is the text that will be displayed
           parse = TRUE) +          # use parse = TRUE if you're displaying symbols like alpha
  # plot a vertical line at t
  geom_vline(xintercept = t,        
             linetype="dashed") +   # make this a dashed line
  # add a label for the value of t
  annotate(geom = "label",          
           x = t,
           y = 0.35,                
           label = "t",    
           parse = FALSE)           # can set parse = FALSE, just showing a "t", not symbols like alpha

Because \(t > t_{(1-\alpha)}\) (1.97 > 1.73), we reject \(H_{0}\), the null hypothesis.

Calculate the p-value:

# pt is the cumulative distribution of the t distribution
p_value = 1 - pt(t, df=n-1)
p_value
## [1] 0.03193441

We can also calculate this by setting lower.tail=FALSE and not subtracting the value from one.

p_value = pt(t, df=n-1, lower.tail=FALSE)
p_value
## [1] 0.03193441

If we visualize the cumulative distribution (starting from the right side), we can see that p_value is the value of the cumulative distribution at the value of t

Exercise

Now you should perform a 2-tailed test on your own. Can you conclude that the lifetimes are different?

We have two hypotheses:

  • \(H_{0}: \mu_{0} = 28\)
  • \(H_{a}: \mu_{a} \neq 28\)

The value you find for \(t_{(1-\alpha/2)}\) should be:

## [1] 2.093024

Create a plot to visualize the values for the t-statistic and the confidence interval.

Note that with the one-sided test we were able to reject the null hypothesis, but with the two sided test, you are not able to reject it.

One sample t-test

Example

This example can be found in Lecture 4 and will be used throughout this practical.

Do marijuana smokers score less points on short term memory test? A study took two sets of people randomly selected from a population of smokers and non-smokers. Their scores are on the test are the following:

non_smoke = c(18,22,21,17,20,17,23,20,22,21)
smoke = c(16,20,14,21,20,18,13,15,17,21)

Syntax for the t.test in R

In order to analyze this data, we will use the t.test function. To find the documentation for the correct syntax, we type in the RStudio console ?t.test to see all options. In the help window, you should see something like this:

t.test(x, y = NULL,
       alternative = c("two.sided", "less", "greater"),
       mu = 0, paired = FALSE, var.equal = FALSE,
       conf.level = 0.95)

Setting the alternative argument

  • Whenever you see something like alternative = c("two.sided", "less", "greater") that means that you should choose only one of the values
    • The correct syntax for the function would be something like alternative = "two.sided".
  • If you try something like alternative = c("two.sided", "less") it won’t work.
    • You’ll get an error like Error in match.arg(alternative) : 'arg' must be of length 1

Example problem for t.test

One question that we can ask with the t.test is if the marijuana smokers score less than the national average. For this t-test, we examine the smokers only. We know that in the general population people on average score 20 on the test. For our values we use:

  • x is smokers
  • our alternative is one of the three options "two.sided", "less" or "greater"
  • the null hypothesis is that mu=20.
  • We leave out these other variables like paired and var.equal because they belong to the two sample test. We leave out y since that refers to a possible second sample.

The first thing we will try is a two sided t-test.

Two sided t-test

We will now do a two sided t-test for one sample.

  • \(H_0: \mu = 20\) - our null hypothesis is that smokers also score 20 on the test.
  • \(H_a: \mu \neq 20\) - our alternative hypothesis is that they score differently.

Here we do a t-test to see if we can reject the null hypothesis, \(H_0\), that the smokers also score 20

Using the t.test() function, we do a two-sided test as follows:

res = t.test(smoke, alternative = "two.sided", mu=20, conf.level = 0.95)
print(res)
## 
##  One Sample t-test
## 
## data:  smoke
## t = -2.6769, df = 9, p-value = 0.02534
## alternative hypothesis: true mean is not equal to 20
## 95 percent confidence interval:
##  15.38731 19.61269
## sample estimates:
## mean of x 
##      17.5

As you can see, if we print out the results of the t-test, it shows a lot of information. If we want to simply get the value for the t-statistic, we can just use res$statistic:

res$statistic
##         t 
## -2.676865

You can see which other variables are available by typing res$ and then typing on the Tab key. This will bring up a menu showing what is available.

Visualizing the two-sided t-test results

The results we just found can be visualized in the plot below. As you can see, to create this plot we do three things:

  • Plot a range of values in the t-distribution (where values in the distribution are determined using the function dt())
  • Mark the 95% confidence interval (the region corresponding to \(\alpha=0.05\))
  • Mark the value for t (res$statistic) from above
n = length(smoke)    # number of samples

x = seq(-4, 4, 0.1) # sequence from -4 to 4 in increments of 0.1
y = dt(x, df=n-1)   # probability density of the t distribution

df = data.frame(x=x,
                y=y)

alpha = 0.05
t_alpha_div_2 = qt(alpha/2, df = n-1)
t_1_minus_alpha_div_2 = qt(1 - alpha/2, df = n-1)
t = res$statistic

ggplot(df, aes(x,y)) + geom_line() + 
  # vertical line and label for t_alpha_div_2
  geom_vline(xintercept = t_alpha_div_2, linetype = "dashed") + 
  annotate("label", x = t_alpha_div_2, y=0.25, label="t[(alpha/2)]", parse=TRUE) + 
  # vertical line and label for t_1_minus_alpha_div_2
  geom_vline(xintercept = t_1_minus_alpha_div_2, linetype = "dashed") + 
  annotate("label", x = t_1_minus_alpha_div_2, y=0.25, label = "t[(1 - alpha/2)]", parse=TRUE) + 
  # vertical line and label for t (from the t.test function called above)
  geom_vline(xintercept = t, linetype = "dashed") + 
  annotate("label", x = t, y=0.25, label = "t", parse=FALSE)

Exercise 1

Try a two-sided test for a value of mu = 17 (you should see the results below). Also try with different values of mu and see how that changes the results.

## 
##  One Sample t-test
## 
## data:  smoke
## t = 0.53537, df = 9, p-value = 0.6054
## alternative hypothesis: true mean is not equal to 17
## 95 percent confidence interval:
##  15.38731 19.61269
## sample estimates:
## mean of x 
##      17.5

Exercise 2

If instead of typing the command res = t.test(smoke, alternative = "two.sided", mu=17, conf.level = 0.95), you just typed res = t.test(smoke), you would get the results below, which doesn’t seem to be correct.

What are the default values that are being used by res = t.test(smoke), according to the documentation at ?t.test?

## 
##  One Sample t-test
## 
## data:  smoke
## t = 18.738, df = 9, p-value = 1.612e-08
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  15.38731 19.61269
## sample estimates:
## mean of x 
##      17.5

One sided t-test

When performing a one-sided t-test, we need to figure out which value for alternative that we need to use. Which one we choose will be based on the values for \(\mu\) and the sample mean.

  • if sample mean > \(\mu\), then use alternative = greater
  • if sample mean < \(\mu\), then use alternative = less

To review, our sample mean in this case is

mean(smoke)
## [1] 17.5

As an example, let’s try a t-test with a value for \(\mu\) of 20, where alternative = "greater". Note that the value we choose for \(\mu\) is greater than our sample mean.

t.test(smoke, alternative = "greater", mu=20, conf.level = 0.95)
## 
##  One Sample t-test
## 
## data:  smoke
## t = -2.6769, df = 9, p-value = 0.9873
## alternative hypothesis: true mean is greater than 20
## 95 percent confidence interval:
##  15.788    Inf
## sample estimates:
## mean of x 
##      17.5

What this is telling us is that these are the hypotheses being tested:

  • \(H_0: \mu = 20\) (since we specify mu=20 in t.test())
  • \(H_a: \mu > 20\), alternative hypothesis: true mean is greater than 20

You’ll note that we have a negative t value (t = -2.6769). The alternative hypothesis is that the true mean is greater than 20, therefore R calculates the p-value as the probability of t > -2.6769, and this probability is large!

We also see that p-value = 0.9873, because we picked the wrong the sidedness of the t-test. In other words, we can’t reject the null hypothesis \(H_0\) that \(\mu\) is 20 and it looks very unlikely that \(\mu\) > 20.

However, the sample average is actually less than the hypothesized mean (\(\mu\)). We’ve calculated the p-value of the upper side, when what we want is actually the p-value of the lower side.

To reiterate, if you have a sample mean that is lower than your hypothesized mean, you should never do the upper tail test, only do the lower-tail or two-sided test. The reverse is true if you have a sample mean that is larger than your hypothesis. Therefore, the two-sided test is a good choice since you can’t go wrong in this way, but it’s not as powerful as the one-sided test. As you saw in the previous example (with the parrots), with the one-sided test you got a significant result, but could not reject the null hypothesis with the two-sided test.

Exercise

Do the one-sided t-test shown above, except choose the correct option for alternative.

## 
##  One Sample t-test
## 
## data:  smoke
## t = -2.6769, df = 9, p-value = 0.9873
## alternative hypothesis: true mean is greater than 20
## 95 percent confidence interval:
##  15.788    Inf
## sample estimates:
## mean of x 
##      17.5

Additionally, try different values for mu (pick values above and below the sample mean) and alternative to see how these change the results.

Two sample t-test

Reminder

As mentioned in the lectures, we know that a t-test can be used in these particular cases:

  • If the sample size >~ 30 (distribution does not matter)
  • If the sample size <~ 30, only if distribution is normal

mean and sd for smoke and non_smoke

Further below we’ll do t-tests that examine the means and the variances of the populations represented by smoke and non_smoke.

As a first step, we calculate the sample means for both the smoke (\(\bar{x}_1\)) and non_smoke scores (\(\bar{x}_2\)):

x_bar1 = mean(smoke)
x_bar2 = mean(non_smoke)
print(x_bar1)
## [1] 17.5
print(x_bar2)
## [1] 20.1

and do the same for the standard deviations:

s1 = sd(smoke)
s2 = sd(non_smoke)
print(s1)
## [1] 2.953341
print(s2)
## [1] 2.13177

What we see from this is that both the means and the standard deviations are different. Already we see that the average score for the non smokers is higher than that for the smokers.

Two sample t-test - equal or unequal variances

The approach shown here is described in more detail in Lecture 5, Sections 3.1 & 3.2.

Now we want to do a more detailed test where we compare the scores of the smokers to the scores of the non-smokers.

For this t-test, the difference in the means is calculated by (\(\bar{x}_1 - \bar{x}_2\)), or in other words, the mean of first sample minus the mean of the second sample.

Our null hypothesis will be that there is no difference in the means:

  • \(H_0: \mu_1 - \mu_2 = 0\)

Using different values for alternative will give you the following alternative hypotheses. Make sure to pay attention to the values for \(\bar{x}_1\) and \(\bar{x}_2\) when picking which alternative to use.

  • \(H_a: \mu_1 - \mu_2 > 0\) alternative = "greater"
    • use this if \(\bar{x}_1\) > \(\bar{x}_2\)
  • \(H_a: \mu_1 - \mu_2 < 0\) alternative = "less"
    • use this if \(\bar{x}_1\) < \(\bar{x}_2\)
  • \(H_a: \mu_1 - \mu_2 \neq 0\) alternative = "two.sided"

Again, note the syntax for the t.test function:

t.test(x, y = NULL,
       alternative = c("two.sided", "less", "greater"),
       mu = 0, paired = FALSE, var.equal = FALSE,
       conf.level = 0.95)

Setting var.equal

Here we are showing how to analyze samples that may have equal or unequal variances. To specify which of these situations is the case, we need to use the var.equal parameter for the t.test function.

The main advice is:

  • Use var.equal = TRUE if both samples have the same variance.
    • Only use this if you think they come from distributions with the same variance.
  • If you don’t know if the samples actual have the same variance, then it’s better to use var.equal = FALSE.
    • If the variances are actually the same and you use var.equal = FALSE, then your test is a bit less efficient
    • If the variances are actually different and you use var.equal = TRUE then you will get not good results.

Example

For this example, we do not need to specify a value of mu, but you need to specify the second set of data values (i.e. a value for y as specified in the syntax above). Making sure to set var.equal=TRUE we get:

t.test(smoke, non_smoke, alternative = "two.sided", conf.level = 0.95, var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  smoke and non_smoke
## t = -2.2573, df = 18, p-value = 0.03665
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -5.0198634 -0.1801366
## sample estimates:
## mean of x mean of y 
##      17.5      20.1

Exercise 1 - non-equal variances

Try a two-sided test for the same data, but with the assumption of non-equal variances. Compare the p-value to that of the same t-test which assumes equal variances.

## 
##  Welch Two Sample t-test
## 
## data:  smoke and non_smoke
## t = -2.2573, df = 16.376, p-value = 0.03798
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -5.0371795 -0.1628205
## sample estimates:
## mean of x mean of y 
##      17.5      20.1

Exercise 2 - one-sided test

Now try a one-sided test, paying attention to which value for alternative you should use.

## 
##  Welch Two Sample t-test
## 
## data:  non_smoke and smoke
## t = 2.2573, df = 16.376, p-value = 0.01899
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  0.5919017       Inf
## sample estimates:
## mean of x mean of y 
##      20.1      17.5

Exercise 3 - two sample t-test - paired

Lecture 4, Slide 44 covers in more detail the approach shown here.

The previous unpaired test with the smokers may not give correct results since the smokers and non-smokers are completely different sets of people, and there’s a danger that there may be other variables that would cause their scores to differ, aside from the fact that one group smokes and the other doesn’t. A better approach would be to examine everyone’s scores before smoking, and then the scores for those same exact people afterwards.

If we have data about people’s scores before and after smoking, then we can use a paired t-test. To do this, we will have to specify paired = TRUE within the t.test function. (Note that paired = FALSE by default, and that this is the option that will be used if you do not specify anything.)

Here we have the test values for individuals both before and after smoking. Note that the locations in the vectors correspond to specific individuals.

# score_person_1_before, score_person_2_before, ...
pre_test = c(77, 56, 64, 60, 57, 53, 72, 62, 65, 66) 

# score_person_1_after, score_person_2_after, ...
post_test = c(88, 74, 83, 68, 58, 50, 67, 64 ,74 ,60) 

Try a paired t-test on these values, and then a t-test that is not paired. Note how this gives you different results. Make sure to consider which values you should use for alternative and var.equal, based on pre_test and post_test.

## 
##  Paired t-test
## 
## data:  pre_test and post_test
## t = -1.8904, df = 9, p-value = 0.04564
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##        -Inf -0.1635821
## sample estimates:
## mean of the differences 
##                    -5.4
## 
##  Welch Two Sample t-test
## 
## data:  pre_test and post_test
## t = -1.2484, df = 15.265, p-value = 0.1154
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##      -Inf 2.174418
## sample estimates:
## mean of x mean of y 
##      63.2      68.6

The test for equality in variance

Review from lecture

A test of \(s\) in two different populations with size \(n_1\), \(n_2\) involves the following:

  1. A confidence level 1-\(\alpha\)
  2. Two independent, random samples of normal distributions
  3. Null hypothesis, \(H_0\): \(s_1 \le s_2\), or \(s_1 \ge s_2\), or \(s_1 = s_2\)
  4. Alternative hypothesis, \(H_a\):
    • \(s_1 \gt s_2\) (upper tail alternative)
    • \(s_1 \lt s_2\) (lower tail alternative)
    • \(s_1 \ne s_2\) (two tailed)
  5. Test statistic: \(F = \frac{{s_{1}}^2}{{s_{2}}^2}\)

Syntax of var.test

var.test(x, y, ratio = 1, 
         alternative = c("two.sided", "less", "greater"),
         conf.level = 0.95, ...)

Relevant arguments for this practical:

  • x, y - numeric vectors of data values
  • ratio - the hypothesized ratio of the population variances of x and y.
  • alternative - a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater" or "less". You can specify just the initial letter.
  • conf.level - confidence level for the returned confidence interval.

Example

Take 100 samples from two normal distributions with different mean, but the same variance. Will the test recognize that we should not reject the null hypothesis?

x <- rnorm(100, mean=0)  # if you don’t specify the variance it is set to 1 by default
y <- rnorm(100, mean=1)

F test to compare two variances:

var.test(x, y, ratio = 1, alternative = "two.sided", conf.level = 0.95)      
## 
##  F test to compare two variances
## 
## data:  x and y
## F = 0.95579, num df = 99, denom df = 99, p-value = 0.8225
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.6430972 1.4205314
## sample estimates:
## ratio of variances 
##          0.9557927

Since our p-value is 0.8225, we can’t reject the null hypothesis that the ratio of the variances is equal to one.

Exercise

Test if variance between smokers and non-smokers is the same

## 
##  F test to compare two variances
## 
## data:  smoke and non_smoke
## F = 1.9193, num df = 9, denom df = 9, p-value = 0.3456
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.4767308 7.7271526
## sample estimates:
## ratio of variances 
##           1.919315