Hypothesis Testing
One sample t-test
Independent measures t-test
Repeated measures t-test
Extra
100

What is the critical region for a z-score hypothesis test?

+/- 1.96 (for a two-tailed test with α = .05)
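
As a quick check on where 1.96 comes from, the sketch below looks up the cutoff on the standard normal distribution using Python's standard library. The function name `z_critical` is made up for this illustration.

```python
from statistics import NormalDist

def z_critical(alpha=0.05, two_tailed=True):
    """Return the positive critical z value for the given alpha level."""
    # A two-tailed test splits alpha evenly between the two tails.
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)

two_tailed_cv = z_critical(0.05, two_tailed=True)
one_tailed_cv = z_critical(0.05, two_tailed=False)
print(round(two_tailed_cv, 2), round(one_tailed_cv, 3))
```

With α = .05 the two-tailed cutoff rounds to 1.96, and the one-tailed cutoff is smaller because all of α sits in one tail.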

100

Why do we use a t-test instead of a z-score?

  • The t statistic is an alternative to a z-test

    • Allows researchers to use sample data to test hypotheses about the difference between a sample mean and population mean

    • The t statistic does not require knowledge of the population standard deviation (σ)

  • You can use a t-test to test hypotheses for a completely unknown population

    • A completely unknown population happens when: both μ and σ are unknown

    • In these cases, our only available information comes from the sample.

For a t-test, all that is required is a sample and a reasonable hypothesis about μ

100

Why do we use an independent measures t-test instead of a one-sample t-test?

Used in situations where a researcher has no prior knowledge about either of the two populations (or treatments) being compared.

• Both population means and standard deviations are unknown (which is fairly common).

• The values must be estimated from the sample data 

100

What is the value of the hypothesized population mean difference (μD) in the t-test equation?

μD = 0

100

Effect size for t-tests

Effect size - measure of the absolute magnitude of an effect, independent of sample size.

  • Hypothesis tests should be accompanied by effect size

    • Cohen’s d is a standardized effect size.

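    • Like a z-score, Cohen’s d measures mean difference in terms of the standard deviation.

    • Cohen’s d = (M - μ)/σ; when σ is unknown, as in a t-test, it is estimated with the sample standard deviation: estimated d = (M - μ)/s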

    • r² = Percentage of Variance Accounted for by the IV
      • Scores differ across individuals for many reasons.

      – By measuring the amount of variability that can be attributed to the IV, we obtain a new measure of effect size.

    • r² = t² / (t² + df)
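
The two effect-size formulas above can be sketched in a few lines of Python. The function names are made up for this illustration; the example inputs are numbers that appear in the 500-point practice problems (M = 114, μ = 118, σ = 12 for Cohen's d, and t = -2.36 with df = 25 for r²).

```python
def cohens_d(sample_mean, mu, sd):
    """Mean difference expressed in standard deviation units."""
    return (sample_mean - mu) / sd

def r_squared(t, df):
    """Proportion of variance accounted for, computed from t and df."""
    return t ** 2 / (t ** 2 + df)

d = cohens_d(sample_mean=114, mu=118, sd=12)
r2 = r_squared(t=-2.36, df=25)
print(round(d, 2), round(r2, 2))
```

Because t is squared, r² is never negative, while the sign of d shows the direction of the difference.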

200

What is Type 1 error? Type 2 error?

Type I Errors (False Positives): Occur when the sample data indicate an effect when no effect actually exists.

  • Rejecting the null hypothesis when the null is true.

Type II Errors (False Negatives): Occur when the hypothesis test does not indicate an effect but in reality an effect does exist.

  • We fail to reject the null hypothesis even though it was actually false.
200

What is a directional test (one-tailed test)? Non-directional test?

A directional test (or one-tailed test) includes a directional prediction in the statement of the hypothesis and in the locations of the critical region

      - Either positive or negative

A non-directional test (or two-tailed test) does not predict the direction of the effect, so the critical region is split between both tails of the distribution

      - Both positive and negative

200

(True or False) Whether you use Cohen's d or r^2, you'll always get the same effect size.

False: you can get a small Cohen's d effect size but a medium r^2 effect size

200

Why is a repeated measures t-test used instead of the other t-tests?

It evaluates the mean difference between two measurements taken from a single sample (the same participants are measured twice)

200

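Which type of error are we more accepting of? Why?

Type II Error, because it's better to be overcautious than undercautious: a Type I error claims an effect that does not exist, so tests are set up to keep α small.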

300

What is a hypothesis test?

A statistical method that uses sample data to evaluate a hypothesis about a population

300

What are the steps in conducting a t-test?

Step #1: State the hypotheses and select a value of α. (Note: The null always states a value for μ.)

  • The null hypothesis, H0, predicts that the independent variable had no effect on the dependent variable.

  • The alternative hypothesis, H1, predicts that the independent variable did have an effect on the dependent variable.

Step #2: Locate the critical region.

  • The α level establishes a criterion, or "cut-off", for deciding whether to reject the null hypothesis.

    • Typically α = .05 (rarely α = .10 or α = .01)

  • Critical region consists of outcomes very unlikely to occur if the null hypothesis is true

    • Defined by sample outcomes that are very unlikely to be obtained (typically less than a 5% chance) if no effect exists.

Step #3: Compute the relevant test statistic.

Step #4: Make a decision.

  • If the test statistic falls in the critical region, we conclude the difference is significant (an effect exists).

    • We reject the null hypothesis.

  • If the test statistic is not in the critical region, conclude that the difference is not significant (any difference is just due to chance).

    • We fail to reject the null hypothesis.
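
The four steps above can be sketched for a one-sample t-test as follows. This is a minimal illustration, not a full analysis: the function name, the scores, and the null value of 12 are made-up examples, and the critical value 2.306 is read from a t table (df = 8, two-tailed, α = .05) rather than computed.

```python
import math
from statistics import mean, stdev

def one_sample_t_test(scores, mu_null, t_critical):
    """Run a two-tailed one-sample t-test against a table critical value."""
    n = len(scores)
    sample_mean = mean(scores)
    sample_sd = stdev(scores)                # sample SD (divides by n - 1)
    standard_error = sample_sd / math.sqrt(n)
    # Step 3: compute the test statistic.
    t = (sample_mean - mu_null) / standard_error
    # Step 4: reject H0 if t falls in the critical region (either tail).
    reject_null = abs(t) >= t_critical
    return {"t": t, "df": n - 1, "reject_null": reject_null}

# Step 1: H0: mu = 12, H1: mu != 12, alpha = .05 (hypothetical values).
scores = [11, 13, 14, 15, 12, 16, 13, 14, 18]
# Step 2: critical region from a t table, df = 8, two-tailed, alpha = .05.
result = one_sample_t_test(scores, mu_null=12, t_critical=2.306)
print(result)
```

For these made-up scores the sample mean is 14 and t is about 2.83, which falls in the critical region, so the null hypothesis is rejected.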
300

What does an independent measures t-test do? What is considered or used in this t-test?

Independent measures t-test: Allows evaluation of the mean difference between two unknown populations using data from two samples.

• Independent-measures designs use two separate and independent samples.

• Use #1: Test for mean differences between two distinct populations (those with college degrees and those without).

• Use #2: Test for mean differences between two different conditions (Acceptance and Commitment Therapy vs. placebo).

300

What are the strengths of a repeated measures t-test?

Repeated-measures designs require fewer participants than needed for an independent-measures design.
 • Individual differences in performance from one participant to another are eliminated.
• Reduces the variance between subjects → reduces the estimated standard error → increases power
• Repeated-measures designs are particularly well suited for examining changes that occur over time
      • For example, learning or development.

300

When we reject the null hypothesis, how likely is it that we are making a mistake because there is actually no effect (Type I Error)?

20%-50% (around 35%)

400

Statistical power (Definition and how it's determined)

  • The power of a hypothesis test is the probability that the test will reject the null hypothesis when there is actually an effect.

    • Likelihood we can find what we are looking for depends on:

      • Effect size (larger effects are easier to find)

      • Sample size (larger samples make it easier to find effects)

      • Alpha level (larger alpha level makes it easier to find effects)

      • Non-directional vs directional hypothesis (directional tests make it easier to find effects)
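
To see how these four factors move power, here is a rough sketch for the simpler z-test case, where power can be computed directly from the normal distribution. The function name is made up, the effect size is expressed as Cohen's d, and the one-tailed branch assumes the effect lies in the predicted (positive) direction. A t-test would give somewhat different numbers.

```python
import math
from statistics import NormalDist

def z_test_power(effect_size_d, n, alpha=0.05, two_tailed=True):
    """Approximate power of a z-test for a true effect of size d."""
    std_normal = NormalDist()
    # A true effect shifts the distribution of z by d * sqrt(n).
    shift = effect_size_d * math.sqrt(n)
    if two_tailed:
        z_crit = std_normal.inv_cdf(1 - alpha / 2)
        upper = 1 - std_normal.cdf(z_crit - shift)
        lower = std_normal.cdf(-z_crit - shift)
        return upper + lower
    z_crit = std_normal.inv_cdf(1 - alpha)
    return 1 - std_normal.cdf(z_crit - shift)

baseline = z_test_power(0.5, n=30)
print(round(baseline, 2))
print(z_test_power(0.8, n=30) > baseline)                    # larger effect
print(z_test_power(0.5, n=60) > baseline)                    # larger sample
print(z_test_power(0.5, n=30, alpha=0.10) > baseline)        # larger alpha
print(z_test_power(0.5, n=30, two_tailed=False) > baseline)  # directional
```

Each comparison changes one factor at a time from the baseline, and each change raises power, matching the list above.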

400

What is the null hypothesis? Alternative hypothesis?

Null hypothesis: The observed findings are due to random chance (there does not appear to be a real effect)

Alternative hypothesis: The observed findings cannot be explained by sampling error (there does appear to be a real effect).

400

What assumptions must be met for an independent measures t-test?

Five assumptions should be true (or close to true) when using the t-statistic:

  1. The data are measured on an interval or ratio scale (appropriate scale). (The data are continuous)

  2. The sample data have been randomly sampled from the population (randomness).

  3. The variability of the data in each group is similar (homogeneity of variance).

    • For Independent-Measures t-tests this means the variances of the populations that samples are drawn from are similar.

  4. The sampled population is approximately normally distributed (normality).

  5. The values in the sample consist of independent observations (independence).

400

What are the weaknesses of a repeated measures t-test?

One potential disadvantage of using a repeated-measures design is known as testing effects.
• Exposure to the first condition may influence scores in the second condition.
      • For example, practice on an IQ test in the first condition may cause improved performance in the second condition.

Another set of potential problems are floor & ceiling effects.
• Floor effects occur when an individual’s score is so low in condition 1 they have nowhere to go but up in condition 2.
• Ceiling effects occur when an individual has such a high score in condition 1 there is nowhere to go but down in condition 2

400

What symbols are used in each hypothesis test to represent the two hypotheses (null and alternative)?

Hypothesis Testing:

      - H0: μ2 = (XX)

      - H1: μ2 ≠ (XX)

One sample t-test:

      - H0: μ = (XX)

      - H1: μ ≠ (XX)

Independent measures t-test:
Non-directional
(Two-Tailed)
H0: μ1 = μ2
H1: μ1 ≠ μ2

Directional
(One-tailed)
H0: μ1 ≤ μ2
H1: μ1 > μ2 

Repeated measures t-test:
Non-directional
(Two-Tailed)
H0: μD = 0
H1: μD ≠ 0

Directional
(One-tailed)
H0: μD ≤ 0
H1: μD > 0

500

A company has developed a new drug, Thinking Cap, to improve IQ. The average IQ in the population is 118 (μ = 118), and the standard deviation is 12 (σ = 12). Researchers wanted to test the drug, so they sampled 36 individuals, gave them the drug, then tested their IQ a week later. Participants in the sample show an IQ of 114 (M = 114) after Thinking Cap.

H0: Thinking Cap is not related to IQ. (μ2 = 118)

H1: Thinking Cap is related to IQ. (μ2 ≠ 118)

α = 0.05

CV = +/- 1.96

z = -2.00

Statistical Decision: We reject the null

Description of Findings: Thinking Cap is related to IQ. Because z is negative, IQ was significantly lower after taking Thinking Cap.
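
The z value can be checked with a few lines of Python. This sketch takes μ = 118 and M = 114, the values given in parentheses in the problem, since those are the ones that reproduce z = -2.00.

```python
import math

mu, sigma = 118, 12          # population mean and standard deviation
n, sample_mean = 36, 114     # sample size and sample mean

standard_error = sigma / math.sqrt(n)        # 12 / 6 = 2
z = (sample_mean - mu) / standard_error      # (114 - 118) / 2
critical_value = 1.96                        # two-tailed, alpha = .05
reject_null = abs(z) >= critical_value

print(z, reject_null)
```

The sample mean sits two standard errors below the population mean, just past the -1.96 cutoff.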

500

A nutritionist claims that the average daily sugar consumption for adults in a particular city is 30 grams. The nationwide average daily sugar consumption for adults is 28 grams. To test this claim, a random sample of 16 adults is selected, and their daily sugar consumption is measured. The sample data are as follows (in grams):

    28,32,29,31,33,30,27,34,28,26

Using a significance level of α=0.05, determine whether the average sugar consumption in this city is significantly different from the nationwide average. Find the effect size (Cohen's d).

H0: The sugar consumption is not significantly different from the nationwide average (μ = 28)
H1: The sugar consumption is significantly different from the nationwide average (μ ≠ 28)

CV = +/- 2.131 (df = 15)

t = 2.5 (Reject the null)

Description of findings: The sugar consumption is significantly different.

Cohen's d = 0.63 (Medium effect)

500

A researcher is investigating the effect of two different teaching methods on student test performance. Two independent groups of students were taught using Method A and Method B, and their test scores were recorded.

  • Group A: n1=15, M1=80, SS1=400
  • Group B: n2=12, M2=85, SS2=350

Using an α=0.01, test whether there is a significant difference between the two teaching methods. Find the effect size (r^2).

H0: There is no mean difference between groups (μ1 = μ2)
H1: There is a mean difference between groups (μ1 ≠ μ2)

CV = +/- 2.787 (df = 25)

t = - 2.36 (Fail to reject the null)

Description of findings: There is no significant mean difference between groups

r^2 = 0.18 (Medium effect)
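
The arithmetic behind these answers can be reproduced from the summary statistics alone. The sketch below uses the pooled-variance formula for the independent-measures t; the critical value 2.787 is taken from a t table (df = 25, two-tailed, α = .01) rather than computed.

```python
import math

n1, m1, ss1 = 15, 80, 400    # Group A (Method A)
n2, m2, ss2 = 12, 85, 350    # Group B (Method B)

df = (n1 - 1) + (n2 - 1)                     # 14 + 11 = 25
pooled_variance = (ss1 + ss2) / df           # 750 / 25 = 30
standard_error = math.sqrt(pooled_variance / n1 + pooled_variance / n2)
t = (m1 - m2) / standard_error
r2 = t ** 2 / (t ** 2 + df)

critical_value = 2.787       # t table: df = 25, two-tailed, alpha = .01
reject_null = abs(t) >= critical_value

print(round(t, 2), round(r2, 2), reject_null)
```

The t value of about -2.36 does not reach the critical region at α = .01, so the decision is to fail to reject the null.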

500

A sample of 25 participants reports their stress levels before and after a 4-week meditation program. The mean stress level before is 75, and the mean stress level after is 65, with a standard deviation of the differences of 8. Test at the 0.01 significance level whether meditation significantly reduces stress. Find if there is a significant decrease in stress and the effect size (r^2).

H0: The meditation program does not decrease stress level (H0: μD ≥ 0)
H1: The meditation program decreases stress level (H1: μD < 0)

CV = - 2.492 (df = 24, one-tailed, α = .01)

t = -6.25 (Reject the null)

Description of Findings: The meditation program decreased stress levels

r^2 = 0.62 (Large effect)
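
These values can be checked with a short calculation. The sketch assumes difference scores are computed as after minus before, so a decrease in stress gives a negative mean difference; the critical value -2.492 is taken from a t table (df = 24, one-tailed, α = .01).

```python
import math

n = 25
mean_before, mean_after = 75, 65
sd_of_differences = 8

mean_difference = mean_after - mean_before           # -10
standard_error = sd_of_differences / math.sqrt(n)    # 8 / 5 = 1.6
t = (mean_difference - 0) / standard_error           # mu_D = 0 under H0
df = n - 1
r2 = t ** 2 / (t ** 2 + df)

critical_value = -2.492      # t table: df = 24, one-tailed, alpha = .01
reject_null = t <= critical_value                    # lower-tail test

print(round(t, 2), round(r2, 2), reject_null)
```

The t value of about -6.25 is far into the lower tail, so the null is rejected and the effect is large by the r² guidelines.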

500

What causes Type I error? Type II error?

Type I error:

  • Caused by unusual, unrepresentative samples, falling in the critical region without any true effect.

  • Hypothesis tests are structured to make Type I errors unlikely.

Type II error

- More likely with a small treatment effect or poor study design (sample size too small).
