Introduction to Data
Summarizing Data
Probability
Normal Distributions
Confidence Intervals and Hypothesis Testing
100

In a study looking at the efficacy of stents, one group received the same medical management as the treatment group but did not receive the stent. This is the __________ group.

Control group

100

A case-by-case view of data for two numerical variables can be best described as a _____________. 

scatterplot

100

The proportion of times an outcome would occur if the random process were observed an infinite number of times is known as the __________.

probability

100

A symmetric, unimodal, bell-shaped curve is known as a _____________ 

Normal Distribution

100

This value describes how much an estimate will tend to vary from one sample to the next.

Sampling error or sampling uncertainty

200

If you were looking at a dataset comparing the COVID-19 hospitalization rates by state, the variable state is a ___________ variable

categorical, nominal

200

These are the three measures of central tendency. If given a data set, you should be able to find/calculate all three. 

Mean, median, and mode
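
As a quick check on the three measures, here is a minimal sketch using Python's standard `statistics` module. The data values are hypothetical and chosen only for illustration.

```python
from statistics import mean, median, multimode

# Hypothetical data set, chosen only for illustration.
data = [2, 3, 3, 5, 7, 10]

center_mean = mean(data)        # sum of the values divided by the count
center_median = median(data)    # middle value (average of the two middle values here)
center_modes = multimode(data)  # most frequent value(s), returned as a list

print(center_mean, center_median, center_modes)
```

`multimode` is used rather than `mode` so that a data set with several equally common values reports all of them.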

200

The conditional probability of outcome A given condition B is computed as ___________.

P(A and B) divided by P(B), that is, P(A | B) = P(A and B) / P(B)
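
The division can be sketched in a few lines of Python. The function name and the probabilities passed to it are hypothetical and used only for illustration.

```python
def conditional_probability(p_a_and_b: float, p_b: float) -> float:
    """Return P(A | B) = P(A and B) / P(B)."""
    if p_b <= 0:
        # Conditioning on an event with probability 0 is undefined.
        raise ValueError("P(B) must be greater than 0")
    return p_a_and_b / p_b

# Hypothetical values: P(A and B) = 0.12 and P(B) = 0.30.
p_a_given_b = conditional_probability(0.12, 0.30)
print(round(p_a_given_b, 2))
```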

200

The standard normal distribution has a mean of _____ and a standard deviation of _______.

Mean μ = 0, standard deviation σ = 1

200

If you have a sample of 500 independent observations and a sample proportion of p = 0.65, is the sample sufficiently large to use the Central Limit Theorem?

Yes, because n*p = 500*0.65 = 325 and n*(1-p) = 500*0.35 = 175 are both at least 10.
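
The check above can be written as a small helper. This is a sketch only; the function name and the default threshold of 10 are assumptions that follow the rule of thumb used in the answer.

```python
def success_failure_ok(n: int, p_hat: float, threshold: float = 10) -> bool:
    """Check that the expected successes and failures both reach the threshold."""
    expected_successes = n * p_hat
    expected_failures = n * (1 - p_hat)
    return expected_successes >= threshold and expected_failures >= threshold

# Values from the question: n = 500, p = 0.65 (about 325 successes and 175 failures).
large_sample_ok = success_failure_ok(500, 0.65)

# Hypothetical small sample: n = 20 gives only about 7 expected failures.
small_sample_ok = success_failure_ok(20, 0.65)

print(large_sample_ok, small_sample_ok)
```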

300

If you were looking at a dataset comparing the COVID-19 hospitalization rates by state, the hospitalization rate is a ___________ variable.

numerical, continuous

300

A contingency table summarizes data for two ___________ variables. 

Categorical

300

Your department is holding a raffle. They sell 400 tickets and offer 3 prizes. What is the probability of winning a prize if you buy one ticket and the tickets are sampled without replacement?

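3/400 = 0.0075, or 0.75%. Your ticket is equally likely to be picked in each of the three drawings: (1/400) + (399/400)(1/399) + (399/400)(398/399)(1/398) = 0.0025 + 0.0025 + 0.0025 = 0.0075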
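
One way to check the raffle probability is to compute the chance that a single ticket loses every drawing and subtract it from 1. The sketch below uses exact fractions; the function name is hypothetical.

```python
from fractions import Fraction

def prob_win_any_prize(tickets: int, prizes: int) -> Fraction:
    """Probability that one ticket wins some prize when winners are drawn without replacement."""
    p_lose_all = Fraction(1)
    for drawn in range(prizes):
        remaining = tickets - drawn
        # Of the remaining tickets, all but ours lead to a loss in this drawing.
        p_lose_all *= Fraction(remaining - 1, remaining)
    return 1 - p_lose_all

# Values from the question: 400 tickets, 3 prizes, 1 ticket bought.
p_win = prob_win_any_prize(400, 3)
print(p_win, float(p_win))
```

The product (399/400)(398/399)(397/398) collapses to 397/400, so the probability of winning is 3/400.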

300

The number of standard deviations an observation falls above or below the mean is known as the ________

Z-score
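
As a short illustration, the Z-score is the observation minus the mean, divided by the standard deviation. The numbers below are hypothetical.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations that x falls above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical values: an observation of 130 from a distribution with mean 100 and SD 15.
z = z_score(130, 100, 15)
print(z)
```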

300

A 95% confidence interval for a population proportion means this. 

It is the range in which we can be 95% confident the true population proportion is found.
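
To make the idea concrete, the sketch below builds such an interval as point estimate ± 1.96 × SE, reusing the n = 500 and p = 0.65 values from the Central Limit Theorem question. The standard error formula sqrt(p(1-p)/n) for a sample proportion is an assumption, since it is not stated on this board.

```python
from math import sqrt

def proportion_ci_95(p_hat: float, n: int) -> tuple[float, float]:
    """Approximate 95% confidence interval for a population proportion."""
    se = sqrt(p_hat * (1 - p_hat) / n)  # assumed standard error of a sample proportion
    margin = 1.96 * se
    return p_hat - margin, p_hat + margin

# Values reused from the earlier question: sample proportion 0.65, sample size 500.
lower, upper = proportion_ci_95(0.65, 500)
print(round(lower, 3), round(upper, 3))
```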

400

A common pitfall in sampling occurs when individuals who are more easily accessible are more likely to be included in the sample. This type of sample is known as a ____________. 

convenience sample

400

The variables treatment and outcome are independent with no relationship if the ____________ model is correct. 

Independence model

400

A discrete random variable is different from a continuous random variable in that a discrete random variable can be ________. 

counted

400

When finding the area under the curve, many tables give the area to the left of an observation. To find the area to the right of the observation, you should _________. 

Subtract the area to the left of the observation from 1.
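
The subtraction can be sketched with `statistics.NormalDist` from the Python standard library, which plays the role of the table here. The Z value of 1.0 is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

z = 1.0                          # hypothetical observation on the Z scale
area_left = std_normal.cdf(z)    # what a typical table reports
area_right = 1 - area_left       # complement gives the upper tail

print(round(area_left, 4), round(area_right, 4))
```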

400

The 95% confidence interval for a point estimate can be found using this equation.

Point estimate ± 1.96 × SE

500

Studies where researchers assign treatments to cases are called __________. 

Experiments

500

The variables treatment and outcome are not independent and there is a relationship between them if the ________ model is correct.

Alternative model

500

The area under the curve of the probability density function (or probability distribution) is always equal to __.

1

500

The 68-95-99.7 rule says that approximately _________ of the observations fall within two standard deviations of the mean. 

95%
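
The rule can be checked numerically. This sketch uses `statistics.NormalDist` to compute the area within 1, 2, and 3 standard deviations of the mean; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Area between -k and +k standard deviations for k = 1, 2, 3.
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, area in within.items():
    print(f"within {k} SD: {area:.4f}")
```

The computed areas are close to, but not exactly, 68%, 95%, and 99.7%, which is why the rule is stated as approximate.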

500

The ___________ represents an alternative claim under consideration and is often represented by a range of possible parameter values.

alternative hypothesis