Describe a Distribution
1. Shape – Skewed right? Skewed left? Fairly
symmetric? Two distinct peaks?
2. Outliers – If you are estimating, call them
“potential outliers.”
3. Center – What is the mean? If the distribution is
skewed, identify the median.
4. Variability – Remember, SD goes with the mean
and IQR goes with the median.
Interpret the Standard Deviation
The standard deviation gives the typical distance of the values from the mean.
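In an approximately normal distribution:
Within 1 standard deviation of the mean: about 68% of values
Within 2 standard deviations of the mean: about 95% of values
Within 3 standard deviations of the mean: about 99.7% of values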
What is the Empirical Rule?
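The 68/95/99.7 percentages can be checked against the standard normal distribution. This is a minimal sketch using Python's standard library; the helper name `proportion_within` is made up for illustration.

```python
from statistics import NormalDist

def proportion_within(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    z = NormalDist()  # standard normal: mean 0, SD 1
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(k, round(proportion_within(k), 4))
# prints 0.6827, 0.9545 and 0.9973 for k = 1, 2, 3
```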
When the sample size is sufficiently large, the sampling distribution of the sample mean will be approximately normal, regardless of the shape of the population distribution.
What is the Central Limit Theorem?
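A small simulation can make the Central Limit Theorem concrete. The sketch below assumes a strongly right-skewed population (exponential with mean 1 and SD 1) and a sample size of 40. Both choices, and the helper name `sample_means`, are illustrative only.

```python
import random
import statistics

def sample_means(n, reps, seed=0):
    """Draw `reps` samples of size n from a skewed population; return their means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

means = sample_means(n=40, reps=2000)
# The means should center near the population mean (1),
# with SD near 1 / sqrt(40), which is about 0.158.
print(round(statistics.fmean(means), 2), round(statistics.stdev(means), 2))
```

A dotplot or histogram of `means` should look roughly symmetric and bell-shaped even though the population itself is skewed.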
How do I make a decision based on a P-value?
If the P-value ≤ α, reject the null hypothesis.
If the P-value > α, fail to reject the null hypothesis.
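The decision rule compares the P-value to the significance level. The function below is a hypothetical helper written for illustration, and the default level of 0.05 is an assumption rather than something this card states.

```python
def decide(p_value, alpha=0.05):
    """Apply the P-value decision rule at significance level alpha."""
    if p_value <= alpha:
        return "reject H0"
    return "fail to reject H0"

print(decide(0.03))  # reject H0
print(decide(0.20))  # fail to reject H0
```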
What is the Outlier Rule?
An outlier is any value that falls more
than 1.5 × IQR above Q3 or below Q1.
Lower outliers < Q1 – 1.5(IQR)
Upper outliers > Q3 + 1.5(IQR)
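The two fences can be computed directly once Q1 and Q3 are known. This sketch takes the quartiles as inputs, because software packages use different quartile conventions. The function names are illustrative.

```python
def outlier_fences(q1, q3):
    """Return (lower fence, upper fence) from the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(data, q1, q3):
    """List the values that fall outside the fences."""
    low, high = outlier_fences(q1, q3)
    return [x for x in data if x < low or x > high]

print(outlier_fences(10, 20))                      # (-5.0, 35.0)
print(find_outliers([1, 12, 15, 18, 40], 10, 20))  # [40]
```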
Interpret the confidence interval
We are C% confident that the confidence interval
from ___ to ___ captures the population parameter
(in context).
1. Binary: two outcomes for each trial (success or
failure)
2. Independent: The outcome of one trial does not
affect the outcome of any other trial
3. Number of trials is a fixed number (n)
4. Same probability of success for each trial (p)
Remember: Fixed number of trials.
What are the Conditions for a Binomial Random Variable?
Random: Data come from a random sample
10%: When sampling without replacement,
n < 10% of the population size
Normal: Population distribution is normal,
large sample (n ≥ 30), or a dotplot of the
sample data shows no strong skewness or
outliers.
What are the Conditions for a one-sample t-test and t-interval for μ?
Random assignment VS Random sample
Random assignment allows you to draw cause-and-effect conclusions.
Random sample allows you to make generalizations about the population.
How do we describe the relationship between two
variables (like in a scatterplot)?
When describing the relationship between 2
quantitative variables, be sure to address:
1. Direction – positive or negative
2. Unusual values – outliers, influential
observations
3. Form – Linear or curved
4. Strength – Weak or Strong
Interpret the P-value
A P-value is the probability of obtaining a test statistic as extreme or more extreme than the observed test statistic when the null hypothesis is assumed to be true.
P(A and B) / P(B)
What is conditional probability?
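The formula gives P(A | B), the probability of A given that B has occurred. As a worked sketch with made-up numbers, suppose P(A and B) = 12/100 and P(B) = 30/100. Exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def conditional_probability(p_a_and_b, p_b):
    """P(A | B) = P(A and B) / P(B); undefined when P(B) = 0."""
    if p_b == 0:
        raise ValueError("P(B) must be greater than 0")
    return p_a_and_b / p_b

# Illustrative values only
print(conditional_probability(Fraction(12, 100), Fraction(30, 100)))  # 2/5
```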
Random: Data come from a random sample
10%: When sampling without replacement,
n < 10% of the population size
Large Counts:
• Test: np0 ≥ 10 and n(1 – p0) ≥ 10
• Interval: np̂ ≥ 10 and n(1 – p̂) ≥ 10
What are the Conditions for a one-sample z-test and z-interval for p?
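The Large Counts condition is a quick arithmetic check. In this sketch, pass the hypothesized proportion p0 for a test or the sample proportion for an interval. The function name and the example numbers are illustrative.

```python
def large_counts_ok(n, p, threshold=10):
    """True when the expected successes and failures are both at least threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold

print(large_counts_ok(200, 0.3))  # True: 60 successes and 140 failures expected
print(large_counts_ok(50, 0.1))   # False: only 5 successes expected
```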
Power
The power of a test is the
probability a test will correctly
reject the null hypothesis, given the
alternative hypothesis is true.
What is the difference between discrete and continuous variables?
A discrete variable can take on a countable number
of values. The number of values may be finite or infinite.
THINK: Discrete = Countable
Ex. Number of students
A continuous variable can take on infinitely many
values, but those values cannot be counted.
THINK: Continuous = Must be measured
Ex. Height
Interpret the coefficient of determination r²
The coefficient of determination gives the percent of the variation of y-context that is explained by the least-squares regression line using x = x-context.
P(X = x) = (1 – p)^(x – 1) p
What is the Formula for Geometric Probability?
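The geometric formula gives the probability that the first success occurs on trial x: there are x – 1 failures, then one success. A minimal sketch, with an illustrative function name:

```python
def geometric_pmf(x, p):
    """P(X = x): first success on trial x, with success probability p."""
    if x < 1:
        raise ValueError("x must be a positive integer")
    return (1 - p) ** (x - 1) * p

print(geometric_pmf(3, 0.5))  # 0.125: two failures, then a success
```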
Random: Data from a random sample, separate
random samples, or groups in a randomized
experiment.
10%: when sampling without replacement: n
< 10% of the population size for all samples.
Large Counts: All expected counts must be at least 5
What are the Conditions for a chi-square test?
How to calculate expected counts in a chi-square test for homogeneity/independence?
expected count = (row total)(column total)/table total
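Applying the expected-count formula to every cell of a two-way table might look like the sketch below. The table is represented as a list of rows, and the observed counts are made up for illustration.

```python
def expected_counts(table):
    """Expected count per cell: (row total)(column total) / table total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

observed = [[10, 20],
            [30, 40]]  # illustrative counts only
print(expected_counts(observed))  # [[12.0, 18.0], [28.0, 42.0]]
```

Each expected count here is at least 5, so this table would satisfy the Large Counts condition.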
Describe the shape of the dataset
Skewed Left, Skewed Right, Uniform, Normal, Bimodal.
Interpret the slope of the Least Squares Regression Line
For every increase of 1 unit in x context, the
predicted y context increases/decreases by slope.
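To see where the slope comes from, here is a sketch that computes the least-squares slope, the intercept, and r² from the usual sums of squares. The data and function name are invented for illustration.

```python
def least_squares(xs, ys):
    """Return (slope, intercept, r_squared) for the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

slope, intercept, r2 = least_squares([1, 2, 3, 4], [2, 4, 5, 8])
print(round(slope, 2), round(intercept, 2), round(r2, 3))
```

For this made-up data the slope is 1.9, so each 1-unit increase in x raises the predicted y by 1.9.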
P(X = x) = (nCx) p^x (1 – p)^(n – x)
What is the Formula for Binomial Probability?
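The binomial formula counts the ways to place x successes among n trials, then multiplies by the probability of each such arrangement. A minimal sketch using `math.comb` from the standard library, with an illustrative function name:

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x): exactly x successes in n independent trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

print(binomial_pmf(2, 4, 0.5))  # 0.375: 6 arrangements, each with probability 0.0625
```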
Linear: True relationship between the variables is
linear.
Independent: Observations are independent; check the
10% condition when sampling without replacement
Normal: Responses vary normally around the
regression line for all x-values
Equal Variance: The spread of responses around the
regression line is the same for all x-values
Random: Data from a random sample or randomized experiment
What are the Conditions for a t-test or t-interval for slope?
Type I and Type II error
A Type I error occurs when the null
hypothesis is true and is rejected (false positive).
A Type II error occurs when the
alternative hypothesis is true and the
null hypothesis is not rejected (false negative).