Exploring Data
Interpret
Probability
Conditions
Sonje's Choice
100

Describe a Distribution

1. Shape – Skewed right? Skewed left? Fairly symmetric? Two distinct peaks?

2. Outliers – If you are estimating, call them “potential outliers.”

3. Center – What is the mean? If the distribution is skewed, identify the median.

4. Variability – Remember, SD goes with the mean and IQR goes with the median.

100

Interpret the Standard Deviation

The standard deviation gives the typical distance of the values from the mean (in context).
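
A minimal Python sketch of this interpretation. The data set is hypothetical, and the sample standard deviation (dividing by n - 1) is assumed:

```python
import statistics

# Hypothetical data set, chosen only for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(values)   # center of the data
sd = statistics.stdev(values)    # sample SD (divides by n - 1)

# Interpretation: the values typically fall about `sd` away from `mean`.
print(f"mean = {mean}, SD = {sd:.2f}")
```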

100

In an approximately normal distribution:

Within 1 standard deviation of the mean: about 68%

Within 2 standard deviations: about 95%

Within 3 standard deviations: about 99.7%

What is the Empirical Rule?
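
The three percentages can be checked against a standard normal distribution. This is a small sketch using Python's built-in `statistics.NormalDist`; the helper name `proportion_within` is made up for this example:

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def proportion_within(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {proportion_within(k):.4f}")
```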

100

When the sample size is sufficiently large, the sampling distribution of the sample mean is approximately normal, regardless of the shape of the population distribution.

What is the Central Limit Theorem?
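
A rough simulation sketch of this idea. The population (exponential with mean 1, strongly right-skewed), the sample size of 40, and the 2000 repetitions are all assumptions chosen for illustration:

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

def sample_mean(n):
    # Draw n values from a right-skewed population (exponential, mean 1).
    return statistics.fmean([random.expovariate(1.0) for _ in range(n)])

# Build an approximate sampling distribution of the sample mean.
sample_means = [sample_mean(40) for _ in range(2000)]

center = statistics.fmean(sample_means)   # should be near 1
spread = statistics.stdev(sample_means)   # should be near 1/sqrt(40)

# If the sample means are roughly normal, about 68% fall within 1 SD.
within_one = sum(abs(m - center) <= spread for m in sample_means) / len(sample_means)

print(f"center = {center:.3f}, spread = {spread:.3f}, within 1 SD = {within_one:.2f}")
```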

100

How do I make a decision based on a P-value?

If the P-value ≤ α, reject the null hypothesis.

If the P-value > α, fail to reject the null hypothesis.
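
The decision rule above, written as a tiny Python function. The name `decide` and the default significance level of 0.05 are assumptions for this sketch:

```python
def decide(p_value, alpha=0.05):
    """Return the decision for a significance test at level alpha."""
    if p_value <= alpha:
        return "reject H0"
    return "fail to reject H0"

print(decide(0.03))  # P-value at or below alpha
print(decide(0.20))  # P-value above alpha
```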

200

What is the Outlier Rule?

An outlier is any value that falls more than 1.5 × IQR above Q3 or below Q1.

Lower outliers < Q1 – 1.5(IQR)

Upper outliers > Q3 + 1.5(IQR)
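
A short sketch of the rule in Python with hypothetical data. Quartiles are computed here as the medians of the lower and upper halves of the sorted data, which is one common convention and an assumption of this example; other software may use a different quartile method:

```python
import statistics

def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves (median excluded)."""
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]
    upper = data[(n + 1) // 2 :]
    return statistics.median(lower), statistics.median(upper)

def find_outliers(values):
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]  # hypothetical values
print(find_outliers(data))  # [30]
```

Here Q1 = 3 and Q3 = 8, so IQR = 5 and the fences are -4.5 and 15.5.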

200

Interpret the confidence interval

We are C% confident that the interval from ___ to ___ captures the population parameter (in context).

200

1. Binary: two outcomes for each trial (success or failure)

2. Independent: each trial is independent of the others

3. Number of trials is a fixed number (n)

4. Same probability of success for each trial (p)

Remember: Fixed number of trials.

What are the Conditions for a Binomial Random Variable?

200

Random: Data come from a random sample

10%: When sampling without replacement, n < 10% of the population size

Normal: Population distribution is normal, large sample (n ≥ 30), or a dotplot of the sample data shows no strong skewness or outliers.

What are the Conditions for a one-sample t-test and t-interval for μ?

200

Random assignment vs. random sample

Random assignment allows you to draw conclusions about cause and effect.

Random sampling allows you to generalize to the population from which the sample was drawn.


300

How do we describe the relationship between two variables (like in a scatterplot)?

When describing the relationship between 2 quantitative variables, be sure to address:

1. Direction – positive or negative

2. Unusual values – outliers, influential observations

3. Form – Linear or curved

4. Strength – Weak or Strong

300

Interpret the P-value

A P-value is the probability of obtaining a test statistic as extreme or more extreme than the observed test statistic when the null hypothesis is assumed to be true.

300

P(A | B) = P(A and B) / P(B)

What is conditional probability?
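
A worked example of the formula in Python. The counts are hypothetical: out of 100 students, 40 are in group B and 10 are in both A and B:

```python
from fractions import Fraction

# Hypothetical counts, for illustration only.
total = 100
count_b = 40
count_a_and_b = 10

p_b = Fraction(count_b, total)
p_a_and_b = Fraction(count_a_and_b, total)

# P(A | B) = P(A and B) / P(B)
p_a_given_b = p_a_and_b / p_b
print(p_a_given_b)  # 1/4
```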

300

Random: Data come from a random sample

10%: When sampling without replacement, n < 10% of the population size

Large Counts:

• Test: np₀ ≥ 10 and n(1 – p₀) ≥ 10

• Interval: np̂ ≥ 10 and n(1 – p̂) ≥ 10

What are the Conditions for a one-sample z-test and z-interval for p?
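
The Large Counts check can be sketched as a small Python helper. The function name `large_counts_ok` is made up for this example; pass the hypothesized proportion for a test or the sample proportion for an interval:

```python
def large_counts_ok(n, p, threshold=10):
    """True when both the expected successes and failures reach the threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold

print(large_counts_ok(200, 0.5))    # 100 successes and 100 failures expected
print(large_counts_ok(100, 0.08))   # only 8 successes expected
```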

300

Power

The power of a test is the probability that the test will correctly reject the null hypothesis, given that the alternative hypothesis is true.
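
One way to see power as a number is a rough Python sketch for a one-sided z-test of a mean with known σ. That test setting and every value below (H0: μ = 100, true μ = 105, σ = 15, n = 36, α = 0.05) are assumptions made only for illustration:

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical setting: H0: mu = 100 vs Ha: mu > 100, sigma known.
mu_0 = 100
mu_true = 105    # assumed true mean under the alternative
sigma = 15
n = 36
alpha = 0.05

se = sigma / sqrt(n)  # standard deviation of the sample mean

# Smallest sample mean that leads to rejecting H0 at level alpha.
cutoff = mu_0 + NormalDist().inv_cdf(1 - alpha) * se

# Power: probability the sample mean exceeds the cutoff when mu = mu_true.
power = 1 - NormalDist(mu_true, se).cdf(cutoff)
type_ii_probability = 1 - power

print(f"power = {power:.3f}, P(Type II error) = {type_ii_probability:.3f}")
```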

400

What is the difference between discrete and continuous variables?

A discrete variable can take on a countable number of values. The number of values may be finite or infinite.

THINK: Discrete = Countable

Ex. Number of students

A continuous variable can take on infinitely many values, but those values cannot be counted.

THINK: Continuous = Must be measured

Ex. Height

400

Interpret the coefficient of determination r²

The coefficient of determination gives the percent of the variation in y (in context) that is explained by the least-squares regression line using x (in context).
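
A small Python sketch that computes r² by hand for a hypothetical data set (the x and y values are made up for this example):

```python
# Hypothetical paired data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)

sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)

# r^2 is the square of the correlation r = sxy / sqrt(sxx * syy).
r_squared = sxy ** 2 / (sxx * syy)
print(f"r^2 = {r_squared:.2f}")
```

For these values r² comes out to 0.6, so about 60% of the variation in y is explained by the least-squares line using x.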

400

P(X = x) = (1 - p)^{x-1} p

What is the Formula for Geometric Probability?
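
The formula translates directly into Python. The function name `geometric_pmf` and the example values (p = 0.2, first success on trial 3) are assumptions for this sketch:

```python
def geometric_pmf(x, p):
    """P(first success occurs on trial x) when each trial succeeds with probability p."""
    return (1 - p) ** (x - 1) * p

# Two failures (0.8 each) followed by one success (0.2).
print(round(geometric_pmf(3, 0.2), 3))  # 0.128
```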

400

Random: Data from a random sample, separate random samples, or groups in a randomized experiment.

10%: When sampling without replacement, n < 10% of the population size for all samples.

Large Counts: All expected counts must be at least 5.

What are the Conditions for a chi-square test?

400

How do I calculate expected counts in a chi-square test for homogeneity/independence?

expected count = (row total)(column total) / (table total)
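
A minimal Python sketch that applies this formula to every cell of a two-way table. The observed counts are hypothetical and the function name `expected_counts` is made up for this example:

```python
def expected_counts(table):
    """Expected count for each cell: (row total)(column total) / (table total)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    table_total = sum(row_totals)
    return [[r * c / table_total for c in col_totals] for r in row_totals]

observed = [[10, 20],
            [30, 40]]  # hypothetical two-way table
expected = expected_counts(observed)
print(expected)  # [[12.0, 18.0], [28.0, 42.0]]

# Large Counts check for a chi-square test: every expected count at least 5.
print(all(count >= 5 for row in expected for count in row))
```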

500

Describe the shape of the dataset

Skewed Left, Skewed Right, Uniform, Normal, Bimodal.

500

Interpret the slope of the Least Squares Regression Line

For every increase of 1 unit in x (in context), the predicted y (in context) increases/decreases by the slope.
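
To make the interpretation concrete, here is a Python sketch that fits a least-squares line to hypothetical data and confirms that raising x by 1 changes the prediction by exactly the slope:

```python
# Hypothetical paired data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)

# Least-squares slope and intercept.
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar

def predict(x_value):
    return intercept + slope * x_value

# A 1-unit increase in x changes the predicted y by the slope.
change = predict(4) - predict(3)
print(f"slope = {slope:.2f}, change in prediction = {change:.2f}")
```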

500

P(X = x) = (nCx) p^x (1 - p)^{n-x}

What is the Formula for Binomial Probability?
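
The binomial formula as a short Python function, using `math.comb` for the number of combinations. The function name `binomial_pmf` and the example (exactly 2 successes in 5 trials with p = 0.5) are assumptions for this sketch:

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(exactly x successes in n independent trials with success probability p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

print(binomial_pmf(2, 5, 0.5))  # 0.3125

# The probabilities over all possible x values sum to 1.
print(sum(binomial_pmf(x, 5, 0.5) for x in range(6)))
```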

500

Linear: True relationship between the variables is linear.

Independent observations, 10% condition when sampling without replacement

Normal: Responses vary normally around the regression line for all x-values

Equal Variance around the regression line for all x-values

Random: Data from a random sample or randomized experiment

What are the Conditions for a t-test or t-interval for slope?

500

Type I and Type II error

A Type I error occurs when the null hypothesis is true and is rejected (false positive).

A Type II error occurs when the alternative hypothesis is true and the null hypothesis is not rejected (false negative).
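
A rough simulation sketch of a Type I error in Python. It assumes a two-sided z-test for a mean with known σ, and every setting (μ₀ = 50, σ = 10, n = 25, α = 0.05, 4000 repetitions) is hypothetical. Because the null hypothesis is true in every simulated sample, each rejection is a Type I error, and the rejection rate should land near α:

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(2)  # fixed seed so the run is repeatable

# Hypothetical setting in which H0 (mu = 50) is actually true.
mu_0, sigma, n, alpha = 50, 10, 25, 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value

def rejects_h0():
    sample = [random.gauss(mu_0, sigma) for _ in range(n)]
    z = (sum(sample) / n - mu_0) / (sigma / sqrt(n))
    return abs(z) > z_crit

trials = 4000
type_i_rate = sum(rejects_h0() for _ in range(trials)) / trials
print(f"Type I error rate = {type_i_rate:.3f}")
```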
