This is when we systematically tend to overestimate or underestimate the true population parameter
Bias
This is a number between -1 and 1 that gives the strength and direction of the linear relationship between two quantitative variables
Correlation (r)
This law states that simulated (empirical) probabilities tend to get closer to the true probability of an event as the number of trials increases.
The Law of Large Numbers
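A quick simulation can make this law concrete. The sketch below assumes a fair coin (true probability 0.5) and a fixed seed so the run is repeatable; the function name `running_proportion` is made up for illustration.

```python
import random

def running_proportion(num_flips, seed=0):
    """Return the proportion of heads after each flip of a simulated fair coin."""
    rng = random.Random(seed)  # fixed seed so the simulation is repeatable
    heads = 0
    proportions = []
    for flips_so_far in range(1, num_flips + 1):
        heads += rng.random() < 0.5  # True counts as 1 head
        proportions.append(heads / flips_so_far)
    return proportions

props = running_proportion(100_000)
# Print the empirical probability at a few checkpoints.
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, round(props[n - 1], 4))
```

Early proportions can wander far from 0.5; the later ones should settle close to it.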
Based on these two values, we either reject or fail to reject the null hypothesis.
The significance level (alpha) and the p-value
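The decision rule can be sketched in a few lines. This assumes the common convention of rejecting when the p-value is less than alpha; the function name `decide` and the default alpha of 0.05 are illustrative choices, not part of the card.

```python
def decide(p_value, alpha=0.05):
    """Compare the p-value to the significance level and state the decision."""
    if p_value < alpha:
        return "reject H0"
    return "fail to reject H0"

print(decide(0.03))  # p-value below alpha
print(decide(0.20))  # p-value above alpha
```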
What's the difference between a parameter and a statistic?
A parameter is a number that describes the population
A statistic is a number that describes the sample
What is the difference between an observational study and an experiment?
Experiments impose treatments on subjects; observational studies do not.
What is the difference between categorical and quantitative variables?
A categorical variable takes on values that are category names or labels. A quantitative variable takes on numerical values for a measured or counted quantity.
In this setting, there are two outcomes for each trial (success or failure), each trial is independent of the next, the number of trials is fixed, and the probability of success is the same for each trial.
Binomial Setting
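When these four conditions hold, the number of successes follows a binomial distribution. The card does not state the probability formula; the sketch below uses the standard one, P(X = k) = C(n, k) p^k (1-p)^(n-k), and the function name `binomial_pmf` is made up for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials, each with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Example: exactly 3 heads in 10 flips of a fair coin.
prob_three_heads = binomial_pmf(3, 10, 0.5)
print(prob_three_heads)
```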
What is the confidence interval formula for a one-sample t-interval for \mu?
\bar{x} \pm t^* \frac{s}{\sqrt{n}}
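As a worked sketch of this formula: the code below takes t^* as an input (looked up from a t-table for n - 1 degrees of freedom) rather than computing it. The sample data and the function name `one_sample_t_interval` are made up for illustration; 2.776 is the table value for 95% confidence with 4 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t_interval(data, t_star):
    """Return (lower, upper) for x_bar +/- t_star * s / sqrt(n)."""
    n = len(data)
    x_bar = mean(data)
    s = stdev(data)  # sample standard deviation (divides by n - 1)
    margin = t_star * s / sqrt(n)
    return x_bar - margin, x_bar + margin

sample = [4.0, 5.0, 6.0, 7.0, 8.0]  # hypothetical data, n = 5
low, high = one_sample_t_interval(sample, t_star=2.776)
print(round(low, 3), round(high, 3))
```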
What is the formula for the geometric probability P(X = x), where X is the number of trials until the first success?
P(X = x) = (1-p)^{x-1} p
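A minimal sketch of the geometric formula in code. The function name `geometric_pmf` and the die-rolling example are made up for illustration.

```python
def geometric_pmf(x, p):
    """Probability that the first success occurs on trial x (x = 1, 2, 3, ...)."""
    if x < 1:
        raise ValueError("x must be a positive whole number of trials")
    return (1 - p) ** (x - 1) * p

# Example: probability the first 6 on a fair die comes on the third roll.
prob_third_roll = geometric_pmf(3, 1 / 6)
print(prob_third_roll)
```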
When can we make conclusions about cause and effect?
When researchers randomly assign subjects to treatment groups (in an experiment)
When is a data point considered an outlier? (Do not just say "when it is unusually larger or smaller than the other points" - be specific!)
A data point is considered an outlier when it is more than 1.5 * IQR above Q3 or below Q1, or when it is more than 2 standard deviations away from the mean.
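The 1.5 * IQR rule can be sketched as below. Quartile conventions vary between textbooks and software; this sketch assumes the "median of each half" method (the middle value is left out when n is odd) and needs at least two data values. The function name `iqr_outliers` and the data are made up for illustration.

```python
from statistics import median

def iqr_outliers(data):
    """Return the values more than 1.5 * IQR below Q1 or above Q3."""
    values = sorted(data)
    half = len(values) // 2
    q1 = median(values[:half])    # median of the lower half
    q3 = median(values[-half:])   # median of the upper half
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]

print(iqr_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 30]))
```

Here Q1 = 3 and Q3 = 8, so the fences are -4.5 and 15.5, and only 30 falls outside them.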
What is the formula for and interpretation of a z-score?
z = (value - mean) / (standard deviation)
A z-score tells us how many standard deviations a value is from the mean, and in which direction (positive = above, negative = below)
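A tiny sketch of the z-score calculation, with made-up numbers (a test score of 85 when the mean is 70 and the standard deviation is 10):

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd

z = z_score(85, mean=70, sd=10)
print(z)  # 1.5
```

A score of 85 is 1.5 standard deviations above the mean.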
What is the probability that a specific confidence interval captures the population parameter?
0 or 1! (Do NOT say the confidence level!)
How do we interpret the slope of a Least Squares Regression Line (LSRL)?
For every 1 unit increase in x (in context), the predicted y (in context) increases/decreases by the slope.
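To see where a slope comes from, here is a sketch that computes the LSRL by hand using the standard least-squares formulas (slope = sum of (x - x_bar)(y - y_bar) divided by sum of (x - x_bar)^2). The data and the function name `lsrl` are made up for illustration.

```python
from statistics import mean

def lsrl(xs, ys):
    """Return (slope, intercept) of the least-squares regression line of y on x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

hours = [1, 2, 3, 4, 5]   # hypothetical x: hours studied
score = [2, 4, 5, 4, 5]   # hypothetical y: quiz score
slope, intercept = lsrl(hours, score)
print(round(slope, 3), round(intercept, 3))
```

With these numbers the interpretation would read: for every additional hour studied, the predicted quiz score increases by about 0.6 points.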
What is a randomized block design and what is its purpose?
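In a randomized block design, experimental units are first grouped into blocks of similar units, and treatments are then randomly assigned within each block. The purpose of blocking is to reduce variability of results within each treatment group and to keep the blocking variable from being confounded with the treatments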
How do we describe the relationship between two variables (like in a scatterplot)?
Address the relationship's direction (positive or negative), unusual features (outliers, influential points), form (linear, non-linear), and strength (weak, moderate, strong)
For the random variables X and Y, what are the formulas for the mean and standard deviation of D = X - Y? Answer in terms of \mu_X, \mu_Y, \sigma_X, and \sigma_Y
\mu_D = \mu_X - \mu_Y
\sigma_D = \sqrt{\sigma_X^2 + \sigma_Y^2} (only when X and Y are independent; note that the variances add even though we are subtracting)
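A short numeric sketch of these two formulas. It assumes X and Y are independent, which the standard deviation formula requires; the function name `difference_mean_sd` and the numbers are made up for illustration.

```python
from math import sqrt

def difference_mean_sd(mu_x, sigma_x, mu_y, sigma_y):
    """Mean and standard deviation of D = X - Y, assuming X and Y are independent."""
    mu_d = mu_x - mu_y
    sigma_d = sqrt(sigma_x**2 + sigma_y**2)  # variances add, even for a difference
    return mu_d, sigma_d

mu_d, sigma_d = difference_mean_sd(mu_x=10, sigma_x=3, mu_y=4, sigma_y=4)
print(mu_d, sigma_d)  # 6 5.0
```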
What are the *specific* conditions for a two sample z-test for p1 - p2?
Random: Data come from independent random samples or 2 groups in a randomized experiment
10%: when sampling without replacement, n < 10% of the population size for both samples
Normality (Large Counts): n_1\hat{p}_c, n_1(1-\hat{p}_c), n_2\hat{p}_c, and n_2(1-\hat{p}_c) are all >= 10, where \hat{p}_c is the pooled (combined) sample proportion
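The Large Counts check with the pooled proportion can be sketched as below. It assumes the usual pooled estimate, (x1 + x2) / (n1 + n2), where x1 and x2 are the success counts in each sample; the function name `pooled_large_counts` and the counts are made up for illustration. The Random and 10% conditions still have to be checked from the study description.

```python
def pooled_large_counts(x1, n1, x2, n2, threshold=10):
    """Return the pooled p-hat and whether all four expected counts meet the threshold."""
    p_c = (x1 + x2) / (n1 + n2)  # pooled (combined) sample proportion
    counts = [n1 * p_c, n1 * (1 - p_c), n2 * p_c, n2 * (1 - p_c)]
    return p_c, all(count >= threshold for count in counts)

p_c, condition_met = pooled_large_counts(x1=30, n1=100, x2=45, n2=150)
print(round(p_c, 3), condition_met)
```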
The Daily Double! Is Mr. K a good Teacher?
(True or False)
True
What four elements should a well-designed experiment include?
Comparison of at least two treatment groups, random assignment of treatments to experimental units, replication, control of potential confounding variables
How do we describe a distribution (for one variable)?
Comment on shape (symmetry, modality, etc.), center (mean or median), spread (or variability - standard deviation, range, IQR), and unusual features (outliers, gaps, clusters).
Can two mutually exclusive events also be independent? Give an example to support your answer
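No, two mutually exclusive events (each with nonzero probability) cannot be independent. Example: For a single flip of a fair coin, let H = heads and T = tails. H and T are mutually exclusive, so P(H and T) = 0, but P(H) * P(T) = 0.5 * 0.5 = 0.25. Since these are not equal, H and T are not independent: knowing that H occurred tells us T did not.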
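The coin-flip example can be checked by listing the sample space. This sketch assumes a fair coin with equally likely outcomes; the helper `prob` is made up for illustration.

```python
from fractions import Fraction

sample_space = {"H", "T"}  # one flip of a fair coin, equally likely outcomes

def prob(event):
    """Probability of an event (a set of outcomes) under equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

heads = {"H"}
tails = {"T"}

p_both = prob(heads & tails)             # P(H and T)
p_product = prob(heads) * prob(tails)    # P(H) * P(T)

mutually_exclusive = p_both == 0
independent = p_both == p_product
print(p_both, p_product, mutually_exclusive, independent)
```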
What 2 - 3 questions can we ask to determine the correct inference procedure?
Does the scenario describe mean(s), proportion(s), counts, or slope?
Does the scenario describe one sample, two samples, or paired data?
Does the scenario describe a test or a confidence interval?
What is the difference between the population distribution, the sample distribution, and the sampling distribution?
The population distribution is the distribution of responses for every individual in the population.
The sample distribution is the distribution of responses for a single sample.
The sampling distribution is the distribution of the values of a statistic over all possible samples of a given size from a given population.
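For a very small population, the sampling distribution can be listed out completely. The sketch below uses a made-up population of four values and takes every possible sample of size 2 (without replacement, order ignored), with the sample mean as the statistic.

```python
from itertools import combinations
from statistics import mean

population = [2, 4, 6, 8]   # hypothetical population distribution
sample_size = 2

# One sample mean per possible sample: together these form the sampling distribution.
sample_means = [mean(sample) for sample in combinations(population, sample_size)]

print(sorted(sample_means))
print(mean(sample_means), mean(population))
```

Any single pair, such as (2, 8), is one sample distribution; the full list of six means is the sampling distribution of the sample mean, and its center matches the population mean.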