The difference between categorical and quantitative variables
What is categories versus numerical values that can be averaged
The 68-95-99.7 rule describes this
What is the percentage of data within 1,2, and 3 standard deviations of the mean in a normal distribution?
What goes on each axis of a scatterplot of two quantitative variables?
Explanatory Variable on the x-axis and response variable on the y-axis
What is a simple random sample (SRS)?
A sampling method where a sample of size n is selected in such a way that each subset of size n is equally likely to be chosen.
What is the sum of probabilities in a sample space?
P(S)=1
The meaning of CSOCS and when we use it.
What is context, spread, outliers, center, and shape to describe a distribution?
How a z-score is calculated
What is x-mean/standard dev?
What does the correlation coefficient (r) measure?
The strength and direction of a linear relationship,
What type of sampling method is being used here?
A city government wants to survey households about recycling habits. The city is divided into 20 neighborhoods. Instead of sampling from all neighborhoods, officials randomly select 5 neighborhoods and survey every household within those neighborhoods.
Cluster sampling
What does it mean for two events to be mutually exclusive?
It means that two events can never occur at the same time
The components of a 5-number summary?
What are minimum, Q1, median, Q3, maximum?
In a standard normal distribution, what percentage of data falls below a z-score of 1? What calculator function and parameters did you use?
About 84%. Normal cdf with a lower bound of -99999999 an upper bound of 1 a mean of 0 and a standard deviation of 1.
How can a residual plot help you assess the fit of a regression line?
If there is no pattern in the residual plot, the linear model is appropriate
What is the difference between an observational study and an experiment?
An experiment applied treatments to measure effects, observational studies do not apply treatments.
What does P(A|B) mean?
The probability that event A occurs, given that event B has already occurred.
Or probability of A given B
The calculation of the IQR and outliers
Q3-Q1=IQR
lower bound=Q1-1.5*IQR
upper bound=Q3+1.5*IQR
A data set has a mean of 50 and a standard deviation of 10. Every data value is doubled and 5 is added, what is the new mean and standard deviation?
Mean = 105
Standard deviation = 20
What is the formula to calculate residuals and what does a negative residual indicate about the observed value?
residual = y-y(hat) or observed value - predicted value
A negative residual indicates that the observed value is less than the predicted value or falls below the regression line.
State and interpret the type of bias present.
A high school principal wants to know how much time students spend on homework each night. To gather this information, the principal conducts a survey during lunchtime in the cafeteria and asks students, "How much time do you spend on homework per night?"
Response bias, will likely overestimate number of hours spent on homework per week.
Explain in words what it means for two events to be independent? How do you check for independence?
The occurence of one event does not affect the probability of the other.
P(A|B)=P(A)
P(B|A)=P(B)
both must be true
Name a resistant measure of center, a resistant measure of spread, and explain why the are considered resistant?
Median and IQR, they are not affected by extreme values or outliers
A test score is normally distributed with a mean of 80 and a standard deviation of 5. What score corresponds to the 90th percentile?
approx 1.28
invnorm(.9, 0,1)
What does the coefficient of determination represent?
The proportion of the variation in the response variable explained by the explanatory variable?
In an experiment, random selection is used but random assignment is not used? Can an inference be made about the population? About cause and effect?
Yes about the population, no about cause and effect?
If P(A)=.4
P(B)=.5) and P(A and B)=.2
what is P(A or B)?
.7