What is the most commonly used definition of an outlier in AP Statistics?
1.5(IQR) rule: an observation x is an outlier if
x < Q1 - 1.5(IQR) or x > Q3 + 1.5(IQR)
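The 1.5(IQR) rule can be checked with a short Python sketch. It assumes the median-of-halves quartile method (the median is left out of both halves when n is odd), which is one common convention; other software may compute slightly different quartiles. The data values and the function names `quartiles` and `outliers` are hypothetical, for illustration only.

```python
def median_sorted(values):
    """Median of an already-sorted, non-empty list."""
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2


def quartiles(data):
    """Q1 and Q3 by the median-of-halves method (assumed convention; needs n >= 2)."""
    s = sorted(data)
    n = len(s)
    lower = s[: n // 2]        # values below the median
    upper = s[(n + 1) // 2 :]  # values above the median
    return median_sorted(lower), median_sorted(upper)


def outliers(data):
    """Return the values flagged by the 1.5(IQR) rule."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]


# Hypothetical example data
scores = [12, 15, 17, 18, 19, 21, 22, 24, 58]
print(quartiles(scores))
print(outliers(scores))
```

With these values, Q1 = 16 and Q3 = 23, so the fences are 5.5 and 33.5 and only 58 is flagged.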
Martin was analyzing a two-way table about boys and girls who like pizza (yes/no). Assume an equal number of boys and girls.
Martin concluded that "More boys like pizza." What should he have said?
A higher percentage of boys than girls like pizza (compare proportions within each group, not raw counts).
The LSRL for a data set where x = height (cm) and y = weight (kg) is y-hat = 0.7x - 65. A student says that someone who is 180cm tall will weigh 61kg.
Why are they wrong?
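The LSRL gives predicted values, not exact ones; it is not deterministic. Since y-hat = 0.7(180) - 65 = 61, a person who is 180 cm tall is predicted to weigh about 61 kg, but actual weights vary around the line. The student should say "predicted weight," not "will weigh."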
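A quick check of the arithmetic, as a hedged Python sketch: plugging 180 into the LSRL gives the predicted weight, and a residual shows how far one hypothetical person's actual weight could sit from that prediction. The actual weight of 68 kg and the function name `predicted_weight` are made up for illustration.

```python
def predicted_weight(height_cm):
    """Predicted weight (kg) from the LSRL y-hat = 0.7x - 65."""
    return 0.7 * height_cm - 65


y_hat = predicted_weight(180)

# Hypothetical individual: 180 cm tall, actual weight 68 kg
actual = 68
residual = actual - y_hat  # residual = actual - predicted

print(f"Predicted weight: {y_hat:.1f} kg")
print(f"Residual for this person: {residual:.1f} kg")
```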
Aaron gets an A on 60% of his tests. What is the expected number of tests he will need to take until getting his first A?
E(X) = 1/p = 1/0.6 ≈ 1.67 tests
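As a sketch in Python, assuming each test is independent with the same 0.6 chance of an A (the usual geometric setting), the closed form 1/p can be compared with a truncated version of the defining sum of k * P(X = k).

```python
p = 0.6  # probability of an A on any one test (assumed constant and independent)

# Closed form for a geometric random variable (trials until first success)
expected = 1 / p

# Check: truncated sum of k * P(X = k), where P(X = k) = (1 - p)**(k - 1) * p
series = sum(k * (1 - p) ** (k - 1) * p for k in range(1, 200))

print(round(expected, 2), round(series, 2))
```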
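Describe the pros/cons of using:
-Median Pro: resistant to outliers and skewness. Con: ignores the actual values of most observations, so it is less precise
-Mean Pro: uses every data value, so it is a more precise measure. Con: sensitive to outliers and skewness
to describe center.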
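A small Python illustration with made-up data shows the trade-off: replacing the largest value with an outlier pulls the mean up sharply while the median does not move.

```python
from statistics import mean, median

data = [2, 3, 4, 5, 6]
with_outlier = [2, 3, 4, 5, 60]  # same data, largest value replaced by an outlier

print("original:    ", mean(data), median(data))
print("with outlier:", mean(with_outlier), median(with_outlier))
```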
Eason was analyzing a two-way table of boys' and girls' preferences between pizza and burgers. He found the following:
Boys: Pizza 30/50, Burgers 20/50
Girls: Pizza 10/50, Burgers 40/50
What could Eason say about association in this data?
There is a clear association between gender and food preference because the distribution of food preference differs for boys and girls: 60% of boys (30/50) prefer pizza, compared with only 20% of girls (10/50).
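The percentages above come from the conditional distribution of food preference within each gender. A minimal Python sketch using the counts from the table; the dictionary layout and the function name `conditional_distribution` are just one hypothetical way to organize it.

```python
counts = {
    "Boys": {"Pizza": 30, "Burgers": 20},
    "Girls": {"Pizza": 10, "Burgers": 40},
}


def conditional_distribution(row):
    """Convert one row of counts into proportions of that row's total."""
    total = sum(row.values())
    return {food: n / total for food, n in row.items()}


for group, row in counts.items():
    dist = conditional_distribution(row)
    print(group, {food: f"{share:.0%}" for food, share in dist.items()})
```

If gender and food preference were not associated, the two rows would show (roughly) the same percentages.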
When we do inference for the LSRL, what parameter are we doing inference for?
The slope β (beta) of the population (true) regression line
Aaron makes 80% of his free throws. If he takes 10 free throws, what is the probability he makes exactly 7 shots?
P(X = 7) = (10 choose 7)(0.8)^7(0.2)^3 ≈ 0.2013
binompdf(n=10, x=7, p=0.8) = 0.2013
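The same probability can be computed directly from the binomial formula. This Python sketch uses `math.comb` (Python 3.8+) in place of the calculator's binompdf; the function name `binom_pmf` is hypothetical.

```python
from math import comb


def binom_pmf(n, k, p):
    """P(X = k) for X ~ Binomial(n, p): (n choose k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


print(round(binom_pmf(10, 7, 0.8), 4))
```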
Describe the pros/cons of using:
-Standard Deviation Pro: takes the value of every data point into account; a more precise/sensitive measure
Con: outliers can make it misleading
-IQR Pro: resistant to outliers
Con: depends only on Q1 and Q3, so changing other data values (such as the extremes) without changing their order or the quartiles does not affect it
to describe variability (spread).
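To see both the outlier sensitivity of standard deviation and the IQR limitation, here is a Python sketch with made-up data. It assumes the median-of-halves quartile method (median excluded when n is odd); other quartile conventions can give slightly different IQRs. The helper names are hypothetical.

```python
from statistics import stdev


def median_sorted(values):
    """Median of an already-sorted, non-empty list."""
    n = len(values)
    mid = n // 2
    return values[mid] if n % 2 else (values[mid - 1] + values[mid]) / 2


def iqr(data):
    """IQR = Q3 - Q1 using the median-of-halves method (assumed convention)."""
    s = sorted(data)
    n = len(s)
    return median_sorted(s[(n + 1) // 2 :]) - median_sorted(s[: n // 2])


original = [2, 3, 4, 5, 6, 7, 8]
changed = [2, 3, 4, 5, 6, 7, 80]  # same order, largest value made extreme

print(round(stdev(original), 2), round(stdev(changed), 2))
print(iqr(original), iqr(changed))
```

Changing 8 to 80 keeps the order of the data the same, so the IQR stays at 4 while the standard deviation jumps from about 2.16 to about 28.59.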
Out of 50 boys, 36 prefer sports, 9 prefer music, and 5 prefer art.
Out of 50 girls, 20 prefer sports, 18 prefer music, and 12 prefer art.
A bad AP Stats student (nobody from my class) wrote the following: "72% of boys prefer sports, 40% of girls prefer sports". What are they missing (just for sports)?
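Comparative language: they need to compare the two groups directly, e.g., "A higher percentage of boys (72%) than girls (40%) prefer sports."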
Martin asked many different students how many hours they studied the week before a big test and compared it to their scores. He created a scatterplot with his data, and found r = 0.96 and r^2 = 0.9216. How should he interpret the coefficient of determination?
About 92.16% of the variability in test scores for students like these can be explained by the linear relationship between test score and hours studied the week before a big test.
Jerry C comes to class on time about 40% of the time. Starting next week, what is the probability that Wednesday is the first class he comes to on time?
P(X = 3) = (0.6)^2(0.4) = 0.144 (Wednesday is the 3rd class of the week, so X = 3)
geometpdf(p = 0.4, x = 3) = 0.144
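A Python version of geometpdf, as a sketch. It assumes class meets on consecutive days starting Monday, so Wednesday is the 3rd class, and that arriving on time is independent from day to day with probability 0.4. The function name `geom_pdf` is hypothetical.

```python
def geom_pdf(p, k):
    """P(X = k): first success on trial k, with success probability p on each trial."""
    return (1 - p) ** (k - 1) * p


# Late Monday, late Tuesday, on time Wednesday
print(round(geom_pdf(0.4, 3), 3))
```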
Sometimes precise statistics cannot be determined from graphical displays. For histograms, which of the following can be determined precisely?
Shape, Outliers, Center, Variability
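Shape* (yes), Outliers* (sometimes), Center (no), Variability (no). A histogram groups values into bins, so the exact data values, and therefore exact measures of center and variability, cannot be recovered.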
Given a set of ordered pairs with sx = 2.5, sy = 1.9, r = 0.63, what is the slope of the regression line of y on x?
b = r(sy/sx) = 0.63(1.9/2.5) ≈ 0.48
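The slope formula b = r(sy/sx) as a quick arithmetic check in Python (variable names are illustrative):

```python
r, s_x, s_y = 0.63, 2.5, 1.9

# Slope of the least-squares regression line of y on x
b = r * s_y / s_x

print(round(b, 2))
```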
Find the mean and standard deviation of the number of defective items in a sample of 200 items, where 3% of items have defects. Show your work on the board.
mean = np = 200(0.03) = 6
SD = sqrt(np(1 - p)) = sqrt(200(0.03)(0.97)) ≈ 2.41
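The binomial mean and standard deviation formulas, mean = np and SD = sqrt(np(1 - p)), sketched in Python. This assumes the number of defective items is binomial, i.e., items are independent and each is defective with probability 0.03.

```python
from math import sqrt

n, p = 200, 0.03  # sample size and probability an item is defective

mean_defects = n * p
sd_defects = sqrt(n * p * (1 - p))

print(round(mean_defects, 2), round(sd_defects, 2))
```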
Other than shape, outliers, center, variability, and context, what other special features do we sometimes describe in a distribution of quantitative data?
Peaks, gaps, and clusters, as well as the approximate mode(s) (unimodal, bimodal, etc.)