STATISTICS MISCONCEPTIONS

Examining Distributions

Quantitative & Categorical Relationships

Experimental Design

Probability

Misconceptions

100

What is the main difference between a bar graph and a histogram?

A bar graph is a graphical representation for categorical variables whereas a histogram is a graphical representation for quantitative variables.

Additionally: Histograms display the shape, center, and spread of quantitative data. The five-number summary (minimum, Q1, median, Q3, and maximum) is displayed by a boxplot rather than a histogram.

100

An outlier can be influential in terms of which of the following? (multiple choice)

A. The relationship

B. The explanatory variable

C. The response variable

D. All of the above

100

Cases in experiments are ______. If they are human, they are ______.

What are experimental units, subjects

100

Two events A and B are ______ if they have no outcomes in common and can never happen together.

What is disjoint

100

Sally concludes from her experiment that going to the gym and doing well on her exam are positively associated. She determines that if she goes to the gym the night before an exam, she will automatically do well and does not need to study for it. Why is this not the correct interpretation of her experiment?

Association doesn’t mean causation

200

The total area under the density curve always equals ______.

What is 1

200

What are the three ways to interpret a scatter plot?

What are strength, direction (positive/negative), and form (linear/non-linear)

200

Explanatory variables are called ______. The values of a factor are called ______.

What are factors, levels

200

What is the interpretation of P(B|A) if event A is "it snows tomorrow" and event B is "UVA cancels classes"?

What is the probability that UVA cancels classes given that it snows tomorrow

200

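True or false: a correlation coefficient of 0 means the two variables have no relationship.

False. It means there is no LINEAR relationship; the variables may still be related in a non-linear way.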
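
Illustration: a minimal Python sketch of this point, using made-up data in which y is completely determined by x (y = x²) and yet the correlation coefficient comes out as 0. The helper name `pearson_r` and the data values are assumptions chosen for this example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: a perfect curved (quadratic) relationship.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(r)  # 0.0, even though y depends entirely on x
```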

300

What are examples of resistant measures?

What are median and IQR?

Explanation: Resistant measures are not strongly affected by changes in the values of a small number of observations (such as outliers).
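
Illustration: a short Python sketch comparing the mean (not resistant) with the median (resistant) when one value is replaced by an outlier. The data values are hypothetical and chosen only for this example.

```python
from statistics import mean, median

# Hypothetical data set, and the same set with its largest value
# replaced by an extreme outlier.
data = [2, 4, 5, 6, 8]
with_outlier = [2, 4, 5, 6, 80]

mean_before, mean_after = mean(data), mean(with_outlier)
median_before, median_after = median(data), median(with_outlier)

print(mean_before, mean_after)      # the mean jumps from 5 to 19.4
print(median_before, median_after)  # the median stays at 5
```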

300

The coefficient of determination measures the proportion of variability in the ______ variable that is explained by the ______ line.

What is response, regression

300

David conducted an experiment where he watched how many students used the crosswalk and how many students crossed elsewhere. Is this an example of an observational study or a designed experiment?

What is an observational study

300

What is the mean of the following discrete random x-values, which all have the same probability of ⅛? (3, 8, 5, 9, 6, 7, 11, 10)

The formula for the mean is μ = x₁·p(x₁) + x₂·p(x₂) + ... + xₙ·p(xₙ).
Here, μ = (3 + 8 + 5 + 9 + 6 + 7 + 11 + 10)/8 = 59/8, or 7.375.
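
Illustration: the same calculation as a small Python sketch. It uses exact fractions so the result can be read both as 59/8 and as a decimal; the variable names are chosen for this example.

```python
from fractions import Fraction

values = [3, 8, 5, 9, 6, 7, 11, 10]
p = Fraction(1, 8)  # every value has the same probability

# Mean of a discrete random variable: sum of value times probability.
expected = sum(x * p for x in values)

print(expected, float(expected))  # 59/8 7.375
```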

300

Anna wanted to know the proportion of UVA students who liked coffee. She wanted to have each grade be equally represented. Therefore, she took a random sample of 100 students from each grade level. Is this an example of block design and if not, why and what is this an example of?

This is not an example of a block design, because blocking is a feature of experimental design, not a sampling method. This is an example of stratified sampling, because Anna takes a random sample within each stratum (grade level).

400

If the density curve is skewed to the left, which is greater: the mean or the median?

What is the median
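
Illustration: a quick Python check on a small left-skewed data set. The scores are hypothetical; the single low value creates the long left tail that pulls the mean below the median.

```python
from statistics import mean, median

# Hypothetical exam scores with a long tail to the left.
scores = [20, 70, 80, 85, 90, 90, 95]

score_mean = mean(scores)
score_median = median(scores)

print(round(score_mean, 2), score_median)  # 75.71 85
```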

400

What is extrapolation and why is it not reasonable to use?

Extrapolation is the use of a regression line for predicting values when the explanatory variable lies far outside the range of the data that was used to determine the line.

Additionally: Predictions far outside the range of the data are unreliable because it cannot be determined whether the linear relationship continues to hold that far out.

400

Sally was curious about the average number of hours all of the students in Microbiology were spending on homework for the class each week. The professor teaches five sections of Microbiology. One morning, Sally waits outside the classroom and asks the first 40 students who walk into the 9am section how many hours, on average, they spend on homework for the class. What sampling design did Sally use in this experiment and does it include any bias?

What is convenience sampling

Explanation: Convenience sampling causes undercoverage bias because she is not sampling the students in the other lecture times and she is only getting data from the first students arriving at class, missing students arriving late or who did not attend.

400

Anthony proposes that his group take a sample of ten people to give their survey to. Kate thinks that their sample size should be at least 50 people. Whose sample would be more representative and why?

Kate's sample. The Law of Large Numbers states that the larger the sample, the better the estimate of the population value tends to be, because a larger sample decreases the variability of the estimate.
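
Illustration: a rough Python sketch of how much the variability shrinks between the two proposed sample sizes. It assumes the survey estimates a proportion and that the true proportion is 0.5; both are assumptions made for this example and are not stated in the question. The standard error of a sample proportion is sqrt(p(1 - p)/n).

```python
from math import sqrt

def standard_error(p, n):
    """Standard error of a sample proportion for a random sample of size n."""
    return sqrt(p * (1 - p) / n)

p = 0.5  # hypothetical true proportion (an assumption)

se_10 = standard_error(p, 10)  # Anthony's sample size
se_50 = standard_error(p, 50)  # Kate's sample size

print(round(se_10, 3), round(se_50, 3))  # 0.158 0.071
```

The larger sample only reduces variability; it does not remove bias from a poorly chosen sample.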

400

Jonathan wants to know what proportion of Calculus students thought the exam was hard. There are four calculus classes and Jonathan collected data from everyone in his calculus class. What is the sampling frame in this scenario and why?

The sampling frame in this scenario is the students in Jonathan’s calculus class

Explanation: This is because only the students in his class have a chance of being chosen to be in the sample. The sampling frame is not all Calculus students because Jonathan only gathered data from students in his class and there are four different classes.

500

If a distribution has a mean of 60 and a standard deviation of 5, what is the z-score when x = 53, and what does the z-score tell us about x?

z = (x - μ)/σ, where μ is the mean and σ is the standard deviation, so z = (53 - 60)/5 = -1.4.

Explanation: Z-score is how many standard deviations x is away from the mean. Therefore this z-score tells us that 53 is 1.4 standard deviations away from the mean. Also, the z-score being negative tells us that the data point is below the mean.
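
Illustration: the z-score calculation as a small Python sketch. The function name `z_score` is chosen for this example.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma

z = z_score(53, 60, 5)
print(z)  # -1.4, so 53 is 1.4 standard deviations below the mean
```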

500

How do you interpret the slope and intercept of a linear regression line?

The slope is the amount of change in the predicted response when the explanatory variable increases by 1 unit.
The intercept is the predicted response when the explanatory variable is 0.
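
Illustration: a minimal Python sketch that fits a least-squares line to made-up data and reads off the slope and intercept. The data points are hypothetical and lie exactly on a line so the interpretation is easy to check by eye.

```python
# Hypothetical data: each 1-unit increase in x goes with a 2-unit increase in y.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares slope and intercept.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

print(slope, intercept)  # 2.0 1.0
```

Here the predicted response rises by 2 for each 1-unit increase in x, and the predicted response is 1 when x is 0.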

500

Jim ran an experiment about the proportion of people who prefer chocolate ice cream over vanilla. To gather data, Jim set up a table on his college campus and people came up and took the survey. Is there any bias present that could prevent Jim from getting a representative sample?

Jim used voluntary response sampling, where people chose whether to participate in the survey, so the sample is affected by voluntary response bias. It is also an example of undercoverage, because Jim is missing the large number of people who never passed his table.

500

Diagnostic test: P(D) is the probability that the patient has the disease, and P(P) is the probability that the patient tests positive. What are the conditional probabilities for a false positive test and a false negative test?

False negative: P(P^c | D)
False positive: P(P | D^c)
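
Illustration: a small Python sketch that computes both conditional probabilities from a table of counts. All four counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
# Hypothetical test results for 1000 patients.
true_pos = 90    # has the disease, tests positive
false_neg = 10   # has the disease, tests negative
false_pos = 45   # no disease, tests positive
true_neg = 855   # no disease, tests negative

# False negative rate: P(P^c | D), conditioning on patients WITH the disease.
p_false_negative = false_neg / (true_pos + false_neg)

# False positive rate: P(P | D^c), conditioning on patients WITHOUT the disease.
p_false_positive = false_pos / (false_pos + true_neg)

print(p_false_negative, p_false_positive)  # 0.1 0.05
```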

500

Kate flipped a coin and got heads the first time and heads again the second time. What is the probability that she would get heads on her third try?

The probability of her getting heads on her third try is 50% because the events are independent; therefore, the previous outcomes have no effect on her next trial.
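
Illustration: a short Python sketch that lists every equally likely sequence of three flips and conditions on the first two being heads. It assumes a fair coin, which the question implies but does not state.

```python
from fractions import Fraction
from itertools import product

# All 8 equally likely outcomes of three flips of a fair coin.
outcomes = list(product("HT", repeat=3))

# Keep only the sequences that start with two heads.
first_two_heads = [o for o in outcomes if o[0] == "H" and o[1] == "H"]

# Among those, count the sequences whose third flip is also heads.
third_heads = [o for o in first_two_heads if o[2] == "H"]

p_third_heads = Fraction(len(third_heads), len(first_two_heads))
print(p_third_heads)  # 1/2
```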