Variables/Visualizations
Central tendency/spread/distributions
Probability
100

Which of the following would be an appropriate visualization for the variable "zip code"? (There may be multiple): Bar chart, Box plot, Frequency table, Histogram, Pie chart, Segmented bar chart

This is categorical data. So, bar chart, frequency table, or pie chart

100

When can we find the median for a categorical variable and why can't we find the median for every categorical variable?

We can't find the median for a categorical variable when the variable is nominal. This is because we need to be able to order the data in order to find the median (i.e. it must be at least ordinal).

100

At a certain college (which has 400 students), there is a Biology club and a Camping club. The Biology club has 20 members. The Camping club has 60 members, 3 of which are in the Biology club. At this school, is being in the Biology club independent of being in the Camping club?

Yes, independent.

200

Bar charts and histograms look quite similar, but they have some differences. Bar charts are for categorical data; histograms are for quantitative data. The X-axes of both of these reflects that. What is the other difference between these two visualizations, and what does this difference symbolize?

There are spaces between the columns of a bar graph but not between those of a histogram. This symbolizes that the categories on the X-axis of a bar graph can be moved around, but the numbers on the X-axis of a histogram can not.

200

You take a sample of Wellesley students, measure their heights, and calculate the median to be 63 in. Each of your five friends also takes a sample of Wellesley students, but their medians are all quite different from yours (and from each other's). What disadvantage of the median am I describing?

It is unstable (compared to the mean).

200

I have a thumbtack. If I drop it onto my desk, it either lands with the point facing up or not facing up. How can I assess the probability of the thumbtack landing face up, and what type probability would I be using?

Drop the tack on your desk several times. Count how many times it lands face up and divide that by the number of times you dropped the tack. This is frequentist probability.

300

On a box plot for the following data, is there a visible whisker on the right-hand side? If so, where does it end?


3, 5, 5, 5, 5, 5, 6, 6, 10

No visible whisker on the right-hand side

300

The last step of calculating standard deviation is taking the square root. Why do we do this?

Otherwise our units are very difficult to interpret! e.g. movies^2

300

A bag is filled with marbles: 5 yellow and 5 green. You draw two marbles at random. What's the probability that at least one of them is yellow?

P(Y1 U Y2) = P(Y1) + P(Y2) - P(Y1 and Y2) = P(Y1) + P(Y2) - P(Y1)*P(Y2|Y1) = 5/10 + 5/10 - (5/10)*(4/9) = 0.778

400

The midhinge of a distribution is defined as the mean of the first quartile and the third quartile.  The midrange of a distribution is defined as the mean of the minimum and the maximum data point.  

Is the midhinge resistant to outliers?  Is the midrange resistant to outliers?

The midhinge is resistant. The midrange is not resistant.

400

Consider the variable Age at Death in the US population. Based on your expectations, rank the following measures of central tendency from lowest to highest: mean, median, mode.

Mean, median, mode.

400

In a certain course, 37% of the students are athletes. Of the athletes, 73% would prefer to win an Olympic gold medal over a Nobel Prize. Of the non-athletes, 31% would prefer to win a gold medal over a Nobel Prize. What's the probability that a student chosen at random from the class would prefer Olympic gold over a Nobel Prize?

P(G) = P(A)*P(G|A) + P(A')*P(G|A') = 0.37*0.73 + 0.63*0.31 = 0.4654

500

To detect harmful insects in fields, we can put up sticky boards and examine the insects trapped on them.  What attracts insects the best?  Experimenters placed six boards of various colors at fields of various elevations. They counted the bugs on each board. What are the cases and variables?

Cases: The boards

Variables: Elevation, color, # of bugs

500

I take a sample of six people and ask them how many pandas they have pet. I calculate the standard deviation to be 0. This must mean that.... (list all that apply)

i. all of the people in my sample have pet the same number of pandas.

ii. half of my sample has pet more than the mean number of pandas and half have seen below the mean number of episodes.

iii. the histogram of my sample will form a uniform distribution.

iv. a calculation error was made in determining the standard deviation.

v. no one in my sample has pet a panda.

Just i.

500

Let A and B be events that are mutually exclusive AND independent. Also, P(B) = 0.3. What is P(A)?

0