Exploring Data
Relationships Between Variables
Gathering Data
Randomness and Probability
Probability Models
100

Unlike continuous quantitative variables, these variables take only specific values that differ by some minimum amount.

What are discrete variables?
100

True or False: if r=0, there is no association between the variables. 

What is "False"? 

Several things can produce r = 0 even when an association exists, such as a non-linear association or an influential point far from the trend of the rest of the data.
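
The non-linear case can be checked numerically. The sketch below uses made-up data (not from the question) in which y is completely determined by x, yet the correlation is zero; `pearson_r` is a hand-written helper, not a library function.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data: a perfect parabola, symmetric about x = 0.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(f"r = {r:.3f}")  # the association is strong, but it is not linear
```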

100

This is a study which measures a response that has already taken place within a sample, usually by a survey. 

What is a retrospective study? 

100

This is the quantity given by the formula below, when all outcomes in the sample space S are equally likely.

(# of outcomes in A) / (# of outcomes in S)


What is the probability of an event A?
100

This is the expected number of heads you get in ten tosses of a fair coin. 

What is five? 
200

This is how you should bet if your friend is tossing a fair coin and it lands heads ten times in a row. 

What is "bet that your friend is lying, and the coin isn't really fair"? If it really is a fair coin, it doesn't matter how you bet: neither outcome is more likely, because tosses of a fair coin are independent.

200

This is what the residuals plot looks like when a linear model is appropriate for the data set. 

What is "no apparent pattern"? 

A residuals plot with an apparent pattern or curve is evidence that a linear model is not appropriate for the data. 

200

This survey sampling method separates the sampling frame into homogeneous groups, then randomly selects subjects from within each group. 

What is stratified sampling?  

200
This term describes a pair of events such as these: when drawing a card from a standard deck, A = drawing a face card, B = drawing a number card or an ace.

What are complementary events? 

200

This is the calculator expression used to find the probability that a binomial random variable X is less than or equal to a.

What is binomcdf(n,p,a)?

300
Unlike a histogram, in which the heights of the bars sum to the total number of data values, the heights of the bars in a relative frequency distribution sum to this number. 

What is one? 

300

A regression analysis of company profits and the amount of money the company spent on advertising found that r^2 = 0.72. Which of these is true?

I. This model can correctly predict the profit for 72% of companies.

II.  On average, about 72% of a company’s profit results from advertising.

III. On average, companies spend about 72% of their profits on advertising.

What is "none of these"?

r^2 = 0.72 implies that 72% of the variation in profits is accounted for by the linear relationship between advertising spending and profits.

300

In an experiment, the purpose of blocking is to reduce the variability in the response due to the presence of one of these.

What is a confounding variable? 

300

This describes A and B whenever P(A and B) = P(A)*P(B).

What are independent events? 

In general, whether or not the events are independent, P(A and B) = P(A|B)*P(B).

300

This is the mean of the random variable X + Y if X~N(0,1) and Y~N(0,1).

What is 0?

400

In a normal distribution, this is how many standard deviations the 95th percentile is from the mean.

What is 1.645? 

Trick question. The answer is not "two." About 95 percent of the data is within two standard deviations of the mean, which puts the 97.5th percentile, not the 95th, about two standard deviations above it. The 95th percentile of the standard normal distribution is z = 1.645.

invNorm(.95, 0, 1)= 1.645
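
For anyone without a calculator handy, the same lookup can be sketched in Python. This uses `NormalDist` from the standard library's `statistics` module as a stand-in for invNorm; the variable names are only illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Equivalent of invNorm(.95, 0, 1): the z-score with 95% of the area below it.
z_95 = standard_normal.inv_cdf(0.95)

# Area within two standard deviations of the mean (the "two" people guess).
area_within_2sd = standard_normal.cdf(2) - standard_normal.cdf(-2)

print(round(z_95, 3))
print(round(area_within_2sd, 4))
```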

400

When the correlation between variables x and y is r=0.4,  this is the correlation between -y and 2x. 

What is r = -0.4? Adding a constant, multiplying by a positive constant, and swapping x and y do not alter the correlation between two variables, but multiplying one variable by a negative constant (here, replacing y with -y) flips its sign.
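
The effect of shifting, rescaling, swapping, and negating can be checked directly. The sketch below uses made-up data (its r is not 0.4) and a hand-written `pearson_r` helper; only the comparisons between the results matter.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 1, 4, 3, 7, 5]

r_original = pearson_r(xs, ys)
r_rescaled = pearson_r([2 * x + 3 for x in xs], ys)            # shift and positive scale
r_swapped = pearson_r(ys, xs)                                  # swap the roles of x and y
r_negated = pearson_r([-y for y in ys], [2 * x for x in xs])   # correlation of -y and 2x

print(r_original, r_rescaled, r_swapped, r_negated)
```

The first three values agree; the last has the same size but the opposite sign.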

400

This type of experiment design does not incorporate blocking, but randomly distributes subjects among treatment groups. 

What is a completely randomized design? 

400

In a contingency table, this is how to find the probability of A given B, when A is a column heading and B is a row heading. 

What is the number in the cell in column A and row B divided by the total of row B? 


400

This is the variance of the random variable X + Y if X~N(0,1) and Y~N(0,1).

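What is 2? Assuming X and Y are independent, variances add: Var(X + Y) = 1 + 1 = 2. The standard deviation of X + Y is sqrt(2) = 1.414.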
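
A quick way to check sums of normal random variables is Python's `statistics.NormalDist`, which supports adding two distributions under the assumption that they are independent (the question does not say so, so independence is an assumption here).

```python
from statistics import NormalDist

x = NormalDist(mu=0, sigma=1)
y = NormalDist(mu=0, sigma=1)

# Adding two NormalDist objects treats them as independent:
# means add, and variances (not standard deviations) add.
total = x + y

print(total.mean)      # mean of X + Y
print(total.variance)  # variance of X + Y
print(total.stdev)     # standard deviation of X + Y, the square root of the variance
```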

500

Consider the data summary below:

min= 23, Q1= 40 , med= 48, Q3= 54, max= 61

These are thresholds for outliers in this data set. 

What are 19 and 75?

IQR = 54 - 40 = 14

1.5*IQR = 21

Q1 - 1.5*IQR = 19, Q3 + 1.5*IQR = 75
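
The usual 1.5*IQR rule measures from the quartiles: the fences are Q1 - 1.5*IQR and Q3 + 1.5*IQR. A minimal sketch using the five-number summary from the question (variable names are illustrative):

```python
# Five-number summary from the question.
data_min, q1, median, q3, data_max = 23, 40, 48, 54, 61

iqr = q3 - q1

# The fences are measured from the quartiles, not from the median.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# A data set has outliers only if some value falls outside the fences.
has_low_outlier = data_min < lower_fence
has_high_outlier = data_max > upper_fence

print(lower_fence, upper_fence, has_low_outlier, has_high_outlier)
```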

500

If constant increases in the explanatory variable reduce the response by the same percentage each time, then this is how the data should be re-expressed to make the relationship linear.

What is "take the natural log of the y values"?
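
To see why the log works, here is a small sketch with made-up data in which each one-unit step in x cuts y by 25%. The starting value 200 and the 25% rate are assumptions chosen for illustration.

```python
import math

# Made-up exponential decay: y drops by 25% for every unit increase in x.
xs = [0, 1, 2, 3, 4, 5]
ys = [200 * 0.75 ** x for x in xs]

# Re-express the response with the natural log.
log_ys = [math.log(y) for y in ys]

# For a linear relationship, consecutive differences are constant.
raw_steps = [ys[i + 1] - ys[i] for i in range(len(ys) - 1)]
log_steps = [log_ys[i + 1] - log_ys[i] for i in range(len(log_ys) - 1)]

print(raw_steps)  # shrinking steps: y vs. x is curved
print(log_steps)  # equal steps: ln(y) vs. x is a straight line
```
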
500

This is why blocks should be homogeneous with respect to the blocking variable. 

What is "so that differences in response are only compared between subjects for which the value of the confounding variable is similar." 

If blocks were heterogeneous, each block would have the same problem as the whole sample: it would be difficult to tell whether variation in response is due to the factor or to the confounding variable.

500

In a standard 52 card deck, this is the probability of drawing a face card or a ten, then drawing an ace.  

What is (12/52 + 4/52)(4/51) = .024? 
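
The arithmetic can be verified with exact fractions. This sketch assumes, as the 4/51 in the answer implies, that the first card is not replaced before the second draw.

```python
from fractions import Fraction

# First draw: face card (12 of 52) or ten (4 of 52); these are disjoint, so add.
p_face_or_ten = Fraction(12, 52) + Fraction(4, 52)

# Second draw: all 4 aces remain among the 51 cards left.
p_ace_second = Fraction(4, 51)

p_both = p_face_or_ten * p_ace_second

print(p_both, round(float(p_both), 3))
```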

 

500
Suppose that cards are drawn one at a time from a standard deck, with replacement. This is the probability of getting at least three face cards in 5 draws.

What is 1-binomcdf(5, 12/52, 2) = .084?

These are Bernoulli trials, so the number of face cards X is binomial:

P(S) = 12/52, P(F) = 1-12/52

P(S) is constant

Since cards are replaced, trials are independent. 

n = 5, and we seek P(X >= 3) = 1 - P(X < 3) = 1 - P(X <= 2)
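
The calculator result can be reproduced by hand. In the sketch below, `binom_pmf` and `binom_cdf` are hand-written helpers meant to mimic the calculator's binomial functions; they are not library calls.

```python
from math import comb

def binom_pmf(n, p, k):
    """P(X = k) for a binomial random variable with n trials and success chance p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binom_cdf(n, p, a):
    """P(X <= a): add up the probabilities of 0, 1, ..., a successes."""
    return sum(binom_pmf(n, p, k) for k in range(a + 1))

n = 5
p_face = 12 / 52  # chance of a face card on any one draw, with replacement

# P(X >= 3) = 1 - P(X <= 2)
p_at_least_three = 1 - binom_cdf(n, p_face, 2)

print(round(p_at_least_three, 3))
```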
