Big Data
Relationship Issues
Under Control
Will this be difficult? Probably.
Don't Test Me
100
This measure of center is more resistant to outliers than the mean.
What is the median?
100

You calculate it using: observed y - predicted y

What is the residual?

100
This phrase is used to describe an observed effect so large that it would rarely occur by chance.
What is statistically significant?
100
This type of random variable requires a fixed number of trials.
What is a binomial random variable?
100

The type of significance test used for the mean of a single population when the standard deviation of the population is unknown.

What is a one-sample t-test?

200

To calculate, subtract the mean of the distribution from the observed x, then divide by the standard deviation.

What is the z-score (or standardized value)?
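
The calculation described in this clue can be sketched in a few lines of Python. The function name and the example numbers (a score of 85, mean 70, standard deviation 10) are made up for illustration only.

```python
def z_score(x, mean, std_dev):
    """Return how many standard deviations x lies above (+) or below (-) the mean."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    # Subtract the mean from the observed value, then divide by the standard deviation.
    return (x - mean) / std_dev


# Hypothetical example: observed value 85, mean 70, standard deviation 10.
print(z_score(85, 70, 10))  # 1.5
```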

200
Measures the direction and strength of a linear relationship between two quantitative variables.
What is correlation (or r)?
200

This should be used to simulate random assignment.

What is a random number generator, flipping a coin, etc.?

200

This tells us the long-term outcome of a probability distribution. It should never be rounded.

What is the expected value?
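
As a rough illustration, the expected value of a discrete random variable is the sum of each value times its probability. The function name and the small probability distribution below are hypothetical, chosen only to show the arithmetic.

```python
def expected_value(values, probabilities):
    """Return the mean of a discrete random variable: sum of value * probability."""
    if len(values) != len(probabilities):
        raise ValueError("each value needs exactly one probability")
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(v * p for v, p in zip(values, probabilities))


# Hypothetical distribution: X takes the values 0, 1, 2
# with probabilities 0.5, 0.3, 0.2.
mean_x = expected_value([0, 1, 2], [0.5, 0.3, 0.2])
print(mean_x)  # left unrounded, as the clue says
```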

200

How degrees of freedom are calculated for each kind of test: t-test for means, chi-squared test for goodness of fit, t-test for slope

t-test for means: n - 1

chi-squared test for goodness of fit: # of categories - 1

t-test for slope: n - 2

300

This tells us what proportion of data lies at or below a given value.

What is a percentile?

300

The proportion of the variation in y that is explained by the linear relationship with x.

What is the coefficient of determination (or r squared)?

300
This experimental design involves the random assignment of units to treatments which are carried out separately within each group of units known to be similar in some way that is expected to affect the responses.
What is a randomized block design?
300

Events that have no outcomes in common and can never occur simultaneously, for which the addition rule is used.

What are mutually exclusive events?

300

The kind of test used to decide whether the observed counts for categorical variables differ from the expected (predicted) counts, indicating an association or difference.

What is a chi-squared test?

400

The square of the standard deviation.

What is the variance?

400

When a least squares regression line is used to predict a y value for an x value that is very far from all other x values in the data set.

What is extrapolation?

400

An experimental design in which each individual receives both treatments and the results are compared.

What is a matched pairs experiment?

400

The condition involving the population size that must be satisfied to use sigma divided by the square root of n as the standard deviation of a sampling distribution.

What is the 10% condition?

400

The sample-size conditions for inference about proportions (Large Counts) and about means (Normal/Large Sample).

What are:

for means (t-test): n is greater than or equal to 30

for proportions (z-test): np is greater than or equal to 10 and n(1-p) is greater than or equal to 10

500
This calculator command can be used to find the area under a normal curve over a given interval.
What is normalcdf?
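
For anyone checking calculator answers on a computer, a similar area can be found with Python's standard `statistics.NormalDist`. This is only a sketch of an analogue, not the calculator's own routine; the function name and argument order (lower, upper, mean, standard deviation) are assumptions chosen to resemble the calculator command.

```python
from statistics import NormalDist


def normalcdf(lower, upper, mean=0.0, std_dev=1.0):
    """Return the area under a normal curve between lower and upper."""
    dist = NormalDist(mu=mean, sigma=std_dev)
    # Area over an interval = cumulative area up to upper minus cumulative area up to lower.
    return dist.cdf(upper) - dist.cdf(lower)


# Area within one standard deviation of the mean on the standard normal curve.
print(round(normalcdf(-1, 1), 4))  # 0.6827
```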
500

The equation of a LSRL is: 

predicted test score = 60 + 3.75 (hours spent studying)

Interpret the slope of the line in context.

What is "For every one-hour increase in time spent studying, the predicted test score increases by 3.75 points." ?
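
One way to see the slope interpretation is to evaluate the line from the clue at two study times that differ by one hour. The function name is made up for this sketch, and the study times of 2 and 3 hours are arbitrary.

```python
def predicted_test_score(hours_studying):
    """LSRL from the clue: predicted test score = 60 + 3.75 * (hours spent studying)."""
    return 60 + 3.75 * hours_studying


# Compare predictions one hour apart; the difference equals the slope.
difference = predicted_test_score(3) - predicted_test_score(2)
print(difference)  # 3.75
```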

500

The requirement for being able to generalize results to an entire population.

What is a random sample?

500

The probability that a person is summoned for jury duty the first time in their 5th year of being eligible when there is a 15% chance of being summoned for jury duty annually.

What is .078?
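
The arithmetic behind this answer follows the geometric setting: four years without a summons, then a summons in the fifth year. This assumes each year is independent of the others, which the clue implies but does not state. The function name is hypothetical.

```python
def geometric_probability(p, k):
    """Return P(first success occurs on trial k) when each trial succeeds with probability p."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    if k < 1:
        raise ValueError("k must be at least 1")
    # (k - 1) failures in a row, followed by one success.
    return (1 - p) ** (k - 1) * p


# 15% chance each year, first summons in year 5: 0.85^4 * 0.15
print(round(geometric_probability(0.15, 5), 3))  # 0.078
```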

500

The formula for calculating the confidence interval for the slope of a LSRL.

What is:

b +/- t*(SE of b)
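
The formula can be sketched as a small Python function. The critical value t* has to come from a t table or calculator with n - 2 degrees of freedom; it is passed in here rather than computed. The function name and all numbers in the example (b = 3.75, SE of b = 0.5, t* = 2.101 for 95% confidence with 18 degrees of freedom) are assumptions for illustration.

```python
def slope_confidence_interval(b, t_star, se_b):
    """Return (lower, upper) for the interval b +/- t* (SE of b)."""
    if se_b < 0 or t_star < 0:
        raise ValueError("t_star and se_b must be non-negative")
    margin = t_star * se_b  # margin of error
    return (b - margin, b + margin)


# Hypothetical values: slope 3.75, standard error 0.5,
# t* = 2.101 (95% confidence, df = n - 2 = 18).
lower, upper = slope_confidence_interval(3.75, 2.101, 0.5)
print(lower, upper)
```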
