Data Analysis
Surveys and Experiments
Probability
Exploring Bivariate Data
Taking Chances
100
A graph of the five-number summary >A central box spans the quartiles Q1 and Q3 >A line in the box marks the median M >Lines extend from the box out to the smallest and largest observations
What is a boxplot?
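The five numbers behind a boxplot can be computed directly. The sketch below is only an illustration: the data list is hypothetical, and `statistics.quantiles` with `method="inclusive"` is one of several quartile conventions, so a textbook or calculator may give slightly different values for Q1 and Q3.

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    values = sorted(data)
    # With n=4, quantiles() returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return values[0], q1, median, q3, values[-1]

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical data, not from the cards
summary = five_number_summary(sample)
print(summary)
```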
100
Control, Replicate, Randomize.
What are the principles of Experimental Design?
100
For any two events A and B, P(A or B) = P(A) + P(B) - P(A and B)
What is the general addition rule for unions of two events?
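One way to convince yourself of the rule is to count outcomes. The sketch below uses a made-up example (one roll of a fair die, with A = "even" and B = "greater than 3") and compares the direct count of A or B with the value given by the rule.

```python
from fractions import Fraction

outcomes = set(range(1, 7))               # one roll of a fair die (hypothetical example)
A = {x for x in outcomes if x % 2 == 0}   # event A: the roll is even
B = {x for x in outcomes if x > 3}        # event B: the roll is greater than 3

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))

p_union_direct = prob(A | B)                      # count A or B directly
p_union_rule = prob(A) + prob(B) - prob(A & B)    # general addition rule
print(p_union_direct, p_union_rule)
```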
100
These can describe the overall pattern of a scatterplot
What are direction, form, and strength?
100
Suppose that James guesses on each question of a 50-item true-false quiz. Find the probability that James passes if a score of 25 or more correct is needed to pass.
P(X >= 25) = 1 - P(X <= 24), where X is binomial with n = 50 and p = 0.5. This is about 0.556.
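The complement calculation for James's quiz can be checked with a short script. This sketch assumes X is binomial with n = 50 and p = 0.5 (each guess is independent and equally likely to be right or wrong); `binomial_cdf` is a helper name introduced here for illustration.

```python
from math import comb

def binomial_cdf(k, n, p):
    """P(X <= k) for a binomial random variable with parameters n and p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# P(X >= 25) = 1 - P(X <= 24) for n = 50 questions and p = 0.5
p_pass = 1 - binomial_cdf(24, 50, 0.5)
print(round(p_pass, 4))
```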
200
The mean and standard deviation are sensitive to the influence of a few extreme observations.
What are non-resistant measures of center and spread?
200
The basic probability sample: it gives every possible sample of a given size the same chance to be chosen.
What is a simple random sample?
200
P(B|A) = P(A and B)/ P(A)
What is Conditional Probability?
200
>It requires both variables to be quantitative so that it makes sense to do the arithmetic indicated by the formula >It measures only the strength of linear relationships >It is non-resistant >It indicates the direction of the relationship
What is correlation?
200
There are 50 poker chips in a container, 25 of which are red, 15 white, and 10 blue. You draw a chip without looking 25 times, each time returning the chip to the container. What is the expected number of white chips you will draw in 25 draws?
P(draw a white chip) = 15/50 = 0.3, so E(X) = 25(0.3) = 7.5
300
In the Normal distribution with mean mu and standard deviation sigma: >approximately 68% of the observations fall within one sigma of mu >approximately 95% of the observations fall within 2 sigmas of mu >approximately 99.7% of the observations fall within 3 sigmas of mu
What is the empirical rule?
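The three percentages can be recovered from the standard Normal curve. This is a minimal sketch using `statistics.NormalDist` from the Python standard library; the helper name `within` is introduced here for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def within(k):
    """Proportion of a Normal distribution within k sigmas of mu."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

coverage = {k: within(k) for k in (1, 2, 3)}
for k, proportion in coverage.items():
    print(f"within {k} sigma: {proportion:.4f}")
```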
300
An observed effect so large it would rarely occur by chance.
What is statistically significant?
300
a. State the problem or describe the random phenomenon. b. State the assumptions. c. Assign digits to represent outcomes. d. Make many repetitions. e. Calculate relative frequencies and state your conclusions.
What is simulation?
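The five steps can be sketched in code. The example below reuses the true-false quiz from the 100-point Taking Chances card and assumes each guess is independent; the digit assignment (0-4 correct, 5-9 incorrect), the number of repetitions, and the seed are all choices made for this illustration.

```python
import random

def simulate_pass_rate(trials=10_000, questions=50, passing=25, seed=1):
    """Estimate P(pass) by simulating many guessed quizzes."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    passes = 0
    for _ in range(trials):
        # Step c: digits 0-4 stand for a correct guess, 5-9 for an incorrect one.
        correct = sum(1 for _ in range(questions) if rng.randint(0, 9) <= 4)
        if correct >= passing:
            passes += 1
    # Step e: the relative frequency of passing quizzes.
    return passes / trials

estimate = simulate_pass_rate()
print(estimate)
```

The estimate should land near the exact binomial answer, and it gets closer as the number of repetitions grows.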
300
The use of the regression line for prediction outside the range of values of the explanatory variable x used to obtain the line. Such predictions are often not accurate.
What is extrapolation?
300
A basketball player makes 80% of her free throws. We put her on the free throw line and ask her to shoot free throws until she misses one. What is the probability that the player will make 5 shots before she misses?
Let a "success" be a miss, so p = 0.2. Then P(X = 6) = (1 - 0.2)^5 (0.2) = 0.0655
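The free-throw answer is a geometric probability: five makes followed by one miss. A minimal sketch, assuming shots are independent with a miss probability of 0.2 (the function name is introduced here for illustration):

```python
def prob_first_miss_on(shot, p_miss=0.2):
    """P(the first miss happens on the given shot), assuming independent shots."""
    return (1 - p_miss) ** (shot - 1) * p_miss

# Five made shots, then a miss on the sixth.
answer = prob_first_miss_on(6)
print(round(answer, 4))  # 0.0655
```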
400
If the points lie close to a straight line, the plot indicates that the data are Normal. Systematic deviations from a straight line indicate a non-Normal distribution. Outliers appear as points that are far away from the overall pattern of the plot.
What is a Normal probability plot?
400
Some groups in the population are left out of the process of choosing the sample. An individual chosen for the sample can't be contacted or does not cooperate.
What are undercoverage and nonresponse?
400
X has a countable number of possible values. The probability distribution of X lists the values and their probabilities. Value of X: x1, x2, x3, ..., xi. Probability: p1, p2, p3, ..., pi. Each pi is a number between 0 and 1, and the sum of the probabilities is 1.
What is a discrete random variable?
400
The difference between an observed value of the response variable and the value predicted by the regression line.
What is a residual?
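To see residuals in action, the sketch below fits a least-squares line to a small hypothetical data set and subtracts each predicted value from the observed one. The data and variable names are made up for this illustration.

```python
# Hypothetical (x, y) data, not taken from the cards.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

# Least-squares slope and intercept.
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
intercept = y_bar - slope * x_bar

# residual = observed y - predicted y
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
print(slope, intercept)
print([round(r, 2) for r in residuals])
```

For a least-squares line the residuals always sum to zero (up to rounding), which is a handy check on the arithmetic.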
400
10% of adults belong to health clubs, and 40% of these health club members go to the club at least twice a week. What percent of all adults go to a health club at least twice a week?
Let H = adult belongs to a health club and G = adult goes to the club at least twice a week. P(G and H) = P(H) x P(G|H) = 0.1 x 0.4 = 0.04, or 4% of all adults.
500
All Normal distributions are the same when measurements are standardized. If x has the N(mu, sigma) distribution, then this variable has the standard Normal distribution N(0,1)
What is a z-score?
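Standardizing is a one-line calculation: z = (x - mu) / sigma. The sketch below uses hypothetical values (an N(100, 15) distribution and an observation of 130) that are not from the card.

```python
def z_score(x, mu, sigma):
    """Standardize x: how many standard deviations it lies from the mean."""
    return (x - mu) / sigma

# Hypothetical example: x = 130 from a Normal distribution with mu = 100, sigma = 15.
z = z_score(130, 100, 15)
print(z)  # 2.0
```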
500
A group of experimental units or subjects that are known before the experiment to be similar in some way that is expected to systematically affect the response to the treatments.
What is a block?
500
mu(X) = the sum of (xi)(pi) over all values xi of X
What is the mean of a discrete random variable?
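The formula is a weighted sum of each value times its probability. As a sketch, the example below uses a hypothetical distribution (one roll of a fair die) and exact fractions to avoid rounding.

```python
from fractions import Fraction

# Hypothetical distribution: one roll of a fair die.
values = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6

# A valid probability distribution must sum to 1.
assert sum(probs) == 1

# Mean of X: multiply each value by its probability and add.
mean = sum(x * p for x, p in zip(values, probs))
print(mean)  # 7/2
```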
500
An association or comparison that holds for all of several groups can reverse direction when the data are combined to form a single group.
What is Simpson's Paradox?
500
A rapid test for the presence in the blood of antibodies to HIV, the virus that causes AIDS, gives a positive result with probability about .004 when a person who is free of HIV antibodies is tested. A clinic tests 1000 people who are all free of HIV antibodies. You cannot use the Normal approximation for this distribution. Explain why.
We need np and n(1-p) both to be at least 10, but np = 1000(0.004) = 4, so the condition is not met.
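The rule of thumb on this card is easy to turn into a check. A minimal sketch, using the card's threshold of 10 for both expected counts (the function name is introduced here for illustration):

```python
def normal_approx_ok(n, p, threshold=10):
    """Check that both np and n(1-p) are at least the threshold."""
    expected_successes = n * p
    expected_failures = n * (1 - p)
    ok = expected_successes >= threshold and expected_failures >= threshold
    return ok, expected_successes, expected_failures

# The HIV test card: n = 1000 people, p = 0.004 chance of a false positive.
ok, np_value, nq_value = normal_approx_ok(1000, 0.004)
print(ok, np_value, nq_value)
```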