DATA ANALYSIS
EXPERIMENTAL DESIGN
PROBABILITY
INFERENCE
VOCABULARY
100
Four pairs of data are used in determining a regression line y=-2+6x. If the four values of the independent variable are 37, 52, 18, and 23, respectively, what is the mean of the four values of the dependent variable? A) 32.5 B) 193 C) 195 D) 778 E) The mean cannot be determined from the given information.
What is B? The mean of the x-values is (37+52+18+23)/4=32.5. Since (x-bar, y-bar) is a point on the regression line, y-bar = -2+6(32.5) = 193
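The arithmetic in this answer can be checked with a short Python sketch. The variable names are illustrative and not part of the original question; the only fact used is that a least-squares line passes through (x-bar, y-bar).

```python
# Independent-variable values given in the question
x_values = [37, 52, 18, 23]

# Mean of the x-values
x_bar = sum(x_values) / len(x_values)

# The regression line y = -2 + 6x passes through (x_bar, y_bar),
# so substituting x_bar gives the mean of the y-values.
y_bar = -2 + 6 * x_bar

print(x_bar, y_bar)
```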
100
Which of the following is most important in minimizing the placebo effect? A) Replication and randomization B) Replication and blinding C) Randomization and blinding D) Randomization and a control E) Blinding and a control
What is E? Use of a control group and blinding as to which subjects are in the control group are the best tools to minimize the possibility of confounding due to the placebo effect. Replication and randomization are important marks of good experimental design, but they do not impact the placebo effect as does the use of a control and blinding.
100
Suppose 80% of jurors come to a just decision. In a jury of six people, what is the probability more than half come to a just decision? A) .09888 B) .34464 C) .8 D) .90112 E) .98304
What is D? More than half of six jurors means 4, 5, or 6, so the probability is 1 - P(3 or fewer) = 1 - binomcdf(6, .8, 3) = .90112
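For readers without a calculator that offers binomcdf, the same value can be reproduced with a minimal Python sketch. The helper name binom_pmf is hypothetical, and the sketch assumes the jurors decide independently, as the binomial model in the answer does.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 6, 0.8

# Direct sum: P(X = 4) + P(X = 5) + P(X = 6)
p_more_than_half = sum(binom_pmf(k, n, p) for k in range(4, n + 1))

# Complement form used in the answer: 1 - P(X <= 3)
p_via_complement = 1 - sum(binom_pmf(k, n, p) for k in range(0, 4))

print(round(p_more_than_half, 5), round(p_via_complement, 5))
```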
100
In general, how does doubling the sample size change the confidence interval size? A) Doubles the interval size B) Halves the interval size C) Multiplies the interval size by 1.414 D) Divides the interval size by 1.414 E) This question cannot be answered without knowing the sample size.
What is D? Multiplying the sample size by a factor of d divides the width of the interval estimate by the square root of d. Here d = 2, and the square root of 2 is about 1.414.
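A small Python sketch can illustrate the square-root relationship. The critical value, standard deviation, and sample size below are made-up values chosen only for illustration; the ratio does not depend on them.

```python
from math import sqrt

def margin_of_error(z_star, sigma, n):
    """Margin of error for a mean: z* times sigma over sqrt(n)."""
    return z_star * sigma / sqrt(n)

# Hypothetical inputs, not taken from the question
z_star, sigma, n = 1.96, 10.0, 50

original = margin_of_error(z_star, sigma, n)
doubled = margin_of_error(z_star, sigma, 2 * n)

# Ratio of the old interval size to the new one
ratio = original / doubled
print(round(ratio, 3))
```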
100

Explain what a z score represents. 

What is the number of standard deviations a value is above or below the mean. 

200
When there are multiple gaps and clusters, which of the following is the best choice to give an overall picture of a distribution? A) Mean and standard deviation B) Median and interquartile range C) Boxplot with its five-number summary D) Stemplot or histogram E) None of the above are really helpful in showing gaps and clusters.
What is D? Stemplots and histograms can show gaps and clusters that are hidden when one simply looks at calculations such as mean, median, standard deviation, quartiles, and extremes.
200
A bank wishes to survey its customers. The decision is made to randomly pick ten customers who just have checking accounts, ten customers who just have savings accounts, and ten customers who have both checking and savings accounts. This procedure is an example of which type of sampling? A) Cluster B) Convenience C) Simple random D) Stratified E) Systematic
What is D? In stratified sampling the population is divided into homogeneous groups called strata, and random samples of persons from all strata are chosen. In this example, the bank stratified by type of account holding into three strata.
200
53% of adults say they have trouble sleeping. If a doctor contacts an SRS of 85 adults, what is the probability that over 55% will say they have trouble sleeping? A) .3109 B) .3558 C) .3640 D) .4000 E) .6442
What is B? np = 85(.53) = 45.05 > 10 and nq = 85(.47) = 39.95 > 10, so the sampling distribution of the sample proportion is approximately normal with mean .53 and standard deviation sqrt((.53)(.47)/85) = .0541. Then normalcdf(.55, 1, .53, .0541) = .3558
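The normal approximation can be sketched in Python with the standard library's NormalDist. The upper tail is taken as 1 minus the cdf at .55 rather than stopping at 1 as the calculator command does, and the standard deviation is not rounded, so the result may differ from .3558 in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.53, 85

# Large-counts check: both expected counts should be at least 10
np_count, nq_count = n * p, n * (1 - p)

# Standard deviation of the sample proportion
sd = sqrt(p * (1 - p) / n)

# P(p-hat > 0.55) under the normal approximation
prob = 1 - NormalDist(mu=p, sigma=sd).cdf(0.55)

print(round(sd, 4), round(prob, 3))
```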
200
What is the probability of a Type II error when a hypothesis test is being conducted at the 5% significance level? A) .05 B) .10 C) .90 D) .95 E) There is insufficient information to answer this question.
What is E? There is a different probability of Type II error for each possible correct value of the population parameter.
200

Explain what the p-value means.

What is the probability of obtaining results at least as unusual as the sample if the null hypothesis is true.

300
If quartiles Q1=50 and Q3=70, which of the following must be true? I. The median is 60 II. The mean is between 50 and 70. III. The standard deviation is at most 20. A) I only B) II only C) III only D) All are true E) None may be true
What is E? The median is somewhere between 50 and 70, but not necessarily at 60. Even a single very large score can result in a mean over 70 and a standard deviation over 20.
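A concrete counterexample makes the point. The data set below is invented for illustration, and the quartiles are computed as the medians of the lower and upper halves (excluding the middle value when the count is odd), which is an assumption about the quartile convention in use.

```python
from statistics import mean, median, pstdev

# Hypothetical data with one very large score
data = sorted([50, 50, 50, 52, 52, 70, 70, 70, 1000])

# Quartiles as medians of the lower and upper halves
half = len(data) // 2
q1 = median(data[:half])
q3 = median(data[-half:])

print(q1, q3)              # quartiles match the question
print(median(data))        # median is not 60
print(mean(data) > 70)     # mean falls outside 50 to 70
print(pstdev(data) > 20)   # standard deviation exceeds 20
```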
300
Advantage(s) to using surveys as opposed to experiments is (are) that I. Surveys are generally cheaper to conduct. II. It is generally easier to conclude cause and effect from surveys. III. Surveys are generally not subject to bias. A) I only B) II only C) III only D) I and II E) II and III
What is A? Surveys are generally cheaper and quicker to conduct than experiments; however, surveys are subject to bias, and it is very difficult to conclude cause and effect from surveys.
300
If P(A)=.25 and P(B) = .34, what is P(A or B) if A and B are independent? A) .085 B) .505 C) .590 D) .675 E) There is insufficient information to answer this question.
What is B? P(A or B) = .25 + .34 - (.25)(.34) = .505
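The same calculation as a brief Python sketch, with illustrative variable names. It uses the general addition rule together with P(A and B) = P(A)P(B), which holds only because A and B are independent.

```python
p_a, p_b = 0.25, 0.34

# Independence lets us multiply to get the intersection
p_a_and_b = p_a * p_b

# General addition rule
p_a_or_b = p_a + p_b - p_a_and_b

print(round(p_a_or_b, 3))
```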
300
A guidance counselor wishes to determine the mean number of changes in academic major by college students to within + or - 0.1 at a 90 percent confidence level. What sample size should be chosen if it is known that the standard deviation is 0.45? A) 8 B) 54 C) 55 D) 78 E) 110
What is C? 0.1 = 1.645(0.45)/sqrt(n), so n = (1.645(0.45)/0.1)^2 = 54.8, which rounds up to n = 55
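The sample-size calculation can be sketched in Python as follows. The critical value is computed from the standard normal distribution instead of being typed in as 1.645, and the result is rounded up because a sample size must be a whole number that keeps the margin within 0.1.

```python
from math import ceil
from statistics import NormalDist

confidence = 0.90
sigma = 0.45     # known standard deviation
margin = 0.1     # desired margin of error

# Two-sided critical value, about 1.645 for 90% confidence
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Solve margin = z* * sigma / sqrt(n) for n, then round up
n_exact = (z_star * sigma / margin) ** 2
n_required = ceil(n_exact)

print(round(z_star, 3), n_required)
```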
300

Explain what a residual is.

What is the difference between the actual y value and the predicted y value (actual minus predicted).

400
Which of the following statements about influential points are true? I. Looking at a residual plot is an excellent way of picking out influential points. II. Removal of an influential point sharply affects the regression line. III. Determining a regression model with and without a point is an excellent way of picking out influential points. A) I and II B) I and III C) II and III D) I, II, and III E) None of the choices gives the complete set of true responses.
What is C? Influential points are points whose presence or absence sharply affects the regression line. A residual plot is not a reliable way to find them, because an influential point can pull the line toward itself and end up with a small residual.
400
A company wishes to survey what people think about a new product it plans to market. They decide to randomly sample from their customer database as this includes phone numbers and addresses. This procedure is an example of which type of sampling? A) Cluster B) Convenience C) Simple Random D) Stratified E) Systematic
What is B? Convenience samples are based on choosing individuals who are easy to reach.
400
An inspection procedure at a manufacturing plant involves picking three items at random and then accepting the whole lot if at least two of the three items are in perfect condition. If in reality 84 percent of the whole lot are perfect, what is the probability that the whole lot will be accepted? A) .560 B) .593 C) .667 D) .706 E) .931
What is E? (.84)(.84)(.84)+3(.84)(.84)(.16) = .931
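As a cross-check, the acceptance probability can be found by listing all eight perfect/defective outcomes for the three items. This is a sketch that assumes the three picks are independent with the same .84 chance of being perfect, as the answer's formula does.

```python
from itertools import product

p = 0.84  # probability that a single item is perfect

p_accept = 0.0
# Enumerate every perfect (True) / defective (False) pattern for three items
for outcome in product([True, False], repeat=3):
    if sum(outcome) >= 2:  # lot is accepted with at least two perfect items
        prob = 1.0
        for is_perfect in outcome:
            prob *= p if is_perfect else (1 - p)
        p_accept += prob

print(round(p_accept, 3))
```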
400
A confidence interval estimate is determined from the summer earnings of an SRS of n students. All other things being equal, which of the following will result in a larger margin of error? I. A greater confidence level II. A larger sample standard deviation III. A larger sample size A) I and II B) I and III C) II and III D) I, II, and III E) None of the above gives the complete set of true responses.
What is A? The margin of error varies directly with the critical z-value and directly with the standard deviation of the sample, but inversely with the square root of the sample size.
400

Explain what 95% confidence means. 

What is the idea that, over many random samples, 95% of the confidence intervals produced will contain the true population parameter.

500
Suppose the average score on a national exam is 500 with a standard deviation of 100. If each score is increased by 20 and the result is increased by 10 percent, what are the new mean and standard deviation? A) mean = 570, standard dev. = 100 B) mean = 570, standard dev. = 110 C) mean = 572, standard dev. = 100 D) mean = 572, standard dev. = 110 E) mean = 572, standard dev. = 132
What is D? Increasing each score by 20 increases the mean to 520 and leaves the standard deviation unchanged at 100. Then increasing each result by 10 percent increases both the mean and standard deviation by 10 percent to 520 + 0.10(520)= 572 and 100 + 0.10(100) = 110.
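The effect of the two transformations can be seen on a small made-up data set. The four scores below are hypothetical, chosen only so that their mean is 500 and their population standard deviation is 100.

```python
from statistics import mean, pstdev

# Hypothetical scores with mean 500 and standard deviation 100
scores = [400, 600, 400, 600]

# Add 20 to each score, then increase the result by 10 percent
new_scores = [(s + 20) * 1.10 for s in scores]

old_mean, old_sd = mean(scores), pstdev(scores)
new_mean, new_sd = mean(new_scores), pstdev(new_scores)

print(old_mean, old_sd)
print(round(new_mean, 2), round(new_sd, 2))
```

Adding a constant shifts the mean but not the spread, while multiplying by 1.10 scales both.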
500
Sampling error occurs A) When interviewers make mistakes resulting in bias. B) When interviewers use judgement instead of random choice in picking the sample. C) When samples are too small. D) Because a sample statistic is used to estimate a population parameter. E) In all of the above cases.
What is D? Different samples give different sample statistics, all of which are estimates for the same population parameter, and so error, called sampling error, is naturally present.
500
There are five outcomes to an experiment and a student calculates the respective probabilities of the outcomes to be .34, .50, .42, 0, and -.26. The proper conclusion is that A) The sum of the individual probabilities is 1. B) One of the outcomes will never occur. C) One of the outcomes will occur 50% of the time. D) All of the above are true. E) The student made an error.
What is E? Probabilities are never less than 0, so -.26 cannot be the probability of an outcome.
500
A college recruiter is interested in comparing the SAT math and verbal scores of applicants to the college. An SRS of 40 applicants is chosen, and the math and verbal scores are noted. Which of the following is a proper test? A) Test of difference in two population means. B) Test of difference in two population proportions. C) One sample test on differences of paired data. D) Chi-square goodness-of-fit test. E) Chi-square test for homogeneity.
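What is C? Each applicant supplies both a math score and a verbal score, so the data are paired rather than drawn from two independent samples.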
500

Explain what r squared represents.

What is the percent of the variation in y that is accounted for by the linear relationship with x.