Designing Studies
What does it mean to have a 'completely randomized design'?
That your study uses random assignment of the experimental units to the treatment groups in an attempt to avoid as much bias as possible.
What is a sample space? What does it have to do with a probability model?
The sample space is the set of all possible outcomes. A probability model lists the sample space and gives a probability for each outcome.
What's the difference between a regular mean and an expected mean?
A regular mean is calculated from data; an expected mean is calculated from a probability distribution.
Problem T4.14 (chapter 4 AP Stats Practice test)
a) How and why was blocking used?
b) Why did they randomize the order in which subjects received the 2 treatments?
c) Could this experiment be carried out in a double blind manner?
1. Each subject received both treatments, so each subject served as their own block (a matched pairs design). This controls for differences between subjects, such as level of caffeine dependency.
2. To keep the order of the treatments from being confounded with the treatments themselves. If everyone took the same treatment first, any difference could be due to the order rather than the treatment.
3. Yes. The pills could be labeled so that neither the subjects nor the people delivering them and recording results know which is the placebo.
In a room of 23 people, what is the probability that at least 2 people have the same birthday? What about a room of 75?
23 people is about 50-50 (50.7%), and 75 is over 99.9%
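A minimal sketch of the birthday calculation, assuming 365 equally likely birthdays and no leap years (the function name is just for illustration). It uses the complement: 1 minus the probability that everyone has a different birthday.

```python
def shared_birthday_probability(n: int, days: int = 365) -> float:
    """P(at least two of n people share a birthday), assuming equally likely birthdays."""
    p_all_different = 1.0
    for i in range(n):
        # The (i+1)-th person must avoid the i birthdays already taken.
        p_all_different *= (days - i) / days
    return 1.0 - p_all_different


print(round(shared_birthday_probability(23), 4))
print(round(shared_birthday_probability(75), 4))
```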
What are 3 different types of experimental designs that specifically alter how we approach treatment assignment? (instead of doing random assignment first)
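-Block design: separate units into groups that are alike on a variable expected to affect the response (men from women, tall from short, etc.), then randomly assign treatments within each block.
-Cluster design: separate into groups with similar compositions. Groups A, B, C, etc. should each look like the whole.
-Matched pairs: pair each experimental unit with another that matches it in almost all ways being measured, then randomly assign the treatments within each pair.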
Given 2 events A and B, draw a Venn diagram for:
a and b, a or b, a and b^c
For a and b, only the intersection of the circles should be shaded. For a or b, everything within either circle should be shaded. For a and not b, only the part of circle a outside of b should be shaded (NOT THE INTERSECTION).
What's the difference between variance and standard deviation?
Both measure how spread out the data is around the mean. The variance is the average squared distance from the mean, so it is in squared units. The standard deviation is the square root of the variance, so it is in the same units as the data.
R5.4 - Chapter 5 Review Exercises
a) Verify that this is a legitimate assignment of probabilities?
b) What is the probability that a randomly chosen American is Hispanic?
c) Non-Hispanic whites are the historical majority in the United States. What is the probability that a randomly chosen American is NOT a member of this group?
d) Explain why P(W or H) does not equal P(W) + P(H). Then find P(W or H)
a) Each probability is between 0 and 1, and all the probabilities add to 1
b) 14.9%
c) .326 or 32.6%
d) There is overlap between W and H, so for OR you'd have to subtract the double counted number
P(W or H)=P(W)+P(H)-P(W and H)
=.813+.149-.139=.823=82.3%
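The general addition rule in part d) can be checked with a few lines of arithmetic. This is only a sketch using the three probabilities quoted in the answer above; the helper name is made up for illustration.

```python
def prob_union(p_a: float, p_b: float, p_a_and_b: float) -> float:
    """General addition rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b


p_w, p_h, p_w_and_h = 0.813, 0.149, 0.139  # values from the answer above
p_w_or_h = prob_union(p_w, p_h, p_w_and_h)
print(round(p_w_or_h, 3))  # 0.823
```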
Does .99999 repeating forever = 1?
Yes.
What are the 3 key ethics all experimental studies using people as subjects must follow?
1. IRB - Institutional Review Board to get approval
2. Informed consent - must know the purpose of the study and generally what will be done
3. Confidentiality - personal data must be kept private; only summary results of the study are reported. This is not the same as anonymity, where even the researchers do not know who the subjects are.
What does it mean to be Mutually Exclusive? Independent? Can 2 events be both M.E. and independent?
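Mutually exclusive means that 2 events cannot both happen at the same time.
Independent means that knowing one event happened does not change the probability of the other.
No, not if both events have nonzero probability. If A and B are mutually exclusive, knowing A happened means B cannot happen, so the events are not independent.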
A test called the ITBS (Iowa Test of Basic Skills) has scores that are N(6.8, 1.6). Find
P(X>=9)
and explain what it means.
P(X>=9)=1-P(X<9)
z=(9-6.8)/1.6=1.375 => perc=.9154, so P(X>=9)=1-.9154=.0846 or 8.5%
This means about 8.5% of ITBS test takers score 9 or higher.
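The same calculation can be sketched in Python with the standard library's NormalDist in place of a z-table. The variable names are just for illustration.

```python
from statistics import NormalDist

itbs = NormalDist(mu=6.8, sigma=1.6)

# Standardize the cutoff, then take the area to the right of it.
z = (9 - itbs.mean) / itbs.stdev
p_at_least_9 = 1 - itbs.cdf(9)

print(round(z, 3), round(p_at_least_9, 4))
```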
Ken is traveling for business. He has a new 0.85 oz tube of toothpaste that's supposed to last him the whole trip. The amount of toothpaste he uses per brush is N(.13,.02). If he brushes his teeth 6 times, what's the probability he uses all the toothpaste?
New mean: mu_T=.13+.13+...+.13=6(.13)=.78
New st. dev: sigma_T=sqrt(.02^2+.02^2+...+.02^2)=sqrt(6(.02^2))=.049, or about .05 (assuming the amounts used per brush are independent)
P(T>=.85) => z=(.85-.78)/.05=1.4 => perc=.9192
There's an 8.1% chance he runs out of toothpaste (about 7.7% if sigma_T is not rounded).
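A short sketch of the toothpaste calculation, assuming the six amounts are independent so the variances add. It keeps sigma_T unrounded, so the result lands slightly below the figure you get from rounding sigma_T to .05.

```python
from math import sqrt
from statistics import NormalDist

n_brushes = 6
mu_single, sigma_single = 0.13, 0.02

# Means add; variances (not standard deviations) add for independent variables.
mu_total = n_brushes * mu_single
sigma_total = sqrt(n_brushes * sigma_single ** 2)

total_used = NormalDist(mu_total, sigma_total)
p_runs_out = 1 - total_used.cdf(0.85)

print(round(mu_total, 2), round(sigma_total, 3), round(p_runs_out, 3))
```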
What size shoe does the statue of liberty wear?
Size 879
What are 3 types of errors we can run into when collecting data that may be reducible or avoidable if done correctly?
1. Under-coverage - Missing part of, or a whole, group (or multiple groups) relevant to your population in your sample.
2. Nonresponse - Individuals selected for the sample cannot be reached or refuse to take part
3. Response - Generally associated with lying or inaccurate answers. Can be reduced with anonymity or careful phrasing of the question, but never removed. A larger sample does not fix it.
What does it mean to be a complement when dealing with probability? How is it useful with the statement 'at least 1'?
The complement of an event is the event that it does not happen. Its probability is 1 minus the probability of the event.
For 'at least 1', find the probability of 'none' and subtract that from 1. That is much easier as it's '1 path' instead of many paths of calculation.
What's the difference between a geometric and binomial distribution?
Geometric asks for the probability that the first success comes on trial k (k-1 failures before a success), where binomial asks for the probability of k successes out of a fixed n trials.
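To make the contrast concrete, here is a small sketch of the two probability formulas. The function names are made up for illustration, and the example uses a fair coin with heads as the success.

```python
from math import comb


def geometric_pmf(k: int, p: float) -> float:
    """P(first success is on trial k): k-1 failures, then one success."""
    return (1 - p) ** (k - 1) * p


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k successes in n trials), in any order."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


# First head on flip 3 vs. exactly one head somewhere in 3 flips.
print(geometric_pmf(3, 0.5))    # 0.125
print(binomial_pmf(1, 3, 0.5))  # 0.375
```

The binomial answer is larger because the single success may fall on any of the 3 flips, while the geometric answer fixes it on the last one.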
In baseball, a .300 hitter gets a hit 30% of the times they bat. When a baseball player hits .300 it's really impressive. A typical major leaguer hits about .260 over 500 at bats. Each at bat is seemingly independent. Could a .260 hitter hit .300 just by chance?
mu_x=np=500*.260=130
sigma_x=sqrt(500(.260)(.740))=9.8
For a batting average of .300 over 500 at bats, they need 150 hits
P(X>=150)=>z=(150-130)/9.8=2.04=>perc=.9793
The probability then is 1-.9793=.0207 or about 2%. So it can happen by chance, but it's rare.
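A sketch of the batting calculation, assuming at bats are independent with a constant .260 chance of a hit. It computes the normal approximation used above and, for comparison, the exact binomial tail. The two will not match exactly since the normal curve is only an approximation.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, hits_needed = 500, 0.260, 150

mu = n * p
sigma = sqrt(n * p * (1 - p))

# Normal approximation to P(X >= 150).
p_normal = 1 - NormalDist(mu, sigma).cdf(hits_needed)

# Exact binomial tail: add up P(X = k) for k = 150, ..., 500.
p_exact = sum(
    comb(n, k) * p ** k * (1 - p) ** (n - k)
    for k in range(hits_needed, n + 1)
)

print(round(mu, 1), round(sigma, 2), round(p_normal, 4), round(p_exact, 4))
```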
We all agree that boys between 18-25 are the largest reckless group of drivers. But what tiny group represents the peak of the most aggressive drivers?
Young women in their 20's
A large high school wants to gather student opinions about parking for students on campus. It isn't practical to contact all students.
1. Give an example of a voluntary sample
2. Give an example of a convenience sample
Explain why both are not ideal choices and what effect they will have on the data.
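1. Voluntary - Posting a survey and letting students choose whether to respond. Unhappy students are more likely to respond, so results will likely come out more negative than the true student opinion.
2. Convenience - Surveying students near the parking lot. This will likely come out more positive than the true student opinion since those students found parking.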
Playing Cards: J - Drawing a Jack, R - Getting a red card
a) What are P(J) and P(R) individually?
b) What's P(J and R)?
c) What's P(J or R)?
d) What's P(J|R)?
a) P(J)=4/52=1/13, P(R)=26/52=1/2
b) P(J and R)=2/52=1/26 as there are only 2 red jacks
c) P(J or R)=4/52+26/52-2/52=28/52=7/13
d) P(J|R)=P(J and R)/P(R)=(2/52)/(26/52)=2/26=1/13
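These answers can be double-checked by counting cards directly. The sketch below builds a standard 52-card deck and uses exact fractions; the helper names are just for illustration.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs


def prob(event) -> Fraction:
    """Probability of an event as (cards satisfying it) / (cards in the deck)."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))


def is_jack(card) -> bool:
    return card[0] == "J"


def is_red(card) -> bool:
    return card[1] in ("hearts", "diamonds")


p_j = prob(is_jack)
p_r = prob(is_red)
p_j_and_r = prob(lambda c: is_jack(c) and is_red(c))
p_j_or_r = prob(lambda c: is_jack(c) or is_red(c))
p_j_given_r = p_j_and_r / p_r  # conditional probability

print(p_j, p_r, p_j_and_r, p_j_or_r, p_j_given_r)
```

Since P(J|R) comes out equal to P(J), drawing a jack and drawing a red card are independent events.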
What happens to the means of 2 different random variables if you add/subtract them? What about the standard deviations?
The means will add or subtract as expected. For the standard deviations, if the variables are independent, the variances always add (even when subtracting): square each standard deviation, add them together, then take the square root.
Marti decides to keep placing a $1 bet on number 15 in consecutive spins of a roulette wheel until she wins. On any spin, there's a 1/38 chance that the ball will land in the 15 slot.
a) How many spins do you expect it to take until Marti wins?
b) Would you be surprised if Marti won in fewer than 4 spins?
a) mu_x=1/(1/38)=38
b) P(X<=3)=>geometcdf(1/38,3)=1-(37/38)^3=.077 or about 8%
We are somewhat surprised, but it's not truly rare.
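A sketch of what the calculator's geometcdf is doing, written out by hand. The function below is a stand-in for illustration, not the calculator's own code. Winning within k spins is the complement of losing all k spins, so P(X<=3)=1-(37/38)^3.

```python
def geometric_cdf(p: float, k: int) -> float:
    """P(first success occurs on or before trial k) = 1 - P(k failures in a row)."""
    return 1 - (1 - p) ** k


p_win = 1 / 38

expected_spins = 1 / p_win              # mean of a geometric distribution
p_within_3 = geometric_cdf(p_win, 3)    # "fewer than 4 spins" means X <= 3

print(round(expected_spins, 1), round(p_within_3, 4))
```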
What percent of (supposedly sober) Americans said they could beat a bear in unarmed combat?
6% (I wonder if 6% of Americans are clowns?)