What makes a line of best fit, fit "best"?
Closest to the data: it minimizes the total squared error (ssRES)
(a+b+c)^2 = _______
a^2 + b^2 + c^2 + 2ab + 2bc + 2ac
What are the 3 main measures of center? Find each for this set of data:
{ 1 , 2 , 3 , 1 , 2 , 3 , 1 , 2 , 3 , 4 }
Mean = 2.2
Median = 2
Mode = {1,2,3}
Given the sample {30, 40, 50} of apples on a tree from a group of 7,000,000 apple trees find:
1. How many apple trees have less than 40 apples?
2. More than 900 apples?
3. Between 30 and 50 apples?
1. P(0) = 50% --> 3.5 mil
2. z = (900-40)/10 = 86; Q(86) = 1 - P(86) = reasonably 0%
3. P(1) = .8413; P(-1) = 1-.8413 = .1587
.8413-.1587 = .6826 --> 4,778,200 apple trees
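The three answers above can be checked with a short sketch. It assumes the apples per tree follow a normal distribution whose mean and standard deviation are taken from the sample (40 and 10); that modeling choice is an assumption, and the variable names are illustrative only.

```python
from statistics import NormalDist, mean, stdev

sample = [30, 40, 50]
total_trees = 7_000_000

# Assumption: a normal model built from the sample mean and the
# sample standard deviation (divides by n - 1).
model = NormalDist(mean(sample), stdev(sample))

below_40 = model.cdf(40)                            # P(X < 40)
above_900 = 1 - model.cdf(900)                      # P(X > 900)
between_30_and_50 = model.cdf(50) - model.cdf(30)   # P(30 < X < 50)

for label, p in [("< 40", below_40),
                 ("> 900", above_900),
                 ("30 to 50", between_30_and_50)]:
    print(f"{label}: {p:.4f} of trees, about {p * total_trees:,.0f} trees")
```

The exact normal probability for part 3 is about .6827, so the tree count differs slightly from the one built on the rounded table values .8413 and .1587.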
What are the 4 types of models we can linearize? Provide the general equation for each
1. linear y = ax+b
2. natural log y = alnx+b
3. exponential y = ab^x
4. variation y = ax^b
For the given data - Fill in the formula for r with the relevant sums but do not multiply it out.
x y
1 17
3 42
5 8.3
r = (-17.4)/sqrt((8)(612.127))
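To see where the three numbers in that formula come from, here is a minimal sketch that builds the sums of deviations by hand. The names s_xy, s_xx and s_yy are just labels chosen for this example.

```python
from math import sqrt

x = [1, 3, 5]
y = [17, 42, 8.3]

x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)

# Sum of products of deviations (numerator of r)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
# Sums of squared deviations (the two factors under the root)
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

r = s_xy / sqrt(s_xx * s_yy)
print(s_xy, s_xx, s_yy, r)
```

The result is a weak negative correlation, roughly -0.25.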
Find the mean, median, mode, variance, and standard deviation of the data set:
{32, 54, 63, 12, 56, 23, 9}
mean = 35.57
median = 32
mode = all
variance = 420.24
standard deviation = 20.4998
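These values can be reproduced with Python's standard statistics module. The sketch assumes the answer key uses the population formulas (dividing by n), which is what the variance of 420.24 suggests.

```python
from statistics import mean, median, multimode, pvariance, pstdev

data = [32, 54, 63, 12, 56, 23, 9]

print("mean:", round(mean(data), 2))
print("median:", median(data))
# Every value appears exactly once, so multimode returns all of them
print("modes:", multimode(data))
# pvariance / pstdev divide by n (population formulas)
print("population variance:", round(pvariance(data), 2))
print("population std dev:", round(pstdev(data), 4))
```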
What is the only difference in calculating a sample variance from a population variance? Why is this difference included?
What is the range of possible values of an r-value (correlation coefficient)? What does an r-value tell us about a line of best fit?
[-1, 1]
r tells us how strong the linear relationship is and in which direction: the closer |r| is to 1, the more tightly the data points cluster around the line --> gives us a gauge of how well a line of best fit models our data.
Using your formulas from brute force for m and b, write out lists you would need in your calculator and the sums of those needed to find m and b. You do not need to find m or b.
x y
1 17
3 42
5 8.3
sum (x) = 9
sum (y) = 67.3
sum(x^2) = 35
sum(xy) = 184.5
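The four sums are the calculator lists L1 (x), L2 (y), L1^2 and L1*L2 added up. The sketch below recomputes them and, going one step past what the question asks, plugs them into the standard least-squares formulas for m and b. It is an assumption that these are the "brute force" formulas meant here.

```python
x = [1, 3, 5]
y = [17, 42, 8.3]
n = len(x)

# The lists you would build in the calculator, then sum
sum_x = sum(x)
sum_y = sum(y)
sum_x2 = sum(xi ** 2 for xi in x)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

# Assumed formulas: the usual least-squares slope and intercept
m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b = (sum_y - m * sum_x) / n

print(sum_x, sum_y, sum_x2, sum_xy)
print(m, b)
```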
1. 68.26% of data centered on mean
2. 95% of data centered on mean
3. 99.7% of data centered on mean
1. {-1,1}
2. {-2,2}
3. {-3,3}
Find the standard error of the data set:
{32, 54, 63, 12, 56, 23, 9}
Sample standard deviation = sqrt(2941/(7-1)) = 22.14
standard error = 22.14 / sqrt(7) = 8.37
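A quick way to check both steps, using the sample standard deviation (divide by n - 1) and then dividing by sqrt(n):

```python
from math import sqrt
from statistics import stdev

data = [32, 54, 63, 12, 56, 23, 9]

s = stdev(data)                        # sample standard deviation, divides by n - 1
standard_error = s / sqrt(len(data))   # spread of the sample mean

print(round(s, 2), round(standard_error, 2))
```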
Find the ssRES for the following data given y'=2x-3
x y
1 4
2 6
3 7
4 10
5 12
116
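The 116 comes from squaring each residual (actual y minus predicted y') and adding them up, which can be sketched as follows. The helper name predict is made up for this example.

```python
x = [1, 2, 3, 4, 5]
y = [4, 6, 7, 10, 12]

def predict(xi):
    """Given model y' = 2x - 3."""
    return 2 * xi - 3

# Residual = actual y minus predicted y'
residuals = [yi - predict(xi) for xi, yi in zip(x, y)]
ss_res = sum(res ** 2 for res in residuals)

print(residuals)  # [5, 5, 4, 5, 5]
print(ss_res)     # 116
```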
Find the model that best fits the data and give the equation of best fit, rounded to the nearest thousandth.
x y
1 4
2 6
3 7
4 10
5 814
exponential
lny=1.114x-.517
y = .596(3.047)^x
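The exponential answer can be reproduced by linearizing: fit a straight line to (x, ln y), then undo the log. This sketch assumes the data points are (1, 4), (2, 6), (3, 7), (4, 10), (5, 814), and it only fits the exponential model; it does not compare it against the other three.

```python
from math import exp, log

x = [1, 2, 3, 4, 5]
y = [4, 6, 7, 10, 814]   # assumed reading of the data table

# Linearize: ln y = (ln b) x + ln a
ln_y = [log(v) for v in y]
n = len(x)
x_bar = sum(x) / n
ln_y_bar = sum(ln_y) / n

slope = (sum((xi - x_bar) * (li - ln_y_bar) for xi, li in zip(x, ln_y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = ln_y_bar - slope * x_bar

# Undo the linearization: y = a * b^x
a = exp(intercept)
b = exp(slope)

print(f"ln y = {slope:.3f}x + ({intercept:.3f})")
print(f"y = {a:.3f} * ({b:.3f})^x")
```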
When finding the standard deviation of a data set, why do we not just find the average distance of each data point from the mean? What do we do instead?
The distances from the mean (positive and negative) always sum to 0.
Instead we square each distance, average the squares, and then root the final answer.
A chain cookie store has been selling cookies laced with mercury. Here is a sample of the amount of mercury per batch from 8 stores:
{3.2, 5.6, 4.3, 7.8, 9.0, 1.1, 4.5, 6.6}
Find the 99% confidence interval and interpret it.
z = 2.576, mean = 5.2625, sample standard dev = 2.5466, standard error = .900
upper bound: 2.576 = (x - 5.2625) / .900 --> x = 7.582
margin of error: 7.582 - 5.2625 = 2.319
lower bound: 5.2625 - 2.319 = 2.94
99% confident that the true mean amount of mercury in the cookies is between 2.94 and 7.58.
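The interval is mean ± z * standard error. A minimal sketch of that calculation, assuming (as the answer above does) a z critical value rather than a t value:

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

data = [3.2, 5.6, 4.3, 7.8, 9.0, 1.1, 4.5, 6.6]
confidence = 0.99

# z critical value for a two-sided interval (about 2.576 for 99%)
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

x_bar = mean(data)
standard_error = stdev(data) / sqrt(len(data))
margin = z * standard_error

lower, upper = x_bar - margin, x_bar + margin
print(f"{confidence:.0%} CI: ({lower:.2f}, {upper:.2f})")
```

The bounds should land very close to 2.94 and 7.58.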
Why do we calculate the ssRES instead of the sum of the residuals? hint: there are 2 reasons
1. The residuals cancel each other out because of their pos and neg signs. Squaring makes them all pos
2. Squaring exaggerates the larger errors
DAILY DOUBLE - SPICY
Linearize the equation y = 8x^4 by writing it in terms of natural logs.
Your final answer should look like: lny = alnx+b
lny = ln(8x^4) = ln(8)+4ln(x)
lny = 4lnx + 2.08
Randomly sampled salaries in the Bay Area (rounded): {21,000,000; 348,000; 22,000; 125,000; 95,000; 89,000; 206,000; 187,000}
A govt. official is interested in knowing what the average salary in the bay is from this data. Find it.
Mean = $2,759,000 --> misleading average, skewed by the 21 mil. outlier
Median = $156,000 --> less affected by the outlier
A data scientist collected a sample of n values and found the standard error to be 1, and also found the
sum of (x-mean(x))^2 = 12. How many data points must he have collected? How many answers are possible here?
Bonus: Prove there will always only be 1 sensible answer
standard error^2 = 12 / (n(n-1)) = 1, so the equation becomes: n^2-n-12 = 0.
Either 4 or -3 --> only 4 makes sense in context.
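Here is a small sketch of that quadratic, assuming standard error = s / sqrt(n) with the sample standard deviation s = sqrt(SS / (n - 1)), where SS is the sum of squared deviations. The variable names are illustrative.

```python
from math import sqrt

ss = 12   # sum of (x - mean(x))^2
se = 1    # standard error

# se^2 = ss / (n (n - 1))  -->  n^2 - n - ss / se^2 = 0
c = ss / se ** 2
discriminant = 1 + 4 * c

roots = [(1 + sqrt(discriminant)) / 2, (1 - sqrt(discriminant)) / 2]
# A sample size must be positive (and more than 1 for n - 1 to work)
sensible = [root for root in roots if root > 1]

print(roots)     # [4.0, -3.0]
print(sensible)  # [4.0]
```

For the bonus, one possible argument: the product of the two roots is -SS/SE^2, which is always negative, so the roots always have opposite signs and only one of them can be a positive count.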