Displaying and Analyzing Categorical Data
Analyzing Quantitative Data
Displaying and Describing Quantitative Data
Comparing Distributions
The Normal Curve
100

Decide whether each of the following is a categorical or a quantitative variable.

• manufacturer

• cost

• screen size

• type

• performance score

• manufacturer (categorical)

• cost (in dollars, quantitative)

• screen size (in inches, quantitative)

• type (categorical)

• performance score (quantitative)

100

What do you look for when describing the shape of a distribution?

To describe the shape of a distribution, look for single vs. multiple modes, symmetry vs. skewness, and outliers or gaps.

100

What are the benefits of using a histogram?

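A histogram uses adjacent bars to show the distribution of a quantitative variable, where each bar represents the frequency (or relative frequency) of values falling in its bin. Because the bars sit side by side along a number line, a histogram makes the shape, center, spread, and any gaps or outliers easy to see at a glance.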
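As a rough sketch of what a histogram does behind the scenes, the Python below sorts values into equal-width bins and counts them. The function name `histogram_counts`, the bin width, and the screen-size data are hypothetical, chosen only for illustration. Each bin includes its left edge and excludes its right edge, which is one common convention rather than the only one.

```python
from collections import Counter

def histogram_counts(values, bin_width, start=0):
    """Return (bin_low, bin_high, frequency) for each bin [low, low + bin_width)."""
    if not values:
        return []
    counts = Counter()
    for v in values:
        # Index of the bin this value falls in, counted from `start`
        k = int((v - start) // bin_width)
        counts[k] += 1
    # Include empty bins between the lowest and highest occupied ones,
    # so gaps in the data show up as zero-height bars
    lo_k, hi_k = min(counts), max(counts)
    return [(start + k * bin_width, start + (k + 1) * bin_width, counts[k])
            for k in range(lo_k, hi_k + 1)]

# Hypothetical screen sizes in inches
sizes = [12, 13, 13, 14, 15, 15, 15, 17, 21]

# Text "bars" with the relative frequency of each bin in parentheses
for lo, hi, f in histogram_counts(sizes, 2, start=12):
    print(f"{lo:>2}-{hi:<2} {'#' * f}  ({f / len(sizes):.2f})")
```

The empty 18-20 bin shows up as a gap before the lone value at 21, which is exactly the kind of feature a histogram is good at revealing.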

100

What is one thing beyond SOCS that must be done when comparing two sets of data?

Answer the question if one was asked, such as what conclusion can be drawn from the comparison. Many students forget to do this.

100

What does it mean for a distribution to be Normal?

Its graph has a symmetric, unimodal bell shape (a Normal distribution), and the mean, median, and mode are all the same.

200

What does it mean for a variable to be independent?

Variables are said to be independent if the conditional distribution of one variable is the same for each category of the other. We’ll show how to check for independence in a later chapter.
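A minimal sketch of the idea, assuming the data come as a two-way table of counts: compute each row's conditional distribution and see whether they all match. The function names, categories, counts, and the `tol` tolerance are hypothetical. Real samples almost never match exactly, which is why a formal check comes later.

```python
def conditional_distributions(table):
    """Divide each cell by its row total to get each row's conditional distribution."""
    dists = {}
    for row, cells in table.items():
        total = sum(cells.values())
        dists[row] = {col: count / total for col, count in cells.items()}
    return dists

def same_conditional_distributions(table, tol=1e-9):
    """True if every row's conditional distribution matches the first row's within tol."""
    dists = list(conditional_distributions(table).values())
    first = dists[0]
    return all(abs(d[col] - first[col]) <= tol for d in dists[1:] for col in first)

# Hypothetical counts: rows are categories of one variable, columns of the other
table = {"Male": {"Yes": 30, "No": 70}, "Female": {"Yes": 60, "No": 140}}
print(conditional_distributions(table))
print(same_conditional_distributions(table))
```

Here both rows split 30% / 70%, so the conditional distributions agree, which is the pattern the definition describes.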

200

How does an outlier affect the variance of a set of data?

An outlier pulls the mean toward it and inflates the range, standard deviation, and variance, while the median and IQR are resistant and much less likely to change.
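The hypothetical data below show the effect: replacing the largest value with an outlier leaves the median and IQR unchanged but shifts the mean and inflates the range and variance. This sketch uses Python's `statistics` module. `statistics.quantiles` is called with its default 'exclusive' method, so its quartiles can differ slightly from a calculator's.

```python
import statistics

def describe(data):
    """Mean, median, sample variance, range, and IQR of a list of numbers."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # default 'exclusive' method
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "variance": statistics.variance(data),  # sample variance (divides by n - 1)
        "range": max(data) - min(data),
        "IQR": q3 - q1,
    }

# Hypothetical data; the second list swaps the 9 for an outlier of 90
data = [4, 5, 5, 6, 7, 7, 8, 9]
with_outlier = [4, 5, 5, 6, 7, 7, 8, 90]

for label, values in (("original", data), ("with outlier", with_outlier)):
    print(label, describe(values))
```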

200

What does SOCS stand for?

Shape, Odd (unusual features such as outliers or gaps), Center, and Spread

200

What must be done when comparing distributions?

When comparing the distributions of several groups using histograms or stem-and-leaf displays, consider their: 

-Shape 

-Odd

-Center 

-Spread

This is the same SOCS used in class, but the groups must be compared with each other rather than described separately.

200

Interpret a z-score.

A z-score tells how many standard deviations a data value lies above (positive z) or below (negative z) the mean of its data set.
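The z-score formula, z = (y - mean) / SD, is short enough to write directly. The function name and the test-score numbers in the usage lines are hypothetical.

```python
def z_score(y, mean, sd):
    """Number of standard deviations y lies above (+) or below (-) the mean."""
    return (y - mean) / sd

# Hypothetical test scores in a class with mean 70 and SD 10
print(z_score(85, 70, 10))  # 1.5 SDs above the mean
print(z_score(55, 70, 10))  # 1.5 SDs below the mean
```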

300

What is the difference between a population and sample?

A population is all the cases we wish to know about, while a sample is the subset of cases we actually examine in seeking to understand the much larger population.

300

What must be included in a 5-number summary?

The 5-number summary of a distribution reports the minimum value, Q1, the median, Q3, and the maximum value.
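One way to compute the summary, sketched in Python under the assumption that quartiles follow the common hand rule: Q1 and Q3 are the medians of the lower and upper halves, leaving out the middle value when n is odd. Other software may use a different quartile rule and give slightly different values. The function name and data are hypothetical.

```python
import statistics

def five_number_summary(data):
    """(min, Q1, median, Q3, max) using medians of the lower and upper halves."""
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = xs[:half]            # values below the middle
    upper = xs[half + n % 2:]    # skip the middle value when n is odd
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])

# Hypothetical data with 7 values
print(five_number_summary([2, 5, 7, 8, 10, 12, 15]))
```

Note that the mean is not part of the 5-number summary, even though calculators often report it alongside.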

300

How do you decide whether to use the median or the mean as the center?

It depends on the context and the shape of the data. The mean works well when the distribution is roughly symmetric with no outliers, while the median is the better choice when the distribution is skewed or has outliers, because the median is resistant to extreme values. Either way, make sure you can give a reason for your choice.

300

What is involved in comparing boxplots?

-Compare the shapes. Do the boxes look symmetric or skewed? Are there differences between groups? 

-Compare the medians. Which group has the higher center? Is there any pattern to the medians?

-Compare the IQRs. Which group is more spread out? Is there any pattern to how the IQRs change? 

-Using the IQRs as a background measure of variation, do the medians seem to be different, or do they just vary much as you’d expect from the overall variation? 

-Check for possible outliers. Identify them if you can.

300

What is the empirical rule?

In a Normal model, about 68% of values fall within 1 standard deviation of the mean, about 95% fall within 2 standard deviations of the mean, and about 99.7% fall within 3 standard deviations of the mean.
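The percentages can be checked against the Normal model itself. This sketch uses Python's `statistics.NormalDist`; the helper name `share_within` is hypothetical. The exact shares are about 68.3%, 95.4%, and 99.7%, and they are the same for any mean and SD.

```python
from statistics import NormalDist

def share_within(k, model=NormalDist(mu=0, sigma=1)):
    """Proportion of a Normal model lying within k SDs of its mean."""
    lo = model.mean - k * model.stdev
    hi = model.mean + k * model.stdev
    return model.cdf(hi) - model.cdf(lo)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")

# The same share holds for any Normal model, e.g. one with mean 300 and SD 34
print(f"{share_within(2, NormalDist(mu=300, sigma=34)):.1%}")
```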

400

Why does relative frequency not prove anything?

Although relative frequencies are useful for showing differences in proportions, they are simply proportions that describe the data at hand; in many cases they cannot be used on their own as evidence for a conclusion.

400

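The first 6 prices of items at a store are $39.99, $4.99, $9.99, $17.99, $24.99, and $14.99. Is the $39.99 item an outlier?

No.

Sorted: 4.99, 9.99, 14.99, 17.99, 24.99, 39.99

Median=(14.99+17.99)/2=16.49

Q1=9.99 and Q3=24.99, so IQR=24.99-9.99=15

1.5(IQR)=1.5(15)=22.5

Upper fence: 24.99+22.5=47.49, and 39.99<47.49, so the $39.99 item is not an outlier.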

400

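How do you determine whether a data value is an outlier?

Find the IQR and multiply it by 1.5. Add that amount to Q3 and subtract it from Q1 to get the upper and lower fences. Any value between the fences is not an outlier; any value above Q3 + 1.5(IQR) or below Q1 - 1.5(IQR) is.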
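The fence rule can be sketched in Python as follows, assuming the halves-based quartile rule used for hand calculation (other quartile rules can move the fences slightly). The function names and the data are hypothetical.

```python
import statistics

def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves (middle value skipped if n is odd)."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    return statistics.median(xs[:half]), statistics.median(xs[half + n % 2:])

def outlier_fences(data):
    """Lower and upper fences: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, q3 = quartiles(data)
    step = 1.5 * (q3 - q1)
    return q1 - step, q3 + step

def outliers(data):
    """Values that fall outside the fences."""
    low, high = outlier_fences(data)
    return [x for x in data if x < low or x > high]

# Hypothetical data: Q1 = 15, Q3 = 22, so the fences are 4.5 and 32.5
data = [12, 15, 15, 18, 20, 22, 61]
print(outlier_fences(data), outliers(data))
```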

400

Compare these Distributions


The shape is approximately normal in both.

There are no apparent outliers in either.

The centers are different: the yellow graph has a mean of 20, while the blue graph has a mean of 40.

The spreads look similar, so the two distributions have similar IQRs.

400

For a Normal distribution with a mean of $300 and an SD of $34, what percentage of items will cost less than $250?

normalcdf(-1000,250,300,34)= 7.07%
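Outside a calculator, the same area can be found with Python's `statistics.NormalDist`. Its `cdf(250)` gives P(X < 250) directly, so no stand-in lower bound such as -1000 is needed. This is a sketch of the same calculation, not the calculator's own implementation.

```python
from statistics import NormalDist

prices = NormalDist(mu=300, sigma=34)   # Normal model from the question
share_below_250 = prices.cdf(250)       # P(X < 250)
print(f"{share_below_250:.2%}")

# Using -1000 as the lower bound, as in normalcdf, changes essentially nothing
print(f"{prices.cdf(250) - prices.cdf(-1000):.2%}")
```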

500


If an employee is male, what is the relative frequency that he is aged 25-35?

40/112 ≈ 0.357

500

Make a 5-number summary for the following data.

 

min=0

Q1=1

Median=1.5 (the mean, 1.65, is not part of the 5-number summary)

Q3=2.5

max=4


500

Describe the Distribution:

Shape - Skewed Right

Odd - Potential outlier at 700

Center - Around 200

Spread - Ranges from about 0 to 700 seconds

500

Compare the birth-weight graphs for the Yes and No smoking groups. What conclusion can be drawn from them?

The Yes graph is slightly skewed right, while the No graph is slightly skewed left. There are no outliers in either graph. The median appears to be higher in the No graph than in the Yes graph. The No graph's IQR seems slightly larger and its maximum is greater than the Yes graph's, while the Yes graph also has a lower minimum than the No graph. According to the graphs, babies born to people who don't smoke tend to weigh more at birth than babies born to people who do smoke.

500

If z=-2.57, s=5.5, and the mean is 65, what is the value of this data point?

50.865
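Solving the z-score formula for the data value shows where this answer comes from:

```latex
z = \frac{y - \bar{y}}{s}
\quad\Longrightarrow\quad
y = \bar{y} + z\,s = 65 + (-2.57)(5.5) = 65 - 14.135 = 50.865
```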