What is the definition of statistics?
the science of data
The distribution of a categorical variable lists the categories and gives either the ___________ or the ____________ of individuals who fall in this category.
count, percent
a two-way table describes...
two categorical variables
The purpose of a graph is to help us...
understand the data
The most common measure of center is the...
arithmetic average or mean
What is the difference between a categorical variable and a quantitative variable?
categorical variable- places an individual into one of several groups or categories
quantitative variable- takes numerical values for which it makes sense to find an average
Describe a pie chart
shows the distribution of a categorical variable as a “pie” whose slices are sized by the counts or percents for the categories; must include all of the categories that make up a whole
definition of marginal distribution and conditional distribution
marginal distribution- the marginal distribution of one of the categorical variables in a two-way table of counts is the distribution of values of that variable among all individuals described by the table
conditional distribution- describes the values of that variable among individuals who have a specific value of another variable; there is a separate conditional distribution for each value of the other variable.
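A minimal Python sketch of both definitions, using a small two-way table. The counts, the row labels (gender), and the column labels (opinion) are hypothetical and chosen only to illustrate the arithmetic.

```python
# Hypothetical two-way table of counts (assumed data, for illustration only).
# Rows are gender, columns are opinion.
table = {
    "Female": {"Yes": 30, "No": 20},
    "Male": {"Yes": 10, "No": 40},
}

grand_total = sum(sum(row.values()) for row in table.values())

# Marginal distribution of opinion: add the counts over ALL individuals,
# then divide by the grand total.
opinion_counts = {}
for row in table.values():
    for opinion, count in row.items():
        opinion_counts[opinion] = opinion_counts.get(opinion, 0) + count
marginal_percent = {o: 100 * c / grand_total for o, c in opinion_counts.items()}
print(marginal_percent)  # {'Yes': 40.0, 'No': 60.0}

# Conditional distributions of opinion: one for each value of gender,
# each computed from that row's own total.
conditional_percent = {
    gender: {o: 100 * c / sum(row.values()) for o, c in row.items()}
    for gender, row in table.items()
}
print(conditional_percent["Female"])  # {'Yes': 60.0, 'No': 40.0}
print(conditional_percent["Male"])    # {'Yes': 20.0, 'No': 80.0}
```

Each conditional distribution adds up to 100% within its own row, while the marginal distribution adds up to 100% over the whole table.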
The direction of skewness is the direction of the ... not the direction where...
long tail
most observations are clustered
definition of median
a measure of center that describes the “midpoint” of a distribution: half of the observations are smaller and half are larger.
We call the pattern of variation of a variable its...
distribution
Portable MP3 music players, such as the Apple iPod, were at one time very popular (not so much anymore since our phones can now play music)—but not equally popular with people of all ages. Here are the percents of people in various age groups who own a portable MP3 player, according to an Arbitron survey of 112 randomly selected people.
Age Group (Years) Percent Owning an MP3 Player
12 to 17 54
18 to 24 30
25 to 34 30
35 to 54 13
55 and older 5
Would it be appropriate to make a pie chart for these data? Explain.
No. The percents add up to more than 100% (54 + 30 + 30 + 13 + 5 = 132) because each percent in the table refers to a different age group, not to parts of a single whole.
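The check can be sketched in a few lines of Python. The variable names are illustrative. Note that adding to 100% is a necessary condition for a pie chart, not a sufficient one: the categories must also be parts of one whole.

```python
# Percents from the table: each one is the share of owners WITHIN an age group.
percent_owning = {
    "12 to 17": 54,
    "18 to 24": 30,
    "25 to 34": 30,
    "35 to 54": 13,
    "55 and older": 5,
}

total = sum(percent_owning.values())
print(total)  # 132

# Slices of a pie chart must add up to 100% of a single whole.
adds_to_whole = (total == 100)
print(adds_to_whole)  # False
```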
You could have also used a ____________ to compare the distributions of male and female responses. Each bar has five segments—one for each of the opinion categories. It’s hard to compare the “middle” segments, so for this example, a side-by-side bar graph makes comparison easier.
segmented bar graph
definitions of symmetric distribution, skewed to the left, and skewed to the right.
symmetric distribution- if the right and left sides of the graph are approximately mirror images of each other
skewed to the left- if the left side of the graph containing the half of observations with smaller values is much longer than the right side
skewed to the right- if the right side of the graph containing the half of observations with larger values is much longer than the left side
What is the median travel time for our 15 North Carolina workers? Here are the data arranged in order:
5 10 10 10 10 12 15 20 20 25 30 30 40 40 60
The count of observations n = ______ is _________.
Median = ___________
15, odd, 20
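As a quick check, here is a short Python sketch of the same calculation. The helper name `median_by_hand` is just for illustration; the standard library's `statistics.median` is used to confirm the result.

```python
from statistics import median

# Travel times from the question, already in order.
travel_times = [5, 10, 10, 10, 10, 12, 15, 20, 20, 25, 30, 30, 40, 40, 60]

def median_by_hand(values):
    """Middle value if n is odd, average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

n = len(travel_times)
print(n, "odd" if n % 2 == 1 else "even")  # 15 odd
print(median_by_hand(travel_times))        # 20
print(median(travel_times))                # 20
```

With n = 15, the median is the 8th value in the ordered list.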
What are age, height, and income examples of?
quantitative variables
What is another name for a bar graph and what is the definition?
bar charts
represent each category as a bar. The bar heights show the category counts or percents. They are easier to make than pie charts, easier to read, and more flexible. Bar graphs are great at comparing any set of quantities that are measured in the same units.
Find the conditional distribution of telepathy among students from the United Kingdom and the United States.
U.K.
total- 200
telepathy- 44
U.S.
total- 215
telepathy- 66
U.K.- 22%
U.S.- 30.7%
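The two percents come from dividing each telepathy count by that country's own total. A small Python sketch of the arithmetic follows; the dictionary layout and the function name `percent_choosing` are illustrative choices.

```python
# Counts given in the question.
surveys = {
    "U.K.": {"total": 200, "telepathy": 44},
    "U.S.": {"total": 215, "telepathy": 66},
}

def percent_choosing(count, total):
    """Percent of a group that falls in a category, rounded to one decimal place."""
    return round(100 * count / total, 1)

telepathy_percent = {
    country: percent_choosing(data["telepathy"], data["total"])
    for country, data in surveys.items()
}
print(telepathy_percent)  # {'U.K.': 22.0, 'U.S.': 30.7}
```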
what are the names of the different peaks on a graph?
unimodal - 1 peak
bimodal- 2 peaks
trimodal- 3 peaks
multimodal- more than 2 peaks
The simplest measure of variability is the _____.
• to find: subtract the _______________ value from the _______________ value
• shows the _____________ _____________ of the data
• depends only on the _______________ and _______________, which may be _______________
range
smallest, largest
full spread
maximum, minimum, outliers
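A brief Python sketch of the range, reusing the 15 travel times from the median question. Dropping the single largest value shows how strongly the range depends on the extremes. The function name `data_range` is illustrative.

```python
# Travel times from the median question.
travel_times = [5, 10, 10, 10, 10, 12, 15, 20, 20, 25, 30, 30, 40, 40, 60]

def data_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)

print(data_range(travel_times))  # 55

# Remove only the largest observation and the range changes a lot.
without_largest = sorted(travel_times)[:-1]
print(data_range(without_largest))  # 35
```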
Jake is a car buff who wants to find out more about vehicles that students at his school drive. He gets permission to go to the parking lot and record some data. Later he does some research about each model of car on the internet. Finally, Jake makes a spreadsheet that includes each car’s model, year, color, number of cylinders, gas mileage, weight, and whether it has a navigation system.
Who are the individuals in Jake’s study?
the cars in the student parking lot
Two important lessons from looking at these examples:
1. beware of the pictograph
2. watch those scales
define association
there is an association between 2 variables if knowing the value of one variable helps predict the value of the other. If knowing the value of one variable does not help you predict the value of the other, then there is no association between the two variables
Steps of making a stem plot
Steps To Make A Stemplot:
1) Separate each observation into a stem, consisting of all but the final digit, and a leaf, the final digit. Write the stems in a vertical column with the smallest at the top, and draw a vertical line at the right of this column. Do not skip any stems, even if there is no data value for a particular stem. For these data, the tens digits are the stems and the ones digits are the leaves.
2) Write each leaf in the row to the right of its stem. For example, the female student with 50 pairs of shoes would have a stem 5 and leaf 0.
3) Arrange the leaves in increasing order out from the stem.
4) Provide a key that explains in context what the stems and leaves represent.
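The four steps above can be sketched in Python as follows. The function name `stemplot` and the list of shoe counts are hypothetical (only the value 50 comes from the example in step 2), and the sketch assumes a non-empty list of non-negative whole numbers.

```python
from collections import defaultdict

def stemplot(values):
    """Return the rows of a stemplot: stem = all but the final digit, leaf = final digit."""
    leaves = defaultdict(list)
    for value in sorted(values):  # sorting puts each stem's leaves in increasing order
        leaves[value // 10].append(value % 10)
    rows = []
    # Walk every stem from smallest to largest so that empty stems are not skipped.
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(leaf) for leaf in leaves[stem])
        rows.append(f"{stem} | {row}".rstrip())
    return rows

# Hypothetical numbers of pairs of shoes (assumed data, for illustration only).
shoe_counts = [13, 26, 26, 38, 50, 13, 34, 57, 19, 22]

for row in stemplot(shoe_counts):
    print(row)
print("Key: 5 | 0 means 50 pairs of shoes")
```

The stem 4 has no leaves in this made-up data set, but its row is still printed, as step 1 requires.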
the five-number summary is:
minimum, Q1, median, Q3, maximum
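A short Python sketch that computes the five-number summary for the 15 travel times from the median question. It assumes the common textbook rule that Q1 is the median of the observations below the overall median and Q3 is the median of those above it; software packages often use other quartile rules and may give slightly different values. The function name `five_number_summary` is illustrative.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using the 'median of each half' rule."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # When n is odd, the middle observation belongs to neither half.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

travel_times = [5, 10, 10, 10, 10, 12, 15, 20, 20, 25, 30, 30, 40, 40, 60]
print(five_number_summary(travel_times))  # (5, 10, 20, 30, 60)
```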