Stats Starts Here
Displaying and Describing Categorical Data
Displaying and Summarizing Quantitative Data
Describing Distributions Numerically
The Standard Deviation as a Ruler
100

Systematically recorded information, whether numbers or labels, together with its context.

What is Data?

100

To analyze categorical data we often use ____ or ____

What are counts (frequencies) or percents (relative frequencies)?

100

Uniform, single or multiple modes, symmetry vs. skewed

What is the shape of the distribution?

100

If observation > Q3 + (1.5)(IQR)   Or

observation < Q1 - (1.5)(IQR)

What is suspected as an outlier?
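
The rule above can be sketched in a few lines of Python. This is only an illustration: the function name `suspected_outliers` and the sample values are made up, and the quartile method (`method="inclusive"`) is an assumption, since textbooks and software compute quartiles in slightly different ways.

```python
import statistics

def suspected_outliers(data):
    """Return the values that fall outside the 1.5 * IQR fences."""
    # Quartile conventions vary; "inclusive" is an assumed choice here.
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in data if x < lower_fence or x > upper_fence]

sample = [2, 3, 4, 5, 6, 7, 8, 9, 40]  # made-up values
print(suspected_outliers(sample))  # [40]
```

With this sample, Q1 = 4 and Q3 = 8, so the fences sit at -2 and 14 and only 40 is flagged.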

100

A value found by subtracting the mean and dividing by the standard deviation.

What is a standardized value?
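
As a small sketch of that calculation, the Python below standardizes one value against a made-up list of scores. The helper name `z_score` is hypothetical, and the sample standard deviation (n - 1 divisor) is assumed.

```python
import statistics

def z_score(value, data):
    """Subtract the mean, then divide by the standard deviation."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    return (value - mean) / sd

scores = [70, 75, 80, 85, 90]  # made-up values
print(round(z_score(90, scores), 2))  # about 1.26
```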

200

Ideally, this tells Who was measured, What was measured, How the data were collected, Where the data were collected, and When and Why the study was performed.

What is Context?

200

Lists the categories in a categorical variable and the (percentage) count of observations for each category.


What is a (Relative) Frequency table  [Distribution of a categorical variable]?
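
One possible way to build such a table in Python is sketched below. The function name `frequency_table` and the eye-color data are invented for illustration.

```python
from collections import Counter

def frequency_table(values):
    """Map each category to (count, percent of all observations)."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: (count, 100 * count / total)
            for category, count in counts.items()}

# Made-up categorical data
eye_colors = ["brown", "blue", "brown", "green",
              "brown", "blue", "brown", "blue"]
table = frequency_table(eye_colors)
for category, (count, percent) in table.items():
    print(f"{category:<6} {count:>3} {percent:6.1f}%")
```

The counts column is the frequency table; the percent column is the relative frequency table.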

200

Once we make a picture, we describe a distribution by telling about its ...

What are a distribution's shape, center, spread, and any unusual features?

200

Value such that p percent of the observations fall at or below it.

What is the pth percentile?
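
Read in the other direction, the definition gives the percent of observations at or below a chosen value. A minimal sketch, with a hypothetical helper name and made-up data:

```python
def percent_at_or_below(value, data):
    """Percent of observations that are less than or equal to value."""
    at_or_below = sum(1 for x in data if x <= value)
    return 100 * at_or_below / len(data)

test_scores = [55, 60, 62, 65, 68, 70, 72, 75, 80, 90]  # made-up values
print(percent_at_or_below(72, test_scores))  # 70.0
```

Under this definition, 72 sits at the 70th percentile of this small sample, because 7 of the 10 scores are at or below it.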

200

Adding a constant to each data value adds the same constant to the measures of position (mean, median, and quartiles) but does not change the measures of spread (standard deviation or IQR). This is called __________.

Multiplying each data value by a constant multiplies both the measures of position (mean, median, and quartiles) and the measures of spread (standard deviation and IQR) by that constant. This is called _____.

What is Shifting?


What is Rescaling?
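
Both effects can be checked numerically. The sketch below uses made-up data, a hypothetical `summary` helper, a positive multiplier, and an assumed quartile method (`method="inclusive"`); other quartile conventions give the same pattern.

```python
import statistics

def summary(data):
    """Return measures of position and spread for a list of numbers."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "mean": statistics.mean(data),
        "median": median,
        "sd": statistics.stdev(data),
        "iqr": q3 - q1,
    }

data = [2, 4, 6, 8, 10]            # made-up values
shifted = [x + 5 for x in data]    # shifting: add a constant
rescaled = [x * 3 for x in data]   # rescaling: multiply by a constant

for name, values in [("original", data), ("shifted", shifted), ("rescaled", rescaled)]:
    print(name, summary(values))
```

Shifting moves the mean and median from 6 to 11 and leaves the standard deviation and IQR alone; rescaling by 3 triples all four.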

300

A participant, subject, or experimental unit

What is an individual for which or for whom data values are recorded?

300

Displays counts (percentages) of individuals falling into named categories on two (or more) variables, columns vs. rows.  The ____ categorizes the individuals on all variables at once, to reveal possible patterns in one variable that may be contingent on the category of the other.

What is a contingency table?

300

This data summary gives a quick visual of the central 50% of the data, as well as any extreme values that do not appear to belong with the rest.

What is the boxplot?

300

Report summary statistics to ___ decimal places.

What is 1 or 2 more decimal places than the original data?

300

Standardizing data is the application of what transforming function(s)?

What is shifting (subtracting x-bar) followed by rescaling (multiplying by 1/s)?
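
The two steps can be written out separately, as in this sketch. The function name `standardize` and the data are illustrative, and the sample standard deviation is assumed for s.

```python
import statistics

def standardize(data):
    """Shift by the mean, then rescale by 1/s."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)              # sample standard deviation
    shifted = [x - mean for x in data]       # step 1: shift so the mean is 0
    return [d / sd for d in shifted]         # step 2: rescale so the sd is 1

z = standardize([2, 4, 6, 8, 10])  # made-up values
print(statistics.mean(z), statistics.stdev(z))
```

Up to floating-point rounding, the standardized values have mean 0 and standard deviation 1.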

400

Either quantitative, having natural units; categorical, having names that identify categories; or categorical but with a unique name for each unique value (e.g., ordinals).

What is the type of variable that holds information about the same characteristic for many cases (individuals)?


400

When bars represent the (percentage) count of each category in a categorical variable, the ____ can be used to analyze the distribution of one categorical variable; whereas a stacked ___ (showing 100% total including all categories of a variable) can be used for comparing distributions of different groupings of individuals.

What are two representations of a (Relative Frequency) Bar chart used for?

400

This measure of center shines as the best summarization of center when the data distribution is skewed.

What is the data's median?
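
A quick numerical illustration, using made-up incomes (in thousands) with one very large value that skews the distribution to the right:

```python
import statistics

incomes = [30, 32, 35, 38, 40, 42, 500]  # made-up, right-skewed values

mean_income = statistics.mean(incomes)      # pulled upward by the 500
median_income = statistics.median(incomes)  # middle value, unaffected by the 500

print(round(mean_income, 1), median_income)  # 102.4 38
```

The mean is larger than six of the seven values, while the median still describes a typical income in this sample.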

400

When describing the distribution of a quantitative variable, if the shape is skewed then report _____. If the shape is symmetric then report _____ and repeat calculations without ______  if present.

What are the median and IQR (they are based on position)?

What are the mean and standard deviation (they are based on size/value)?

What are outliers?


400

z-scores tell us ______. 

Important uses are: ______________.

What is the number of standard deviations a value is from the mean?

1.  Comparing values from different distributions or values based on different units.

2.  Identifying unusual or surprising values among data.

3.

500

A way of reasoning, along with a collection of tools and methods, designed to help us understand the world.

What is Statistics?

500

The distribution of one of the variables ALONE, seen in the totals in the last row or column of a contingency table, VS. the distribution of a variable when the Who is restricted to a smaller group of individuals (a single row or column of the contingency table).

What is a marginal distribution vs. a conditional distribution?
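
The difference can be sketched with a small contingency table stored as nested dictionaries. The group names, response categories, counts, and the helper names `marginal` and `conditional` are all made up for illustration.

```python
# Rows are groups; columns are responses. Counts are made up.
counts = {
    "group A": {"yes": 20, "no": 10},
    "group B": {"yes": 15, "no": 15},
}

def marginal(table):
    """Distribution of the column variable over ALL individuals."""
    totals = {}
    for row in table.values():
        for column, n in row.items():
            totals[column] = totals.get(column, 0) + n
    grand_total = sum(totals.values())
    return {column: n / grand_total for column, n in totals.items()}

def conditional(table, row_name):
    """Distribution of the column variable within ONE row only."""
    row = table[row_name]
    row_total = sum(row.values())
    return {column: n / row_total for column, n in row.items()}

print(marginal(counts))                # yes: 35/60, no: 25/60
print(conditional(counts, "group A"))  # yes: 20/30, no: 10/30
```

The marginal distribution uses the column totals; the conditional distribution restricts the Who to group A.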

500

These mathematical calculations of center and spread are significant summarizations of data distributions that are roughly symmetric and unimodal.

What are the mean and standard deviation?

500

A complete analysis of data almost always includes:  ______, _______, and _______________.  

Answers are ________, not _________.

What are verbal, visual, and numerical summaries?

What are sentences, not numbers?

500

Units can be eliminated by ________________ because __________ have no units.

What is standardizing the data, because z-scores have no units?