Variable Types & Data Coding
Descriptive Statistics & Distributions
Graphical Summaries
Sampling & Population Concepts
Inference: Confidence & Hypothesis Testing
100
Variable types, in this order:


Inches (cm)

Cold - Room Temp. - Hot

Blue - Red - Green

What are quantitative, ordinal, and nominal (categorical) variables?

100

mean, median, mode

What are measures of location?
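
A minimal sketch of the three measures of location, using Python's standard `statistics` module. The data values are made up for illustration.

```python
import statistics

# Hypothetical data, for illustration only.
data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)      # arithmetic average
median = statistics.median(data)  # middle value (average of the two middle values here)
mode = statistics.mode(data)      # most frequent value

print(mean, median, mode)
```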

100

These plots are appropriate for viewing the distribution of a variable; which one to use depends on the variable type.

What are histograms, bar plots, and boxplots?

100

A number that describes every element of a population, versus a number calculated from a subset (a sample).

What are parameters vs. statistics?

What are populations vs. samples?

100

A range of values, computed from sample data, that is likely to contain a population parameter, reported with a confidence level (e.g., 95%).

What is a confidence interval?

200

When you choose a reference level of a categorical variable and, for each other level, set an indicator x to 1 if an observation exhibits that level and 0 otherwise.

What is reference coding?

200

Range, variance, IQR, standard deviation

What are measures of spread?
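
The four measures of spread can be sketched with the standard `statistics` module. The data are hypothetical, and the quartiles rely on the module's default ("exclusive") method; other software may use a different quartile convention and report a slightly different IQR.

```python
import statistics

# Hypothetical data, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)
variance = statistics.variance(data)         # sample variance (divides by n - 1)
std_dev = statistics.stdev(data)             # square root of the sample variance
q1, _, q3 = statistics.quantiles(data, n=4)  # quartile cut points
iqr = q3 - q1                                # interquartile range

print(data_range, round(variance, 3), round(std_dev, 3), iqr)
```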

200

The minimum value, the first quartile (Q1), the median (Q2), the third quartile (Q3), and the maximum value

What values are used to create a boxplot?

What is the five-number summary?
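
A small sketch of computing the five-number summary. The helper name `five_number_summary` and the data are assumptions for illustration, and the quartiles follow the default method of `statistics.quantiles`, which may differ slightly from the convention a plotting library uses for its boxplots.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)

# Hypothetical data, for illustration only.
data = [1, 3, 4, 6, 7, 8, 10]
print(five_number_summary(data))
```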

200

The sum or mean of a large number of independent random variables will be approximately normally distributed (so sampling distributions of the mean are approximately normal), regardless of the original distribution of the variables.

What is the Central Limit Theorem?
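
One way to see the theorem at work is a quick simulation. The sketch below assumes a right-skewed Exponential(1) population (mean 1, standard deviation 1), a sample size of 50, and 2000 repeated samples; all of these choices are illustrative.

```python
import random
import statistics

def sample_means(n, reps, rng):
    """Means of `reps` samples of size `n` from an Exponential(1) population."""
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(reps)
    ]

rng = random.Random(0)  # fixed seed so the run is repeatable
means = sample_means(n=50, reps=2000, rng=rng)

center = statistics.fmean(means)  # should be close to the population mean, 1
spread = statistics.stdev(means)  # should be close to 1 / sqrt(50)
# Share of sample means within one standard deviation of their center;
# for a normal distribution this is roughly 68%.
within_one_sd = sum(abs(m - center) <= spread for m in means) / len(means)

print(round(center, 3), round(spread, 3), round(within_one_sd, 3))
```

Even though the population is strongly skewed, the sample means cluster symmetrically around the population mean.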

200

The data provide strong enough evidence to conclude that it is likely incorrect. This typically occurs when the p-value is below a predetermined significance level.

When do you reject the null hypothesis?

300

For an ordinal variable, assuming the gaps between consecutive levels are equal, versus treating the levels as distinct groups with no order.

What is the difference between treating an ordinal variable as quantitative versus categorical?

300

modality, skewness, kurtosis, center, spread

What is the shape of a distribution?

300

Plot that compares the quantiles of observed data to the quantiles of a normal distribution

What is a QQ-Plot?

300

A number that quantifies the variability of a sample statistic (such as the mean) across repeated samples, versus a number that quantifies the spread of values in a data set.

What is the difference between standard error and standard deviation?
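
As a rough illustration, the standard error of the mean is the sample standard deviation divided by the square root of the sample size. The data below are made up.

```python
import math
import statistics

# Hypothetical data, for illustration only.
data = [4, 8, 6, 5, 3, 7, 9, 6]

sd = statistics.stdev(data)      # spread of the individual values
se = sd / math.sqrt(len(data))   # estimated variability of the sample mean

print(round(sd, 3), round(se, 3))
```

The standard error shrinks as the sample size grows, while the standard deviation does not.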

300

A non-parametric statistical test used to compare two independent samples or groups.

What is a Wilcoxon Rank Sum test?

400

Categorical variables with versus without order.

What is the difference between nominal and ordinal categorical variables?

400

Asymptotic to the x-axis, symmetric, bell-shaped, and mean = median = mode

What are the characteristics of a normal distribution?

400

This plot illustrates the relationship between two quantitative variables.

What is a scatter plot?

What is a line plot?

400

This is the distribution of sample means calculated from many random samples of a given size drawn from a population.

What is the sampling distribution of the sample mean?

400

How do you state null and alternative hypotheses for different situations?

Status quo (null) vs. anticipated effect or difference (alternative).

500

You have a dataset with a categorical variable “Education Level” (High School, Bachelor’s, Master’s, PhD) and a binary variable “Smoker” (Yes, No). How would you code these variables for use in a regression model?

Code “Education Level” using dummy variables with one category as reference (e.g., High School).

Or use optimal scaling to represent years of schooling.

Code “Smoker” as a single dummy / binary (1 = Yes, 0 = No).
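
The coding described above can be sketched in plain Python. The function name `code_row` and the `edu_` column prefix are hypothetical; "High School" is taken as the reference level, as in the answer.

```python
EDUCATION_LEVELS = ["High School", "Bachelor's", "Master's", "PhD"]
REFERENCE = "High School"  # assumption: reference level, as in the answer above

def code_row(education, smoker):
    """Return a dict of 0/1 predictors for one observation."""
    # One indicator per non-reference education level.
    row = {
        f"edu_{level}": int(education == level)
        for level in EDUCATION_LEVELS
        if level != REFERENCE
    }
    # Single binary indicator for the two-level variable.
    row["smoker"] = int(smoker == "Yes")
    return row

# Hypothetical observations, for illustration only.
print(code_row("High School", "No"))
print(code_row("Master's", "Yes"))
```

An observation at the reference level has all education indicators equal to 0, so each regression coefficient compares its level against "High School".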

500

Under the normal distribution, about 68% of the data lie within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3.

What is the Empirical Rule?
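
These percentages can be checked against the standard normal distribution. A short sketch using `statistics.NormalDist` from the standard library:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Share of the distribution within k standard deviations of the mean.
shares = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(k, round(share, 4))
```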

500

On a QQ-plot, the points follow a concave-up pattern.

What is right skew?

500

When given a dataset containing both categorical and quantitative variables, how do you determine the appropriate statistical model to use?

Examine the response variable—if it is quantitative, use linear regression models (e.g., multiple linear regression); if categorical, use classification models (e.g., logistic regression).

500

What is the relationship between two-tailed hypothesis tests and confidence intervals?

A two-tailed hypothesis test at significance level α corresponds to a (1–α)100% confidence interval.

If the hypothesized value (e.g., a mean or difference in means) falls outside the confidence interval, the null hypothesis is rejected; if it lies within, the null is not rejected.
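
The correspondence can be illustrated with a one-sample z-test, assuming a known population standard deviation. The function name `z_interval_and_test` and all numbers below are hypothetical.

```python
import math
from statistics import NormalDist

def z_interval_and_test(xbar, sigma, n, mu0, alpha=0.05):
    """Return the (1 - alpha) z-interval for the mean and the two-tailed p-value for H0: mu = mu0."""
    se = sigma / math.sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (xbar - z_crit * se, xbar + z_crit * se)
    z = (xbar - mu0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return ci, p_value

# Hypothetical summary statistics, for illustration only.
alpha = 0.05
for mu0 in (50, 51):
    ci, p = z_interval_and_test(xbar=52, sigma=10, n=100, mu0=mu0, alpha=alpha)
    outside = not (ci[0] <= mu0 <= ci[1])
    print(mu0, "reject" if p < alpha else "fail to reject", "| outside CI:", outside)
```

For each hypothesized mean, the test rejects exactly when that value falls outside the interval.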