Inches (cm)
Cold - Room Temp. - Hot
Blue - Red - Green
What are quantitative, ordinal, and categorical variables?
Mean, median, mode
What are measures of location?
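A minimal sketch of the three measures of location, using Python's standard `statistics` module. The values are hypothetical and chosen only for illustration.

```python
import statistics

# Hypothetical sample values, chosen only for illustration.
values = [2, 3, 3, 5, 7, 10]

mean_value = statistics.mean(values)      # arithmetic average
median_value = statistics.median(values)  # middle of the sorted values
mode_value = statistics.mode(values)      # most frequent value

print(mean_value, median_value, mode_value)
```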
These are appropriate for viewing the distribution of variables of different types.
What are histograms, bar plots, and boxplots?
A number that describes all elements of a population, versus a number calculated from a subset (a sample).
What are parameters vs statistics?
What are samples vs population?
A range of values, calculated from sample data, that is expected to contain a population parameter, with the level of confidence expressed as a percentage.
What is a confidence interval?
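As a rough illustration, a 95% confidence interval for a mean can be sketched as below. This assumes a normal (z) critical value for simplicity; for a sample this small a t critical value would usually be preferred. The data are hypothetical.

```python
import math
import statistics

# Hypothetical heights in inches, for illustration only.
sample = [68, 70, 71, 69, 72, 74, 67, 70, 73, 71]

n = len(sample)
mean = statistics.mean(sample)
standard_error = statistics.stdev(sample) / math.sqrt(n)

# Critical value for 95% confidence (about 1.96), assuming normality.
z = statistics.NormalDist().inv_cdf(0.975)

lower = mean - z * standard_error
upper = mean + z * standard_error
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```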
When you choose one level as the reference and, for each remaining level, set an indicator x to 1 if an observation exhibits that level and 0 otherwise.
What is reference coding?
Range, variance, IQR, standard deviation
What are measures of spread?
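The four measures of spread can be computed with the standard library, as sketched here on hypothetical values. Quartile conventions differ between textbooks and software; this sketch assumes the `inclusive` method of `statistics.quantiles`.

```python
import statistics

# Hypothetical values, for illustration only.
values = [2, 4, 4, 5, 7, 9, 11]

data_range = max(values) - min(values)
variance = statistics.variance(values)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(values)      # square root of the sample variance

# Quartiles under the "inclusive" convention (an assumption of this sketch).
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

print(data_range, variance, iqr, std_dev)
```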
The minimum value, the first quartile (Q1), the median (Q2), the third quartile (Q3), and the maximum value
What values are used to create a boxplot?
What is the five-number summary?
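A short sketch of the five-number summary follows. The helper name `five_number_summary` is hypothetical, and the quartiles again assume the `inclusive` convention, so other software may report slightly different Q1 and Q3 values.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    # "inclusive" quartiles are an assumption; other conventions exist.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, q2, q3, max(values))

# Hypothetical values, for illustration only.
values = [1, 3, 4, 6, 8, 9, 12]
summary = five_number_summary(values)
print(summary)
```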
The mean of a large number of independent random variables (its sampling distribution) will be approximately normally distributed, regardless of the original distribution of the variables.
What is the Central Limit Theorem?
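One way to see the theorem is by simulation. The sketch below draws many samples from a strongly right-skewed exponential distribution and looks at their means. The sample size, number of samples, and seed are arbitrary choices for illustration.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

sample_size = 50
num_samples = 2000

# Each sample comes from an exponential distribution with mean 1 and
# standard deviation 1, which is far from normal.
sample_means = [
    statistics.mean(random.expovariate(1.0) for _ in range(sample_size))
    for _ in range(num_samples)
]

# Theory suggests the means centre near 1 with spread near 1 / sqrt(50).
print(statistics.mean(sample_means))
print(statistics.stdev(sample_means), 1 / math.sqrt(sample_size))
```

A histogram of `sample_means` would look roughly bell-shaped even though the individual draws are skewed.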
When the data provide strong enough evidence to conclude that the null hypothesis is likely incorrect. This typically occurs when the p-value is below a predetermined significance level.
When do you reject the null hypothesis?
For an ordinal variable, assuming the spacing between consecutive levels is consistent (quantitative), versus treating the levels as separate groups without order (categorical).
What is the difference between treating an ordinal variable as quantitative versus categorical?
Modality, skewness, kurtosis, center, spread
What is the shape of a distribution?
A plot that compares the quantiles of observed data to the quantiles of a normal distribution
What is a QQ-Plot?
A number that quantifies the variability of a statistic (such as the sample mean) from sample to sample, versus a number that quantifies the spread of individual values in a data set
What is the difference between standard error and standard deviation?
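The distinction can be sketched numerically: the standard deviation describes the data, while the standard error of the mean shrinks as the sample size grows. The values are hypothetical.

```python
import math
import statistics

# Hypothetical measurements, for illustration only (n = 16).
values = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13, 11, 17, 14, 12, 15, 13]

n = len(values)
standard_deviation = statistics.stdev(values)      # spread of the values
standard_error = standard_deviation / math.sqrt(n)  # variability of the mean

print(standard_deviation, standard_error)
```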
A non-parametric statistical test used to compare two independent samples or groups.
What is a Wilcoxon Rank Sum test?
Categorical variables without versus with order.
What is the difference between nominal and ordinal categorical variables?
Asymptotic to the x-axis, symmetric, bell-shaped, and mean = median = mode
What are the characteristics of a normal distribution?
This plot illustrates the relationship between two quantitative variables
What is a scatter plot?
What is a line plot?
This is the distribution of sample means calculated from multiple random samples of a given size drawn from a population.
What is the sampling distribution of the sample mean?
How do you state null and alternative hypotheses for different situations?
Null hypothesis: the status quo (no effect or difference), vs alternative hypothesis: the anticipated effect or difference
You have a dataset with a categorical variable “Education Level” (High School, Bachelor’s, Master’s, PhD) and a binary variable “Smoker” (Yes, No). How would you code these variables for use in a regression model?
Code “Education Level” using dummy variables with one category as reference (e.g., High School).
Alternatively, use optimal scaling to represent years of school.
Code “Smoker” as a single dummy / binary (1 = Yes, 0 = No).
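The coding described above might look like the following sketch. The helper `reference_code` and the column names it produces are hypothetical, and "High School" is assumed to be the reference level.

```python
EDUCATION_LEVELS = ["High School", "Bachelor's", "Master's", "PhD"]

def reference_code(value, levels, reference):
    """Return one 0/1 indicator per non-reference level."""
    return {
        f"edu_{level}": int(value == level)
        for level in levels
        if level != reference
    }

def code_smoker(value):
    """Single binary indicator: 1 = Yes, 0 = No."""
    return 1 if value == "Yes" else 0

# The reference level is represented by all indicators being 0.
print(reference_code("High School", EDUCATION_LEVELS, "High School"))
print(reference_code("Master's", EDUCATION_LEVELS, "High School"))
print(code_smoker("Yes"), code_smoker("No"))
```

Four education levels yield three indicator columns, so the reference level never gets a column of its own.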
What is the Empirical Rule?
On a QQ-plot, the data exhibits a concave up pattern.
What is a Right Skew?
When given a dataset containing both categorical and quantitative variables, how do you determine the appropriate statistical model to use?
Examine the response variable—if it is quantitative, use linear regression models (e.g., multiple linear regression); if categorical, use classification models (e.g., logistic regression).
What is the relationship between two-tailed hypothesis tests and confidence intervals?
A two-tailed hypothesis test at significance level α corresponds to a (1–α)100% confidence interval.
If the hypothesized value (e.g., a mean or difference in means) falls outside the confidence interval, the null hypothesis is rejected; if it lies within, the null is not rejected.
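The correspondence can be checked with a small sketch. It assumes a large-sample z-test for a single mean, so both the p-value and the interval use the normal distribution. The function name `z_test_and_ci` and the data are hypothetical.

```python
import math
import statistics

def z_test_and_ci(sample, mu0, alpha=0.05):
    """Return (p_value, (lower, upper)) for a two-tailed z-test of mean = mu0."""
    normal = statistics.NormalDist()
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    z = (mean - mu0) / se
    p_value = 2 * (1 - normal.cdf(abs(z)))      # two-tailed p-value
    z_crit = normal.inv_cdf(1 - alpha / 2)      # critical value for the CI
    return p_value, (mean - z_crit * se, mean + z_crit * se)

# Hypothetical measurements, for illustration only.
sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.4, 5.7, 5.9, 5.3, 5.5]

for mu0 in (5.0, 5.5):
    p_value, (lower, upper) = z_test_and_ci(sample, mu0)
    rejected = p_value < 0.05
    outside = not (lower <= mu0 <= upper)
    print(mu0, rejected, outside)  # the two flags agree for each mu0
```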