Statistics Trivia

Statistical Methods

Probability & Distributions

Linear Regression

Machine Learning

100

How do you interpret a p-value in the context of hypothesis testing?

The probability of observing the test statistic, or a more extreme one, assuming the null hypothesis is true.
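
As an illustration of this definition, here is a minimal sketch that computes a two-sided p-value for a z test statistic. It assumes the test statistic follows a standard normal distribution under the null hypothesis; the function name `two_sided_p_value` is made up for this example.

```python
from statistics import NormalDist


def two_sided_p_value(z: float) -> float:
    """Return P(|Z| >= |z|) for a standard normal Z (assumed null distribution)."""
    # Upper-tail area beyond |z|, doubled to cover both tails.
    return 2 * (1 - NormalDist().cdf(abs(z)))


p = two_sided_p_value(1.96)
print(round(p, 3))  # 0.05
```

A test statistic of 1.96 sits right at the conventional 5% two-sided cutoff, which is why that value shows up so often in textbooks.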

100

In set theory, what is the notation translating to “and” and “or”, respectively?

Intersection (∩) and union (∪), respectively.

100

How do we interpret the R^2 value of a linear regression model?

The proportion of the variance in y that’s explained by the regressors in the model.

100

In linear algebra, what are the dimensions of the product of a 2x3 matrix and a 3x3 matrix?

A 2x3 matrix. Recall: the inner dimensions must match, and the outer dimensions give the dimensions of the resulting matrix.
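
The dimension rule can be checked with a small pure-Python sketch. The helper `matmul` is written only for this example and is not a library function; it multiplies matrices stored as lists of rows.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    # Inner dimensions must match: columns of a == rows of b.
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    # Each output entry is the dot product of a row of a with a column of b.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


a = [[1, 2, 3], [4, 5, 6]]              # 2x3
b = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # 3x3 identity
c = matmul(a, b)
shape = (len(c), len(c[0]))
print(shape)  # (2, 3)
```

Swapping the order (3x3 times 2x3) raises the `ValueError`, because the inner dimensions 3 and 2 do not match.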

200

True or False? If a p-value is less than the significance level, we reject the null hypothesis. There is enough evidence that the null hypothesis is incorrect.

True. If p-value <= significance level -> reject the null hypothesis. If p-value > significance level, fail to reject the null hypothesis.

200

True or False? The probability of the empty set is zero.

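True. It follows from the probability axioms: the sample space and the empty set are disjoint and their union is the sample space, so P(sample space) = P(sample space) + P(empty set), which forces P(empty set) = 0.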

200

What is the R^2 adjusted, and why is it necessary when assessing multiple linear regression models?

It is a modified version of R^2 that penalizes the number of regressors, and it is used to compare multiple linear regression models with different numbers of regressors. Unlike R^2, which never decreases when a regressor is added to the model, R^2 adjusted only increases if the new regressor improves the fit more than would be expected by chance.
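
A minimal sketch of the usual adjusted R^2 formula, 1 - (1 - R^2)(n - 1)/(n - p - 1), where n is the number of observations and p is the number of regressors. The function name and the example numbers are made up for illustration.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 for a model with p regressors fit on n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical case: a fourth regressor nudges R^2 from 0.800 to 0.801.
before = adjusted_r2(0.800, n=50, p=3)
after = adjusted_r2(0.801, n=50, p=4)
print(round(before, 3), round(after, 3))  # 0.787 0.783
```

R^2 went up slightly, but adjusted R^2 went down, signaling that the extra regressor did not earn its place.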

200

What is the primary difference between classification and clustering algorithms?

Classification algorithms are supervised: they are trained on a labeled response variable. Clustering algorithms are unsupervised and are not trained on a response variable.

300

What is the difference between Type I and Type II errors?

Type I errors occur when there is a false positive (true null hypothesis is incorrectly rejected). Type II errors occur when there is a false negative (false null hypothesis is not rejected).

300

What is the difference between a combination and permutation?

Order doesn’t matter for combinations, but order does matter for a permutation.
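
A quick check using Python's standard library (`math.comb` and `math.perm`, available in Python 3.8+). The example of choosing 2 items out of 5 is only illustrative.

```python
import math
from itertools import combinations, permutations

n, k = 5, 2
n_combinations = math.comb(n, k)   # order ignored
n_permutations = math.perm(n, k)   # order counted
print(n_combinations, n_permutations)  # 10 20

# Enumerating the selections gives the same counts.
items = "ABCDE"
assert len(list(combinations(items, k))) == n_combinations
assert len(list(permutations(items, k))) == n_permutations
```

Each combination of 2 items corresponds to 2! = 2 orderings, which is why there are twice as many permutations here.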

300

What are the two types of variance examined in an ANOVA?

Variance within groups (mean square error, MSE) and variance between groups (mean square between groups, MSG). If MSG >> MSE, there is evidence that at least one group mean differs from the others.
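
The comparison of MSG to MSE can be sketched as a one-way ANOVA F statistic (F = MSG / MSE). This uses only the standard library; the function name and the three small groups are made up for the example.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the F statistic (MSG / MSE) for a list of groups of observations."""
    k = len(groups)                          # number of groups
    n = sum(len(g) for g in groups)          # total observations
    grand_mean = mean(x for g in groups for x in g)
    # Between-group sum of squares: group means vs. the grand mean.
    ssg = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: observations vs. their own group mean.
    sse = sum((x - mean(g)) ** 2 for g in groups for x in g)
    msg = ssg / (k - 1)
    mse = sse / (n - k)
    return msg / mse


f_stat = one_way_anova_f([[1, 2, 3], [2, 3, 4], [6, 7, 8]])
print(f_stat)  # 21.0
```

Here MSG is 21 and MSE is 1, so the between-group variance dominates. Turning F into a p-value requires the F distribution with (k - 1, n - k) degrees of freedom, which this sketch does not cover.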

300

What is the purpose of cross-validation, and how does it improve model performance?

Cross-validation is used to assess how a model will generalize to new data. It doesn’t improve a model by itself, but it helps detect overfitting (when a model performs really well on the data set it’s trained on but really poorly on a new data set) and guides model selection and tuning toward models that generalize better.
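
One common form is k-fold cross-validation. The sketch below only shows how the data indices are split; the helper `k_fold_indices` is made up for this example, and it uses contiguous folds with no shuffling, which is a simplifying assumption.

```python
def k_fold_indices(n: int, k: int):
    """Yield (train_idx, test_idx) pairs for k contiguous folds over n observations."""
    # Spread any remainder over the first n % k folds.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(10, 3))
print(folds[0][1])  # [0, 1, 2, 3]
```

Each observation lands in exactly one test fold, so every data point is used for evaluation once and for training k - 1 times. The model would be fit on each `train_idx` and scored on the matching `test_idx`, with the scores averaged.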

400

How do we interpret a confidence interval versus a prediction interval?

Confidence interval: We are X% confident that the true population parameter falls within the confidence interval. Prediction interval: We are X% confident that a single future observation will fall within the prediction interval.

400

What is the difference between a probability density function (pdf) and cumulative distribution function (cdf)? Hint: Only continuous random variables can have pdfs.

A pdf gives the relative likelihood of a specific value (for continuous data), while a cdf provides the cumulative probability that a variable is less than or equal to a value.

400

What are the three main ANOVA assumptions and how do you test them?

(1) Normality of residuals (q-q plot), (2) homoscedasticity/constant variance across groups (residuals vs. fitted values plot), and (3) independence of observations (residuals vs. order plot).

400

What is principal component analysis (PCA) and what is its primary purpose?

A dimensionality reduction technique that transforms d-dimensional data into a smaller set of uncorrelated variables called principal components, ordered by how much of the data’s variance they capture. Its primary purpose is to simplify complex data for visualization, improve machine learning efficiency, and remove multicollinearity.

500

What is the primary difference between a z-distribution and a t-distribution and when is this particularly relevant?

The z-distribution assumes a known population standard deviation (sigma) and is commonly used for large samples (n >= 30), whereas the t-distribution is used when the population standard deviation is unknown and must be estimated with the sample standard deviation (s). The t-distribution has a lower peak and heavier tails, accounting for the higher uncertainty in small samples. The distinction is particularly relevant for small samples with an unknown population standard deviation. As the sample size increases, the t-distribution converges to the z-distribution.

500

What is the relationship between a pdf and cdf? Hint: Only continuous variables can have pdfs.

The pdf is the slope of the cdf. The derivative of the cdf is the pdf, and the integral of the pdf is the cdf.
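
This relationship can be checked numerically. The sketch below uses the standard normal distribution from Python's `statistics` module and compares a finite-difference slope of the cdf with the pdf; the point x = 0.5 and the step size h are arbitrary choices.

```python
from statistics import NormalDist

dist = NormalDist()  # standard normal: mean 0, standard deviation 1
x, h = 0.5, 1e-5

# Central-difference approximation to the derivative of the cdf at x.
cdf_slope = (dist.cdf(x + h) - dist.cdf(x - h)) / (2 * h)

print(abs(cdf_slope - dist.pdf(x)) < 1e-6)  # True
```

Going the other way, the area under the pdf between two points a and b equals cdf(b) - cdf(a).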

500

What are the three most popular linear regression model selection techniques and how do they differ in how they add or remove regressors?

(1) Backward selection: Begins with all regressors and removes insignificant regressors one by one, (2) Forward selection: Begins with no regressors and adds significant regressors one by one, and (3) Stepwise selection: Begins with no regressors, and adds significant regressors, re-evaluating their significance at each step and removing any newly insignificant variables as needed.

500

What is the primary difference between k-means clustering and hierarchical clustering? What are the advantages of each?

K-means requires a pre-specified number of clusters (k), whereas hierarchical clustering does not: it builds a tree of nested clusters (a dendrogram) that can be cut at any level to choose the number of clusters afterward. K-means is more computationally efficient and therefore scales better to large data sets. Hierarchical clustering is less efficient, but not having to fix the number of clusters in advance can be advantageous.