CMU DSC Jeopardy!

DS Fundamentals

Machine Learning

Probability and Stat

Data Structures

Random

100

What term describes the phenomenon where a model performs well on the training data but fails to generalize to new, unseen data?

What is overfitting?

100

What is the training error of a 1-Nearest-Neighbor Model?

What is 0?

100

In the context of hypothesis testing, this type of error occurs when the null hypothesis is incorrectly rejected.

What is a Type-I Error

100

Which data structure follows a LIFO (Last In First Out) order of operations?

What is a stack?

100

Who is the president of CMU Data Science Club?

200

In time series data, what word describes the correlation of a time series with its own lagged values?

What is autocorrelation?

200

In the context of binary classification in machine learning, this metric is defined as the number of true positive results divided by the number of all positive results, including those not identified correctly.

What is Precision?

200

This theorem states that, regardless of the shape of the original population distribution, the distribution of the sample mean will approach a normal distribution as the sample size increases.

What is the Central Limit Theorem?

200

What term is defined as: the number of entries N in table divided by the table capacity M

What is the load factor?

200

What two numbers are at the beginning of every statistics course offered at CMU?

300

What type of plot is this, and what do the colors represent?

What is a Mosaic Plot with Pearson Residuals?

300

This ensemble learning method combines multiple weak learners to create a strong learner for improving model predictions. One common technique involves training subsequent models to correct the errors of previous ones.

What is Boosting?

300

What distribution is this?

What is the Poisson Distribution

300

A binary tree in which the value of each node is less than or equal to the value of its parent.

What is a MaxHeap?

300

Who is the director of the Undergraduate Statistics Program at CMU?

Who is Peter Freeman?

400

What is the name of the dimensionality reduction technique done to transform high-dimensional data into a lower-dimensional space while retaining the most important information?

What is Principal Component Analysis (PCA)?

400

This loss function, commonly used in training deep neural networks for classification tasks, measures the difference between two probability distributions for a given set of predictions and true values.

What is Cross-Entropy-Loss?

400

What assumption does our data violate based on this residual plot?

What is homoskedasticity (aka non-constant variance)?

400

This advanced tree structure balances itself automatically, ensuring a logarithmic height and efficient search, insertion, and deletion operations.

What is an AVL Tree?

400

What is the ranking of CMU's Graduate Statistics Program on USNews (in terms of all American Grad Schools)?

What is #5?

500

Which kind of feature regularization works by adding a penalty term based on the squared values of the coefficients?

What is Ridge Regression?

500

In the realm of neural networks, this complex recurrent neural network architecture, designed to process sequences of data by using a gating mechanism to control the memorizing process, addresses the vanishing gradient problem common to simpler RNNs.

What is Long Short-Term Memory (LSTM)?

500

What distribution is this?

What is the Pareto Distribution

500

In the context of memory management, this term refers to a situation where a program incorrectly accesses a memory location outside its allocated space, often resulting in unpredictable behavior.

What is a segmentation fault?

500

What is the name of the Poker Bot that CMU created in 2019 that defeated the world’s best players at multi-player no-limit Texas Hold’em?

What is Pluribus?