CMUDSC Data Science Jeopardy!

Deep Learning

Machine Learning

History

Statistics

Random

100

A tuning parameter in an optimization algorithm that determines the step size at each iteration while moving toward a minimum of a loss function.

Learning Rate
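
As a minimal sketch of what the learning rate controls, the snippet below runs plain gradient descent on a toy loss f(x) = (x - 3)^2. The loss, the starting point, and the function name `gradient_descent` are illustrative assumptions, not part of the clue.

```python
def gradient_descent(learning_rate, steps=50, x=0.0):
    """Minimize the toy loss f(x) = (x - 3)**2 with a fixed step size."""
    for _ in range(steps):
        grad = 2 * (x - 3)          # derivative of (x - 3)**2
        x -= learning_rate * grad   # the learning rate scales every step
    return x

small = gradient_descent(0.1)   # moves steadily toward the minimum at x = 3
large = gradient_descent(1.1)   # overshoots further on each step and diverges
```

With this particular loss, any learning rate above 1.0 makes each step overshoot by more than the previous one, which is why the second call moves away from the minimum.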

100

Producing a model with poor predictive ability because the model has not captured the complexity of the training data.

Underfitting

100

A very famous dataset containing 70,000 images of handwritten digits (60,000 for training and 10,000 for testing).

MNIST dataset

100

In statistics, values that are significantly different from the majority of the data points are called?

Outliers (They can be either significantly higher or lower than the rest of the data)

100

What is the output of a softmax function?

A vector of probabilities, adding up to 1 (A probability distribution over a set of K possible outcomes)
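
A short sketch of softmax in plain Python can make the "adds up to 1" property concrete. The input scores are made-up values, and subtracting the maximum is a common numerical-stability trick rather than something the clue requires.

```python
import math

def softmax(logits):
    """Map a list of real-valued scores to probabilities that sum to 1."""
    shift = max(logits)                           # subtract the max for stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])   # larger scores receive larger probabilities
```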

200

A full training pass over the entire dataset such that each example has been seen once is called what?

Epoch

200

In binary classification, a 2x2 grid holding the counts of true positives, false positives, true negatives, and false negatives, comparing predicted labels against actual labels.

A confusion matrix
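
The four cells can be counted directly from paired labels. This is a minimal sketch that assumes labels are encoded as 1 (positive) and 0 (negative); the example data and the helper name `confusion_counts` are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Return (TP, FP, TN, FN) for binary labels where 1 is the positive class."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == 1 and a == 1:
            tp += 1      # predicted positive, actually positive
        elif p == 1 and a == 0:
            fp += 1      # predicted positive, actually negative
        elif p == 0 and a == 0:
            tn += 1      # predicted negative, actually negative
        else:
            fn += 1      # predicted negative, actually positive
    return tp, fp, tn, fn

actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]
tp, fp, tn, fn = confusion_counts(actual, predicted)   # (3, 1, 3, 1)
```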

200

What was the name of the first computer that beat the world champion at chess?

Deep Blue (Famously defeated world chess champion Garry Kasparov in 1997)

200

The two parameters of a normal distribution.

- Mean (μ): This represents the average or central tendency of the distribution. It determines where the peak of the bell curve is located.

- Standard Deviation (σ): This measures the spread or dispersion of the data around the mean. It determines the width of the bell curve. A higher standard deviation indicates a wider spread, while a lower standard deviation indicates a narrower spread.
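
To see how the two parameters shape the curve, here is a small sketch of the normal density written with the standard formula. The function name `normal_pdf` and the sample parameter values are illustrative assumptions.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

peak_narrow = normal_pdf(5.0, mu=5.0, sigma=0.5)   # density at the mean, small sigma
peak_wide   = normal_pdf(5.0, mu=5.0, sigma=2.0)   # density at the mean, large sigma
```

The density is highest at x = μ, and a larger σ lowers that peak because the same total probability is spread over a wider range.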

200

A popular programming language designed by statisticians.

R

300

What does LSTM stand for?

Long Short-Term Memory

300

A technique of combining individual models together with the purpose of improving the predictive power of the overall model.

Ensemble Learning

300

Who popularized the term Deep Learning in 2006?

Geoffrey Hinton

300

How is precision calculated?

Precision = True Positives / (True Positives + False Positives)
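
The formula translates into a one-line function. The counts in the example are made up, and returning 0.0 when there are no positive predictions is one common convention assumed here, not something the clue specifies.

```python
def precision(true_positives, false_positives):
    """Fraction of positive predictions that were actually positive."""
    predicted_positives = true_positives + false_positives
    if predicted_positives == 0:
        return 0.0   # assumed convention when nothing was predicted positive
    return true_positives / predicted_positives

score = precision(true_positives=30, false_positives=10)   # 30 / 40 = 0.75
```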

300

What is the output range of the sigmoid function?

Between 0 and 1, exclusive (the open interval (0, 1))
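
A quick sketch of the sigmoid shows how inputs are squashed toward 0 or 1. The sample inputs are arbitrary, and this simple form is only meant for moderate input values, since `math.exp` can overflow for very large negative inputs.

```python
import math

def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

outputs = [sigmoid(x) for x in (-10.0, 0.0, 10.0)]   # near 0, exactly 0.5, near 1
```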

400

A technique to downsample feature maps, often used in convolutional models.

Pooling
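
One common variant is 2x2 max pooling with stride 2, sketched below on a nested list. The choice of max pooling, the window size, and the sample feature map are assumptions for illustration; average pooling and other window sizes work the same way.

```python
def max_pool_2x2(feature_map):
    """Downsample a 2D list by taking the max of each non-overlapping 2x2 window."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, cols - 1, 2)   # step 2 columns at a time
        ]
        for r in range(0, rows - 1, 2)       # step 2 rows at a time
    ]

feature_map = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 1],
    [3, 4, 5, 6],
]
pooled = max_pool_2x2(feature_map)   # [[6, 4], [7, 9]]
```

The 4x4 input becomes a 2x2 output, halving each spatial dimension. If the input has an odd number of rows or columns, this sketch drops the trailing row or column.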

400

Clustering methods, autoencoders, latent variable models.

These are all techniques used in Unsupervised Learning

400

PyTorch was primarily developed by which company?

Meta (formerly Facebook), through its AI research lab, Facebook AI Research (FAIR).

400

In probability and statistics, a theorem stating that as the number of trials increases, the average of the results gets closer to the expected value.

The Law of Large Numbers
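
A small simulation can illustrate the idea with a fair six-sided die, whose expected value is 3.5. The die example, the fixed seed, and the function name `average_die_roll` are assumptions chosen for illustration.

```python
import random

def average_die_roll(num_rolls, seed=0):
    """Average of num_rolls simulated rolls of a fair six-sided die."""
    rng = random.Random(seed)   # fixed seed so runs are repeatable
    total = sum(rng.randint(1, 6) for _ in range(num_rolls))
    return total / num_rolls

few_rolls  = average_die_roll(10)        # can land far from 3.5
many_rolls = average_die_roll(100_000)   # should land very close to 3.5
```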

400

What are the two main components of a GAN?

Generator: This neural network takes random noise as input and generates new data samples. It aims to create data that is indistinguishable from real data.
Discriminator: This neural network takes either real data or generated data as input and classifies it as real or fake. It acts as a critic, evaluating the quality of the generated data.

500

A regularization method that works by removing a random selection of units in a network layer for a single gradient step.

Dropout
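
The sketch below applies dropout to a list of activations. It assumes the "inverted dropout" variant, in which surviving units are rescaled by 1 / (1 - drop_prob) during training; the rescaling, the function signature, and the sample values are assumptions beyond what the clue states.

```python
import random

def dropout(activations, drop_prob, rng):
    """Zero each unit with probability drop_prob and rescale the survivors.

    Assumes 0 <= drop_prob < 1 (inverted dropout).
    """
    keep_prob = 1.0 - drop_prob
    return [
        a / keep_prob if rng.random() < keep_prob else 0.0
        for a in activations
    ]

rng = random.Random(0)            # fixed seed so the mask is repeatable
layer_output = [1.0] * 10
dropped = dropout(layer_output, drop_prob=0.5, rng=rng)
```

A fresh random mask is drawn on every call, which corresponds to removing a different selection of units at each gradient step.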

500

Linear Kernel, Polynomial Kernel, Radial Basis Function (RBF) Kernel.

Three types of kernels used in support vector machines (SVMs)
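
Each of the three kernels is a function of two input vectors. The sketch below writes them in plain Python; the default hyperparameters (`degree`, `coef0`, `gamma`) and the sample vectors are illustrative assumptions, not values from the clue.

```python
import math

def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))

def linear_kernel(x, y):
    return dot(x, y)

def polynomial_kernel(x, y, degree=2, coef0=1.0):
    return (dot(x, y) + coef0) ** degree

def rbf_kernel(x, y, gamma=0.5):
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

x, y = [1.0, 2.0], [3.0, 4.0]
k_linear = linear_kernel(x, y)       # 1*3 + 2*4 = 11.0
k_poly   = polynomial_kernel(x, y)   # (11 + 1)**2 = 144.0
k_rbf    = rbf_kernel(x, y)          # exp(-0.5 * 8) = exp(-4)
```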

500

Who is known as the founding father of convolutional nets?

Yann LeCun

500

In causal inference, a variable that influences both the dependent variable and the independent variable, causing a spurious association.

Confounding variable

500

In research, the logical error of concentrating on the people or things that made it past some selection process and overlooking those that did not.

Survivorship Bias