Deep Learning
Machine Learning
History
Statistics
Random
100

A tuning parameter in an optimization algorithm that determines the step size at each iteration while moving toward a minimum of a loss function.

Learning Rate

100
Producing a model with poor predictive ability because the model has not captured the complexity of the training data.

Underfitting

100
Famous dataset of handwritten digits containing 60,000 training images (plus 10,000 test images).

MNIST

100

Values distant from most other values are called what?

Outliers

100

What is the output of a softmax function?

A vector of probabilities, adding up to 1
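
Study note: a minimal sketch of softmax in plain Python, to show why the outputs sum to 1. The function name `softmax` and the sample scores are illustrative assumptions, not taken from any particular library.

```python
import math

def softmax(logits):
    # Subtracting the maximum is a common numerical-stability trick;
    # it does not change the resulting probabilities.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    # Dividing by the total makes the outputs sum to 1.
    return [e / total for e in exps]

# Hypothetical raw scores for three classes.
probs = softmax([2.0, 1.0, 0.1])
```

Larger input scores map to larger probabilities, and every output lies strictly between 0 and 1.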

200

A full training pass over the entire dataset such that each example has been seen once is called what?

Epoch
200

What is a confusion matrix in binary classification?

A 2x2 grid with four cells (the counts of true positives, false positives, true negatives, and false negatives) that compares predicted results against actual results.

200

What was the name of the first computer that beat the world champion at chess?

Deep Blue

200

The two parameters of a normal distribution.

Mean and standard deviation

200

Popular programming language designed by statisticians.

R

300

What does LSTM stand for?

Long Short-Term Memory
300

What is it called when you combine individual models together with the purpose of improving the predictive power of the overall model?

Ensembling

300

Who popularized the term Deep Learning in 2006?

Geoffrey Hinton

300

How is precision calculated?

True Positives / (True Positives + False Positives)
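
Study note: the formula above as a short Python sketch. The function name `precision` and the choice to return 0.0 when there are no positive predictions are assumptions made for illustration; other conventions exist for that edge case.

```python
def precision(true_positives, false_positives):
    # Precision = TP / (TP + FP): the share of positive predictions that were correct.
    predicted_positives = true_positives + false_positives
    if predicted_positives == 0:
        # Assumed convention: report 0.0 when nothing was predicted positive.
        return 0.0
    return true_positives / predicted_positives

# Hypothetical counts: 8 correct positive predictions, 2 incorrect ones.
example = precision(8, 2)
```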

300

What is the output range of the sigmoid function?

0 to 1

400

A technique to downsample feature maps, often used in convolutional models.

Pooling

400

Name two types of unsupervised learning algorithms.

Clustering methods, autoencoders, latent variable models, etc.

400

PyTorch was primarily developed by which company?

Meta (Facebook)

400

A theorem stating that as the number of trials increases, the average of the results converges to the expected value.

Law of Large Numbers


400

What are the two main components of a GAN?

Generator and Discriminator

500

A regularization method that works by removing a random selection of units in a network layer for a single gradient step.

Dropout

500

Name three types of kernels of support vector machines (SVM).

Linear, polynomial, radial basis function (RBF), sigmoid

500

Who is known as the founding father of convolutional nets?

Yann LeCun

500

A variable that influences both the dependent variable and the independent variable, causing a spurious association.

Confounding variable

500

The logical error of concentrating on the people or things that made it past some selection process and overlooking those that did not.

Survivorship Bias
