Machine Learning Jeopardy

Python

Mathematics

Supervised

Unsupervised

Misc.

100

A data structure not usable as a key in a map/dictionary.

What is a list?
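
A minimal sketch of why this answer holds, assuming a standard Python dictionary: keys must be hashable, and lists are not. The names `lookup` and `list_key_ok` are illustrative only.

```python
# Lists are mutable and unhashable, so using one as a dict key raises TypeError.
lookup = {}
try:
    lookup[[1, 2]] = "point"
    list_key_ok = True
except TypeError:
    list_key_ok = False

# An immutable tuple holding the same values is hashable and works as a key.
lookup[(1, 2)] = "point"
```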

100

A property of a vector that is equal to the square root of the inner product of the vector and itself.

What is the L2 norm?
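
The definition in the clue can be sketched directly. The function name `l2_norm` is an illustrative choice, not part of the original board.

```python
import math

def l2_norm(v):
    """Return the square root of the inner product of v with itself."""
    inner_product = sum(x * x for x in v)
    return math.sqrt(inner_product)

print(l2_norm([3.0, 4.0]))  # 5.0
```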

100

What is the minimum number of categories that can have a non-zero Bayes error?

What is two?

100

Name two subcategories of unsupervised machine learning.

What are clustering and dimensionality reduction?

100

An operation that gives the sum of the absolute values of each component of a vector, typically used to calculate distances in the city.

What is the Manhattan norm (or the L1 norm)?
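
As a rough illustration of the city-distance idea, the sketch below sums absolute component values. The helper names `l1_norm` and `manhattan_distance` are assumptions made for this example.

```python
def l1_norm(v):
    """Sum of the absolute values of the components of v."""
    return sum(abs(x) for x in v)

def manhattan_distance(a, b):
    """L1 norm of the difference between two points."""
    return l1_norm([x - y for x, y in zip(a, b)])

# Three blocks east-west plus four blocks north-south.
print(manhattan_distance((1, 2), (4, -2)))  # 7
```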

200

A programming language, other than Python, used to create NumPy.

What is C?

200

An operation that can be performed on a function to reveal its slope at any point.

What is the derivative or what is differentiation?
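
One way to see the slope idea numerically is a central-difference approximation. This is only a sketch: the step size `h` and the function name `numerical_derivative` are assumptions, and symbolic differentiation would give the exact result.

```python
def numerical_derivative(f, x, h=1e-6):
    """Approximate the slope of f at x with a central difference."""
    return (f(x + h) - f(x - h)) / (2 * h)

# The derivative of x^2 is 2x, so the slope at x = 3 should be close to 6.
slope = numerical_derivative(lambda x: x ** 2, 3.0)
```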

200

The perceptron algorithm is guaranteed to terminate for any separable dataset.

What is false? (Termination is guaranteed only when the data is linearly separable.)
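
The distinction between separable and linearly separable data can be illustrated with a small perceptron sketch. The function `train_perceptron`, the epoch cap `max_epochs`, and the AND/XOR toy labels are assumptions chosen for this example. The cap exists because the loop would otherwise never stop on XOR.

```python
def train_perceptron(points, labels, max_epochs=100):
    """Return (weights, bias, converged). Labels must be +1 or -1."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified (or on the boundary)
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # a full pass with no updates: terminate
            return w, b, True
    return w, b, False

corners = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_labels = [-1, -1, -1, 1]  # linearly separable
xor_labels = [-1, 1, 1, -1]   # separable by a curve, but not by a line

print(train_perceptron(corners, and_labels)[2])  # True
print(train_perceptron(corners, xor_labels)[2])  # False
```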

200

These two NP approaches are ways to decide how many clusters you want for a particular resolution.

What are agglomerative and divisive clustering?

200

These are inputs to a machine learning model that are not learned by the model itself.

What are hyperparameters?

300

A sorting algorithm that runs in O(n^2) time, implemented very briefly in class.

What is insertion sort?
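
For reference, a minimal sketch of insertion sort follows. It is not necessarily the version shown in class, and it returns a sorted copy rather than sorting in place.

```python
def insertion_sort(items):
    """Return a sorted copy of items using insertion sort (O(n^2) worst case)."""
    result = list(items)
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        # Shift larger elements one slot right to make room for current.
        while j >= 0 and result[j] > current:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = current
    return result

print(insertion_sort([5, 2, 4, 6, 1, 3]))  # [1, 2, 3, 4, 5, 6]
```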

300

This algorithm follows the direction of greatest decrease in a function.

What is gradient descent?
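
A one-dimensional sketch of the idea: repeatedly step against the gradient. The learning rate, step count, and example function f(x) = (x - 3)^2 are assumptions chosen for illustration.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Follow the negative gradient from start for a fixed number of steps."""
    x = start
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x

# f(x) = (x - 3)^2 has gradient 2 * (x - 3) and its minimum at x = 3.
minimum = gradient_descent(lambda x: 2 * (x - 3), start=0.0)
```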

300

This can be added to a support vector machine model for data that is not separable.

What is slack?

300

This O(n^3) algorithm should be used to efficiently and accurately reduce the dimensions of a Euclidean dataset.

What is principal component analysis?
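
A compact sketch of PCA using NumPy, assuming the data is a 2-D array with one row per datapoint. The function name `pca` is illustrative. The eigendecomposition of the covariance matrix is the cubic-cost step, which may be what the clue's O(n^3) alludes to.

```python
import numpy as np

def pca(data, n_components):
    """Project data onto its top n_components principal directions."""
    centered = data - data.mean(axis=0)
    covariance = centered.T @ centered / (len(data) - 1)
    # eigh is for symmetric matrices; eigenvalues come back in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)
    order = np.argsort(eigenvalues)[::-1][:n_components]
    return centered @ eigenvectors[:, order]

# Points on the line y = 2x: one component captures all of the variance.
data = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
projected = pca(data, n_components=1)
```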

300

This is occurring if your testing/validation error is much higher than your training error.

What is overfitting?

400

An attribute that returns the dimensions of a data structure.

What is .shape?
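
A short illustration, assuming the data structure in question is a NumPy array:

```python
import numpy as np

matrix = np.zeros((3, 4))  # 3 rows, 4 columns
print(matrix.shape)        # (3, 4)

vector = np.arange(5)      # a 1-D array has a one-element shape tuple
print(vector.shape)        # (5,)
```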

400

An operation that interchanges rows and columns in a matrix.

What is transpose?
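
A minimal sketch of the operation on a list-of-rows matrix. The helper name `transpose` is chosen for this example.

```python
def transpose(matrix):
    """Interchange the rows and columns of a list-of-rows matrix."""
    return [list(column) for column in zip(*matrix)]

print(transpose([[1, 2, 3], [4, 5, 6]]))  # [[1, 4], [2, 5], [3, 6]]
```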

400

Activation functions must be this in order for neural networks to well-approximate any function.

What is nonlinear?

400

What is it called when you are trying to map data to a non-linear surface?

What is manifold learning?

400

Some examples of this category include LSTMs, GRUs, and BiNNs.

What are recurrent neural networks?

500

A data structure that is implemented when creating lists in Python (hint: used in Java).

What is an ArrayList?

500

An operation performed on a vector v = (v1, ..., vN) that is equal to (|v1|^p + ... + |vN|^p)^(1/p).

What is the Minkowski norm (also called the p-norm)?
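
The formula in the clue translates almost directly into code. In this sketch the function name `minkowski_norm` and the check that p is at least 1 are assumptions. Setting p = 1 or p = 2 recovers the L1 and L2 norms.

```python
def minkowski_norm(v, p):
    """Return (|v1|^p + ... + |vN|^p)^(1/p) for p >= 1."""
    if p < 1:
        raise ValueError("p must be at least 1 for this to be a norm")
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

l1 = minkowski_norm([3, -4], p=1)  # sum of absolute values
l2 = minkowski_norm([3, -4], p=2)  # Euclidean length
```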

500

Letting X be the matrix whose rows are the datapoints xi, c being a constant, and y being the targets, the expression (X^TX + cI)^{-1}X^Ty is the solution to this algorithm.

What is ridge regression?
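
The closed-form expression in the clue can be sketched with NumPy. The function name `ridge_solution` and the toy data are assumptions. The sketch solves the linear system rather than forming the inverse explicitly, which gives the same result.

```python
import numpy as np

def ridge_solution(X, y, c):
    """Return (X^T X + c I)^{-1} X^T y by solving the linear system."""
    n_features = X.shape[1]
    A = X.T @ X + c * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

# Toy data generated exactly by y = 1 * x1 + 2 * x2.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])

w_plain = ridge_solution(X, y, c=0.0)  # ordinary least squares
w_ridge = ridge_solution(X, y, c=1.0)  # weights shrunk toward zero
```

With c = 0 the expression reduces to ordinary least squares; a larger c shrinks the weights.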

500

Consider one circular red dataset of radius one and one circular blue dataset of radius one. This is the minimum separation distance between the data that ensures Lloyd's algorithm succeeds.

What is 2.08? Also give points for saying 2.

500

This is the shape of the receiver-operating characteristic curve for a random classifier given two equal-sized categories (assuming a high amount of data).

What is linear?