A data structure that cannot be used as a key in a map/dictionary.
What is a list?
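A minimal sketch of why this is so in Python: lists are mutable and therefore unhashable, while a tuple holding the same items is accepted as a key. The helper name `try_as_key` is hypothetical and used only for illustration.

```python
def try_as_key(candidate):
    """Return True if `candidate` can be used as a dictionary key."""
    try:
        {candidate: "value"}  # building the dict forces a hash of the key
        return True
    except TypeError:
        # Raised for unhashable types such as list
        return False


list_ok = try_as_key([1, 2, 3])    # list: mutable, not hashable
tuple_ok = try_as_key((1, 2, 3))   # tuple of ints: hashable
print(list_ok, tuple_ok)
```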
A property of a vector that is equal to the square root of the inner product of the vector and itself.
What is the L2 norm?
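As a small illustration of that definition, the sketch below computes the inner product of a vector with itself and takes the square root. The function name `l2_norm` is an assumption made for this example, and the code uses plain Python rather than any particular library.

```python
import math


def l2_norm(v):
    """Square root of the inner product of v with itself."""
    inner_product = sum(x * x for x in v)
    return math.sqrt(inner_product)


# The vector (3, 4) is the classic 3-4-5 right triangle.
length = l2_norm([3.0, 4.0])
print(length)
```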
What is two?
What are clustering and dimensionality reduction?
An operation that gives the sum of the absolute values of the components of a vector, typically used to calculate distances along a city grid.
What is the Manhattan norm or what is the L1 norm?
A programming language used to create NumPy other than Python.
What is C?
An operation that can be performed on a function to reveal its slope at any point.
What is the derivative or what is differentiation?
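One way to see "slope at any point" numerically is a central-difference approximation. This is only a sketch: the names `central_difference` and `square` and the step size `h` are choices made for this example, and the result approximates the derivative rather than computing it symbolically.

```python
def central_difference(f, x, h=1e-5):
    """Approximate the slope of f at x using points on either side of x."""
    return (f(x + h) - f(x - h)) / (2 * h)


def square(x):
    return x * x


# The exact derivative of x^2 is 2x, so the slope at x = 3 should be close to 6.
slope_at_3 = central_difference(square, 3.0)
print(slope_at_3)
```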
What is false?
These two NP approaches are ways to decide how many clusters you want for a particular resolution.
What are agglomerative and divisive clustering?
These are inputs to a machine learning model that are not learned by the model itself.
What are hyperparameters?
A sorting algorithm that runs in O(n^2) time, implemented very briefly in class.
What is insertion sort?
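For reference, a short sketch of insertion sort follows. It is not necessarily the version shown in class. It returns a sorted copy instead of sorting in place, which is a design choice made here for easier testing.

```python
def insertion_sort(items):
    """Return a sorted copy of `items` using insertion sort (O(n^2) worst case)."""
    result = list(items)
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        # Shift larger elements one slot to the right to make room for `current`.
        while j >= 0 and result[j] > current:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = current
    return result


sorted_values = insertion_sort([5, 2, 4, 6, 1, 3])
print(sorted_values)
```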
This algorithm follows the direction of greatest decrease in a function.
What is gradient descent?
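A minimal one-dimensional sketch of the idea: repeatedly step against the gradient. The objective f(x) = (x - 2)^2, the learning rate, and the step count are all assumptions picked for illustration, not values from the quiz.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=200):
    """Repeatedly move against the gradient, the direction of steepest decrease."""
    x = start
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x


def grad_f(x):
    # Gradient of the example objective f(x) = (x - 2)^2, minimized at x = 2.
    return 2.0 * (x - 2.0)


minimizer = gradient_descent(grad_f, start=10.0)
print(minimizer)
```

With this learning rate each step shrinks the distance to the minimum by a constant factor, so the iterate settles very close to 2.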
This can be added to a support vector machine model for data that is not separable.
What is slack?
This O(n^3) algorithm should be used to efficiently and accurately reduce the dimensions of a Euclidean dataset.
What is principal component analysis?
This is occurring if your testing/validation error is much higher than your training error.
What is overfitting?
An attribute that returns the dimensions of a data structure.
What is .shape?
An operation that interchanges rows and columns in a matrix.
What is transpose?
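The last two answers can be seen together in a short NumPy sketch. The array contents are arbitrary example values.

```python
import numpy as np

# A matrix with 2 rows and 3 columns.
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

original_shape = matrix.shape        # dimensions as a (rows, columns) tuple
transposed = matrix.T                # rows and columns interchanged
transposed_shape = transposed.shape

print(original_shape, transposed_shape)
```

The element at row 0, column 2 of the original appears at row 2, column 0 of the transpose.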
Activation functions must be this in order for neural networks to well-approximate any function.
What is nonlinear?
What is it called when you are trying to map data to a non-linear surface?
What are recurrent neural networks?
A data structure that is implemented when creating lists in Python (hint: used in Java).
What is an ArrayList?
An operation performed on a vector v = (v1, ..., vN) that is equal to (|v1|^p + ... + |vN|^p)^(1/p).
What is the Minkowski norm?
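The formula in the clue translates almost directly into code. In this sketch the function name `minkowski_norm` is an assumption. Setting p = 1 and p = 2 recovers the Manhattan and Euclidean norms.

```python
def minkowski_norm(v, p):
    """Compute (|v1|^p + ... + |vN|^p)^(1/p) for p >= 1."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)


v = [3.0, -4.0]
l1 = minkowski_norm(v, 1)  # sum of absolute values
l2 = minkowski_norm(v, 2)  # Euclidean length
print(l1, l2)
```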
Letting X be the matrix whose rows are the datapoints xi, c being a constant, and y being the targets, the expression (X^TX + cI)^{-1}X^Ty is the solution to this algorithm.
What is ridge regression?
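The closed-form expression from the clue can be sketched in NumPy as follows. The tiny dataset and the function name `ridge_solution` are made up for illustration. The sketch solves the linear system rather than forming the inverse explicitly, which gives the same answer and is usually the more numerically stable choice.

```python
import numpy as np


def ridge_solution(X, y, c):
    """Solve (X^T X + c I) w = X^T y, i.e. w = (X^T X + c I)^{-1} X^T y."""
    n_features = X.shape[1]
    A = X.T @ X + c * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)


# Hypothetical data: targets are exactly 1*x1 + 2*x2.
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])

w_unregularized = ridge_solution(X, y, c=0.0)  # ordinary least squares
w_ridge = ridge_solution(X, y, c=1.0)          # weights shrunk toward zero
print(w_unregularized, w_ridge)
```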
Consider one circular red dataset of radius one and one circular blue dataset of radius one. This is the minimum separation distance between the data that ensures Lloyd's algorithm succeeds.
What is 2.08? Also give points for saying 2.
This is the shape of the receiver-operating characteristic curve for a random classifier given two equal-sized categories (assuming a high amount of data).
What is linear?