ML Basics
Statistics & Probability
ML Advanced
Algorithms & Logic
???
100

A model of the form (y = mx + b) that is used to predict simple linear relationships.

What is a linear model or linear regression?
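
As a minimal sketch of the clue above, the slope m and intercept b can be estimated by ordinary least squares. The function name `fit_line` and the sample points are illustrative assumptions, not part of the original card.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = m*x + b; returns (m, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = (co-variation of x and y) / (variation of x).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b

# Illustrative points that lie exactly on y = 2x + 1.
m, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```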

100

A symmetric, bell-shaped distribution that appears in many natural, especially random, processes.

What is the normal distribution?

100

A function in a neural network that introduces non-linearity, allowing the model to learn more complex relationships.

What is an activation function?

100

A search algorithm that can find a target over any monotonic answer function by repeatedly cutting the search space in half.

What is binary search?
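
One way to sketch the "monotonic answer function" idea: search for the smallest integer at which a predicate flips from False to True. The helper name `first_true` is hypothetical, and the sketch assumes the predicate really is monotonic on the range.

```python
def first_true(lo, hi, pred):
    """Smallest x in [lo, hi] with pred(x) True.

    Assumes pred is False for small x and True from some point on.
    Returns hi + 1 if pred is never True in the range.
    """
    while lo <= hi:
        mid = (lo + hi) // 2
        if pred(mid):
            hi = mid - 1   # answer is mid or something to its left
        else:
            lo = mid + 1   # answer must be to the right of mid
    return lo

# Smallest integer whose square is at least 50.
root = first_true(0, 100, lambda x: x * x >= 50)
```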

100

An activation function that is shaped like a stretched-out S and also starts with the letter S.

What is the sigmoid function?
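
A small sketch of the function itself, sigmoid(x) = 1 / (1 + e^(-x)). The two-branch form is an assumption made here to avoid overflow for very negative inputs; it is one common way to write it, not the only one.

```python
import math

def sigmoid(x):
    """Squash any real number into the range 0 to 1."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use the equivalent form e^x / (1 + e^x)
    # so that math.exp never receives a large positive argument.
    z = math.exp(x)
    return z / (1.0 + z)
```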

200

The settings used to tune a model before training (not the model weights).

What are hyperparameters?

200

A value that describes the strength and direction of the linear relationship between two variables, and it always lies between -1 and 1.

What is the correlation coefficient (r)?
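
A minimal sketch of Pearson's r, assuming two equal-length lists with non-constant values. The name `pearson_r` is illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r_up = pearson_r([1, 2, 3], [2, 4, 6])    # perfectly increasing together
r_down = pearson_r([1, 2, 3], [6, 4, 2])  # perfectly opposite
```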

200

This algorithm uses the chain rule to compute the gradient of the loss with respect to each weight of a neural network, so that gradient descent can update the weights.

What is backpropagation?
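
To make the chain rule concrete, here is a sketch for a single sigmoid neuron with squared-error loss. The tiny network, the loss choice, and the function names are assumptions for illustration only.

```python
import math

def forward(w, b, x, t):
    """One neuron: z = w*x + b, a = sigmoid(z), loss = (a - t)^2."""
    z = w * x + b
    a = 1.0 / (1.0 + math.exp(-z))
    loss = (a - t) ** 2
    return z, a, loss

def backward(w, b, x, t):
    """Gradients of the loss w.r.t. w and b via the chain rule."""
    _, a, _ = forward(w, b, x, t)
    dloss_da = 2.0 * (a - t)   # derivative of the squared error
    da_dz = a * (1.0 - a)      # derivative of the sigmoid
    dz_dw = x                  # derivative of w*x + b w.r.t. w
    dz_db = 1.0                # derivative of w*x + b w.r.t. b
    return dloss_da * da_dz * dz_dw, dloss_da * da_dz * dz_db

grad_w, grad_b = backward(0.5, -0.2, 1.5, 1.0)
```

A quick sanity check is to compare these values against a finite-difference estimate of the loss slope.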

200

An algorithmic method that solves a problem by repeatedly calling itself on smaller versions of the same problem until reaching a base case.

What is recursion?
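
A classic sketch of the idea, using factorial as an illustrative problem (the choice of example is an assumption, not from the card):

```python
def factorial(n):
    """n! computed by calling itself on a smaller input."""
    if n <= 1:                       # base case stops the recursion
        return 1
    return n * factorial(n - 1)      # smaller version of the same problem
```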

200

A value that determines how big a step we take down the gradient of the loss function during gradient descent.

What is the learning rate?
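
A small sketch of why the step size matters, assuming the toy loss f(x) = x^2 (gradient 2x) and a starting point of x = 1. A modest learning rate shrinks x toward the minimum at 0, while one that is too large overshoots further on every step.

```python
def descend(learning_rate, steps=50, x=1.0):
    """Run gradient descent on f(x) = x^2 and return the final x."""
    for _ in range(steps):
        grad = 2.0 * x                 # derivative of x^2
        x = x - learning_rate * grad   # step against the gradient
    return x

small_step = descend(0.1)   # converges toward 0
large_step = descend(1.1)   # diverges: each step overshoots
```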

300

A method of model training in which an agent is trained to maximize reward and minimize punishment.

What is Reinforcement Learning?

300

A value that describes how spread out the data is from the average.

What is standard deviation?
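
As a sketch, the population standard deviation is the square root of the average squared distance from the mean. Dividing by n rather than n - 1 is an assumption here; the sample version divides by n - 1.

```python
import math

def population_std(values):
    """Square root of the mean squared deviation from the average."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return math.sqrt(variance)

spread = population_std([2, 4, 4, 4, 5, 5, 7, 9])  # mean is 5
```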

300

A type of activation function that takes a vector of N real-valued scores and returns a normalized probability distribution over N possible outcomes (each value between 0 and 1, all summing to 1).

What is softmax?
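
A minimal sketch of softmax on a plain list of scores. Subtracting the maximum score before exponentiating is a common stability trick assumed here; it does not change the result.

```python
import math

def softmax(scores):
    """Turn N real-valued scores into N probabilities that sum to 1."""
    top = max(scores)
    # Shifting by the max keeps math.exp from overflowing on large scores.
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
```

Larger scores get larger probabilities, and equal scores get equal shares.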

300

An algorithmic technique that makes a locally optimal choice at each step in hopes of landing on a globally optimal solution.

What is a greedy algorithm?
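
Coin change is one illustrative example (chosen here, not taken from the card): always take the largest coin that still fits. The second call shows why the clue says "in hopes of": the greedy choice is not always globally optimal.

```python
def greedy_change(amount, coins):
    """Make change by always taking the largest coin that fits."""
    result = []
    for coin in sorted(coins, reverse=True):
        while amount >= coin:
            amount -= coin
            result.append(coin)
    return result

good = greedy_change(63, [25, 10, 5, 1])  # greedy happens to be optimal here
poor = greedy_change(6, [4, 3, 1])        # greedy uses 3 coins; 3 + 3 needs only 2
```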

300

The activation function that is most often used along with cross-entropy loss for multi-class classification.

What is softmax?

400

A model that is inspired by the brain, extends simple linear models, and is the foundation of deep learning.

What is a neural network?

400

A quantity that describes how much two variables vary together but is not scaled between -1 and 1.

What is covariance?
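
A short sketch of sample covariance (dividing by n - 1, an assumption made here). Rescaling one variable rescales the covariance, which is exactly why it is not confined to the range -1 to 1.

```python
def covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

base = covariance([1, 2, 3], [2, 4, 6])
scaled = covariance([1, 2, 3], [20, 40, 60])  # same pattern, y multiplied by 10
```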

400

An iterative method that aims to converge on the optimal model weights with the least error, and is used for complex models with many weights (when a closed-form solution isn't possible).

What is gradient descent?
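
As a sketch, here is gradient descent fitting a line y = m*x + b by repeatedly stepping against the gradient of the mean squared error. A line does have a closed-form solution, so this is purely illustrative; the learning rate, step count, and data are assumptions.

```python
def gradient_descent_fit(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = m*x + b by gradient descent on mean squared error."""
    m, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [(m * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error.
        grad_m = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        m -= learning_rate * grad_m
        b -= learning_rate * grad_b
    return m, b

# Illustrative points on y = 2x + 1.
slope, intercept = gradient_descent_fit([0, 1, 2, 3], [1, 3, 5, 7])
```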

400

An algorithm that explores all reachable nodes from a starting node, layer by layer, in a graph/tree.

What is breadth-first search (BFS)?
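
A minimal sketch using a queue, assuming the graph is given as a dictionary that maps each node to a list of neighbors. The graph below is made up for illustration.

```python
from collections import deque

def bfs_order(graph, start):
    """Return nodes reachable from start, in layer-by-layer order."""
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()          # oldest discovered node first
        order.append(node)
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return order

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
order = bfs_order(graph, "A")
```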

400

A method that randomly turns off some neurons in a neural network during training to prevent overfitting.

What is dropout?
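
A sketch of "inverted" dropout on a plain list of activations. Scaling the surviving values by 1 / (1 - p_drop) is an assumption of this variant, used so the expected activation stays the same; it requires p_drop < 1, and the function name and signature are hypothetical.

```python
import random

def dropout(activations, p_drop, rng):
    """Zero each activation with probability p_drop; rescale the rest."""
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)                       # seeded for repeatability
dropped = dropout([1.0] * 1000, 0.5, rng)    # training-time behavior only
```

At inference time the layer is typically left untouched, since the rescaling already keeps the expected values consistent.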

500

A function that quantifies how different a model's predictions are compared to the true values.

What is the loss/error function?

500

A dimensionality reduction technique that looks at the directions in the data that have the most variance, and then represents the data along these axes.

What is Principal Component Analysis (PCA)?

500

A problem that occurs when the gradients are extremely small or large in a deep neural network, resulting in unstable and inefficient learning.

What is the vanishing (or exploding) gradient problem?
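
A rough sketch of where the problem comes from: backpropagation multiplies one factor per layer, so small factors shrink the gradient geometrically and large ones blow it up. The setup below (a chain of sigmoid units with the same weight and pre-activation at every layer) is a simplifying assumption.

```python
import math

def sigmoid_grad(z):
    """Derivative of the sigmoid; never larger than 0.25."""
    a = 1.0 / (1.0 + math.exp(-z))
    return a * (1.0 - a)

def chained_gradient(num_layers, z=0.0, weight=1.0):
    """Product of per-layer factors (weight * sigmoid') through the chain."""
    grad = 1.0
    for _ in range(num_layers):
        grad *= weight * sigmoid_grad(z)
    return grad

vanishing = chained_gradient(10)              # 0.25 per layer
exploding = chained_gradient(10, weight=8.0)  # 2.0 per layer
```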

500

An algorithm that finds the shortest paths from one node to all others in a weighted graph with nonnegative weights.

What is Dijkstra's algorithm?
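
A compact sketch using a priority queue, assuming the graph maps each node to a list of (neighbor, weight) pairs with nonnegative weights. The example graph is made up for illustration.

```python
import heapq

def dijkstra(graph, source):
    """Shortest distance from source to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue                      # stale entry; a shorter path was found
        for neighbor, weight in graph.get(node, []):
            new_dist = d + weight
            if new_dist < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_dist
                heapq.heappush(heap, (new_dist, neighbor))
    return dist

graph = {
    "A": [("B", 1), ("C", 4)],
    "B": [("C", 2), ("D", 5)],
    "C": [("D", 1)],
    "D": [],
}
distances = dijkstra(graph, "A")
```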

500

A technique that adds a penalty on large weights to the loss function to reduce overfitting.

What is (L1 or L2) regularization?
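
A small sketch of the two penalties, assuming a plain list of weights and a strength parameter `lam` (a hypothetical name). L1 sums absolute values; L2 sums squares.

```python
def l1_penalty(weights, lam):
    """L1 penalty: lam times the sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):
    """L2 penalty: lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)

def regularized_loss(data_loss, weights, lam, kind="l2"):
    """Data loss plus the chosen penalty on the weights."""
    penalty = l1_penalty(weights, lam) if kind == "l1" else l2_penalty(weights, lam)
    return data_loss + penalty

total = regularized_loss(0.5, [1.0, -2.0, 3.0], lam=0.1, kind="l2")
```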