Machine Learning Jeopardy

Data Preprocessing

Classical ML

Neural Networks

Hyperparameters

Industry

100

This common technique for scaling features adjusts data so that all values fall between 0 and 1, improving the performance of machine learning algorithms sensitive to input scale.

What is normalization (or min-max scaling)?
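
For illustration, a minimal sketch of min-max scaling in plain Python. The function name `min_max_scale` is hypothetical and not taken from any library, and returning zeros for a constant column is an assumption made here to avoid dividing by zero.

```python
def min_max_scale(values):
    """Linearly rescale numbers so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column has zero range, so map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


scaled = min_max_scale([10, 20, 40])
print(scaled)  # first value is 0.0, last value is 1.0
```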

100

This algorithm is used for regression tasks and minimizes the sum of squared residuals to find the best-fitting line or plane for the data.

What is linear regression?
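
As a hedged sketch of the idea in the clue, the closed-form least-squares fit for a single input feature can be written in a few lines. The helper name `fit_line` is made up for this example, and it assumes the x values are not all identical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx  # assumes xs are not all equal
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Points that lie exactly on y = 2x + 1.
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)
```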

100

This algorithm, used to train neural networks, applies the chain rule to compute the gradient of the loss function with respect to each weight so that the weights can be adjusted to reduce the error.

What is backpropagation?
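
To make the chain-rule idea concrete, here is a small sketch for a hypothetical two-weight network, `y = w2 * tanh(w1 * x)`, with squared-error loss. The network shape and all names are assumptions chosen for illustration only.

```python
import math


def forward(w1, w2, x, target):
    """Forward pass: hidden value, output, and squared-error loss."""
    h = math.tanh(w1 * x)
    y = w2 * h
    loss = (y - target) ** 2
    return h, y, loss


def backward(w1, w2, x, target):
    """Backward pass: apply the chain rule from the loss back to each weight."""
    h, y, _ = forward(w1, w2, x, target)
    dloss_dy = 2 * (y - target)
    dloss_dw2 = dloss_dy * h                  # y = w2 * h
    dloss_dh = dloss_dy * w2
    dloss_dw1 = dloss_dh * (1 - h ** 2) * x   # derivative of tanh is 1 - tanh^2
    return dloss_dw1, dloss_dw2


grad_w1, grad_w2 = backward(w1=0.5, w2=-1.0, x=2.0, target=1.0)
print(grad_w1, grad_w2)
```

A gradient descent step would then subtract a small multiple of each gradient from the corresponding weight.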

100

This hyperparameter in neural networks defines how many data points are passed through the network before updating the model's weights, and smaller values can make the model update more frequently.

What is batch size?
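
A rough sketch of how batch size changes the number of weight updates per pass over the data. The generator `iter_batches` is a hypothetical helper, and the "dataset" is just a list of ten integers.

```python
def iter_batches(data, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]


samples = list(range(10))
# One weight update per batch, so fewer items per batch means more updates.
updates_small = sum(1 for _ in iter_batches(samples, 2))
updates_large = sum(1 for _ in iter_batches(samples, 5))
print(updates_small, updates_large)  # 5 2
```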

100

This company’s deep learning research division, DeepMind, developed AlphaGo, the first AI program to defeat a world champion in the game of Go.

What is Google (or Alphabet)?

200

This process involves replacing missing values in a dataset with substituted values, such as the mean, median, or mode of the column.

What is imputation?
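
A minimal sketch of mean imputation, assuming missing entries are represented as `None` and at least one value is observed. The function name `impute_mean` is hypothetical.

```python
from statistics import mean


def impute_mean(column):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)  # raises StatisticsError if nothing is observed
    return [fill if v is None else v for v in column]


filled = impute_mean([1.0, None, 3.0, None])
print(filled)  # [1.0, 2.0, 3.0, 2.0]
```

Swapping `mean` for `statistics.median` or `statistics.mode` gives the other two variants named in the clue.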

200

Named after a British statistician, this algorithm makes predictions by combining prior probability and likelihood, assuming strong independence between features.

What is the Naive Bayes classifier?

200

This function, applied at each node of a neural network, introduces non-linearity into the model, enabling it to learn complex patterns. Examples include sigmoid and ReLU.

What is an activation function?
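
The two examples named in the clue can be sketched in plain Python as follows. This is an illustrative version for moderate inputs only; `math.exp` can overflow for very large negative `z`.

```python
import math


def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    """Pass positive values through unchanged and clamp negatives to zero."""
    return max(0.0, z)


print(sigmoid(0.0), relu(-3.0), relu(2.5))  # 0.5 0.0 2.5
```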

200

In gradient descent, this hyperparameter controls the number of passes through the training dataset, which can affect how well the model converges to the optimal solution.

What is the number of epochs?

200

This company is known for its advanced GPU technology, which is widely used in machine learning training and inference, and also developed the CUDA platform for parallel computing.

What is NVIDIA?

300

This transformation technique, commonly used for categorical variables, converts each category into a binary column, where a value of 1 indicates the presence of the category.

What is one-hot encoding?
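
A small sketch of one-hot encoding without any libraries. The helper `one_hot` is hypothetical, and ordering the columns alphabetically is an arbitrary choice made here.

```python
def one_hot(values):
    """Return the sorted category list and one binary row per input value."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


categories, rows = one_hot(["red", "green", "red", "blue"])
print(categories)  # ['blue', 'green', 'red']
print(rows)        # [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```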

300

This machine learning technique, used for classification and regression tasks, constructs a hyperplane or set of hyperplanes that separates data points into classes with the largest possible margin.

What is a support vector machine (SVM)?

300

In convolutional neural networks, this operation reduces the spatial dimensions of the feature map produced by a convolution, typically by taking the maximum or average over small windows.

What is pooling?
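
As an illustration, 2x2 max pooling with stride 2 over a feature map stored as nested lists might look like this. The function name is hypothetical, and dropping a trailing odd row or column is an assumption of this sketch.

```python
def max_pool_2x2(feature_map):
    """Take the maximum of each non-overlapping 2x2 window."""
    pooled = []
    for r in range(0, len(feature_map) - 1, 2):
        row = []
        for c in range(0, len(feature_map[0]) - 1, 2):
            window = [
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 9, 2],
    [3, 2, 4, 6],
]
print(max_pool_2x2(feature_map))  # [[4, 5], [3, 9]]
```

Replacing `max(window)` with `sum(window) / 4` would give average pooling instead.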

300

This hyperparameter controls how much the weights of the model are adjusted with respect to the gradient of the loss function, with larger values potentially leading to faster but less stable learning.

What is the learning rate?
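
A toy sketch of the trade-off described in the clue, using gradient descent on the made-up loss `f(w) = w**2`, whose gradient is `2 * w`. The step sizes 0.1 and 1.1 are arbitrary values picked to show a stable run and a diverging one.

```python
def descend(learning_rate, steps=20, w=1.0):
    """Run plain gradient descent on f(w) = w**2 and return the final w."""
    for _ in range(steps):
        grad = 2 * w
        w = w - learning_rate * grad
    return w


stable = descend(0.1)    # each step shrinks w toward the minimum at 0
unstable = descend(1.1)  # each step overshoots and |w| grows
print(stable, unstable)
```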

300

With more than 7,500 data science and machine learning packages available through its repository, this distribution is one of the most popular tools for data scientists and analysts to start building models right away. It includes a data science package manager similar to pip.

What is Anaconda?

400

In imbalanced classification tasks, this method randomly selects a subset of the majority class to match the size of the minority class, preventing the model from being biased towards the majority class.

What is undersampling?
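
A minimal sketch of random undersampling, assuming the two classes are already split into separate lists and the majority list is at least as long as the minority list. The function name and the fixed seed are illustrative choices.

```python
import random


def undersample(majority, minority, seed=0):
    """Keep a random subset of the majority class equal in size to the minority."""
    rng = random.Random(seed)  # seeded so the example is repeatable
    kept = rng.sample(majority, k=len(minority))
    return kept, minority


majority = [("neg", i) for i in range(100)]
minority = [("pos", i) for i in range(5)]
kept, _ = undersample(majority, minority)
print(len(kept), len(minority))  # 5 5
```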

400

This unsupervised learning algorithm partitions data into clusters by minimizing the sum of squared distances between data points and the centroid of the cluster they are assigned to.

What is K-means clustering?
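
The alternating assign-then-update loop usually used for this objective (Lloyd's algorithm) can be sketched as below. The starting centroids are supplied by the caller, and keeping an empty cluster's old centroid is an assumption of this sketch rather than a standard rule.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))


def kmeans(points, centroids, iterations=10):
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # New centroid is the per-coordinate mean of its members.
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(old)  # assumption: leave empty clusters in place
        centroids = updated
    return centroids, [nearest(p, centroids) for p in points]


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centroids, labels = kmeans(points, [(0, 0), (10, 10)])
print(centroids, labels)
```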

400

This class of neural networks is designed to process sequential data, such as time series or text, by keeping track of information from previous inputs.

What is a Recurrent Neural Network (RNN)?
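
A scalar sketch of the recurrence behind the clue: each hidden state depends on the current input and on the previous hidden state. The weights `w_x` and `w_h` are arbitrary illustrative values, not trained parameters.

```python
import math


def rnn_forward(inputs, w_x=1.0, w_h=1.0, h0=0.0):
    """Return the hidden state after each input in the sequence."""
    h = h0
    states = []
    for x in inputs:
        # The previous state h feeds back into the next state.
        h = math.tanh(w_x * x + w_h * h)
        states.append(h)
    return states


with_signal = rnn_forward([1.0, 0.0])
without_signal = rnn_forward([0.0, 0.0])
print(with_signal[1], without_signal[1])
```

The second input is 0.0 in both sequences, yet the second hidden states differ, because the first sequence still carries information from its earlier input.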

400

In neural networks, this hyperparameter determines the fraction of neurons that are randomly ignored during training, helping to prevent overfitting by reducing co-dependency between neurons.

What is the dropout rate?
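
A hedged sketch of training-time dropout in its common "inverted" form, where surviving activations are scaled up so the expected value stays the same. The function signature is hypothetical, and the random generator is passed in explicitly so the example is repeatable.

```python
import random


def dropout(activations, rate, rng):
    """Zero each activation with probability `rate`; rescale the survivors."""
    keep_prob = 1.0 - rate
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]


rng = random.Random(0)
activations = [1.0] * 10
dropped = dropout(activations, rate=0.5, rng=rng)
print(dropped)  # each entry is either 0.0 or 2.0
```

At inference time dropout is normally switched off, which corresponds to calling this with `rate=0.0`.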

400

This company is known for its widely used open-source library of natural language processing models, including BERT, GPT, and RoBERTa, and has become a go-to platform for transformer-based models. It has an emoji as its logo.

What is Hugging Face?

500

In natural language processing, this technique reduces words to their base or root form, typically by stripping suffixes (e.g. running -> run).

What is stemming?
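
A deliberately crude sketch of suffix stripping, only to show the flavor of the technique. This is a toy written for this example, not the Porter stemmer or any library's algorithm, and it will mis-stem many real words.

```python
def toy_stem(word):
    """Strip one common suffix, then undo a doubled final consonant."""
    for suffix in ("ing", "ed", "s"):
        # Only strip when enough of the word would remain.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            stem = word[: -len(suffix)]
            if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeiou":
                stem = stem[:-1]  # "runn" -> "run"
            return stem
    return word


print(toy_stem("running"), toy_stem("jumped"), toy_stem("cats"))  # run jump cat
```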

500

This technique is used in decision trees to reduce overfitting by removing branches that have little impact on the prediction accuracy, either during or after the tree has been fully grown.

What is pruning?

500

This issue in very deep neural networks, in which gradients shrink as they are propagated back through many layers during training, can leave the earliest layers with little or no weight updates.

What is the vanishing gradient problem?
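
A back-of-the-envelope sketch of why this happens with sigmoid activations: the sigmoid's derivative is at most 0.25, so a chain of such factors shrinks quickly with depth. This simplification assumes every weight is 1 and every pre-activation is 0, which is not how a real network behaves.

```python
import math


def sigmoid_grad(z):
    """Derivative of the sigmoid, s * (1 - s), which peaks at 0.25 when z = 0."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)


def gradient_scale(depth, z=0.0):
    """Product of `depth` identical sigmoid-derivative factors."""
    return sigmoid_grad(z) ** depth


for depth in (1, 5, 10, 20):
    print(depth, gradient_scale(depth))
```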

500

In convolutional neural networks (CNNs), this hyperparameter defines the size of the window that slides over the input data to detect features, impacting the level of detail the model can capture.

What is kernel size (or filter size)?
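
A one-dimensional sketch of the sliding window, with no padding and a stride of 1 (both assumptions of this example). As in most deep learning libraries, the kernel is applied without flipping, so strictly speaking this is cross-correlation. A larger kernel sees more of the input at once and produces a shorter output.

```python
def conv1d_valid(signal, kernel):
    """Slide the kernel over the signal and take a weighted sum at each position."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


signal = [1, 2, 3, 4, 5]
out_k2 = conv1d_valid(signal, [1, -1])     # kernel size 2
out_k3 = conv1d_valid(signal, [1, 0, -1])  # kernel size 3
print(out_k2)  # [-1, -1, -1, -1]
print(out_k3)  # [-2, -2, -2]
```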

500

This robotics company is famous for developing highly advanced robots, including the agile, dog-like robot named Spot, which has been used in industries like construction and security.

What is Boston Dynamics?