This common technique for scaling features adjusts data so that all values fall between 0 and 1, improving the performance of machine learning algorithms sensitive to input scale.
What is normalization (or min-max scaling)?
This algorithm is used for regression tasks and minimizes the sum of squared residuals to find the best-fitting line or plane for the data.
What is linear regression?
This process is used in neural networks to minimize the error by calculating the gradient of the loss function with respect to each weight and adjusting them accordingly.
What is backpropagation?
This hyperparameter in neural networks defines how many data points are passed through the network before updating the model's weights, and smaller values can make the model update more frequently.
What is batch size?
This company’s deep learning research division, DeepMind, developed AlphaGo, the first AI program to defeat a world champion in the game of Go.
What is Google (or Alphabet)?
This process involves replacing missing values in a dataset with substituted values, such as the mean, median, or mode of the column.
What is imputation?
Named after a British statistician, this algorithm makes predictions by combining prior probability and likelihood, assuming strong independence between features.
What is the Naive Bayes classifier?
This function, part of each node in a NN, introduces non-linearity to a neural network model, enabling it to learn complex patterns. Examples are sigmoid or ReLU
What is an activation function?
In gradient descent, this hyperparameter controls the number of passes through the training dataset, which can affect how well the model converges to the optimal solution.
What is the number of epochs?
This company is known for its advanced GPU technology, which is widely used in machine learning training and inference, and also developed the CUDA platform for parallel computing.
What is NVIDIA?
This transformation technique, commonly used for categorical variables, converts each category into a binary column, where a value of 1 indicates the presence of the category.
What is one-hot encoding?
This machine learning technique is used for classification and regression tasks, where it constructs a hyperplane or set of hyperplanes to separate data points into classes.
What is a support vector machine (SVM)?
In convolutional neural networks, this operation reduces the spatial dimensions of the input after convolution, typically taking the maximum or average of the convultion.
What is pooling?
This hyperparameter controls how much the weights of the model are adjusted with respect to the gradient of the loss function, with larger values potentially leading to faster but less stable learning.
What is learning rate?
With more than 7,500 data science and machine learning packages pre-installed, this distribution is one of the most popular tools for data scientists and analysts to start building models right away. It includes a data science package manager similar to PIP.
What is Anaconda?
In imbalanced classification tasks, this method randomly selects a subset of the majority class to match the size of the minority class, preventing the model from being biased towards the majority class.
What is undersampling?
This unsupervised learning algorithm partitions data into clusters by minimizing the sum of squared distances between data points and the centroid of the cluster they are assigned to.
What is K-means clustering?
This class of neural networks is designed to process sequential data, such as time series or text, by keeping track of information from previous inputs.
What is a Recurrent Neural Network (RNN)?
In neural networks, this hyperparameter determines the fraction of neurons that are randomly ignored during training, helping to prevent overfitting by reducing co-dependency between neurons.
What is dropout rate?
This company is known for its widely-used open-source library of natural language processing models, including BERT, GPT, and RoBERTa, and has become a go-to platform for transformer-based models. It has an emoji as its logo.
What is Hugging Face?
In natural language processing, this technique reduces words to their base or root form (e.g. running -> run)
What is stemming?
This technique is used in decision trees to reduce overfitting by removing branches that have little impact on the prediction accuracy, either during or after the tree has been fully grown.
What is pruning?
This property of neural networks, wherein very deep networks struggle to propagate gradients back through layers during training, can lead to little or no weight updates.
What is the vanishing gradient problem?
In convolutional neural networks (CNNs), this hyperparameter defines the size of the window that slides over the input data to detect features, impacting the level of detail the model can capture.
What is kernel size (or filter size)?
This robotics company is famous for developing highly advanced robots, including the agile, dog-like robot named Spot, which has been used in industries like construction and security.
What is Boston Dynamics?