A tuning parameter in an optimization algorithm that determines the step size at each iteration while moving toward a minimum of a loss function.
Learning Rate
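A minimal sketch of how the learning rate scales each step, assuming plain gradient descent on the illustrative function f(x) = (x - 3)^2. The function, starting point, and rate values are chosen for this example only.

```python
def gradient_descent(grad, x0, learning_rate, steps):
    """Run plain gradient descent on a one-dimensional function."""
    x = x0
    for _ in range(steps):
        # The learning rate scales how far we move against the gradient.
        x = x - learning_rate * grad(x)
    return x


def grad_f(x):
    """Gradient of the illustrative loss f(x) = (x - 3)^2, minimized at x = 3."""
    return 2.0 * (x - 3.0)


# A small rate converges toward the minimum at x = 3.
x_small = gradient_descent(grad_f, x0=0.0, learning_rate=0.1, steps=100)

# A rate that is too large overshoots more on every step and diverges.
x_large = gradient_descent(grad_f, x0=0.0, learning_rate=1.1, steps=100)
```

For this particular function, any rate between 0 and 1 converges and any rate above 1 diverges. Other loss functions have different thresholds.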
Producing a model with poor predictive ability because the model has not captured the complexity of the training data.
Underfitting
A very famous dataset of 70,000 images of handwritten digits (60,000 for training and 10,000 for testing).
MNIST dataset
In statistics, values that are significantly different from the majority of the data points are called?
Outliers (They can be either significantly higher or lower than the rest of the data)
What is the output of a softmax function?
A vector of probabilities, adding up to 1 (A probability distribution over a set of K possible outcomes)
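A short sketch of softmax in plain Python, assuming a list of K real-valued scores as input. Subtracting the maximum score first is a common numerical-stability trick and does not change the result.

```python
import math


def softmax(logits):
    """Map K real-valued scores to K probabilities that sum to 1."""
    m = max(logits)
    # Shift by the maximum so that exp() never overflows.
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
# Larger scores receive larger probabilities, and the values sum to 1.
```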
A full training pass over the entire dataset such that each example has been seen once is called what?
Epoch
In binary classification, a 2x2 grid whose four cells count the true positives, false positives, true negatives, and false negatives, comparing predicted results against actual results.
A confusion matrix
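A small sketch of how the four cells can be counted, assuming labels are encoded as 1 (positive) and 0 (negative). The sample labels are made up for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Count the four cells of a binary confusion matrix (labels are 0 or 1)."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 0:
            tn += 1
        else:
            fn += 1
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}


y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # actual labels (illustrative)
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]  # predicted labels (illustrative)
counts = confusion_counts(y_true, y_pred)
# counts == {"TP": 3, "FP": 1, "TN": 3, "FN": 1}
```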
What was the name of the first computer that beat the world champion at chess?
Deep Blue (Famously defeated world chess champion Garry Kasparov in 1997)
The two parameters of a normal distribution.
- Mean (μ): This represents the average or central tendency of the distribution. It determines where the peak of the bell curve is located.
- Standard Deviation (σ): This measures the spread or dispersion of the data around the mean. It determines the width of the bell curve. A higher standard deviation indicates a wider spread, while a lower standard deviation indicates a narrower spread.
A popular programming language designed by statisticians.
R
What does LSTM stand for?
Long Short-Term Memory
A technique of combining individual models together with the purpose of improving the predictive power of the overall model.
Ensemble Learning
Who popularized the term Deep Learning in 2006?
Geoffrey Hinton
How is precision rate calculated?
Precision = True Positives / (True Positives + False Positives)
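The formula translates directly into code. In this sketch, returning 0.0 when there are no positive predictions is an assumed convention, since the ratio is undefined in that case.

```python
def precision(true_positives, false_positives):
    """Fraction of positive predictions that were actually positive."""
    predicted_positives = true_positives + false_positives
    if predicted_positives == 0:
        # Assumed convention: precision is undefined here, so report 0.0.
        return 0.0
    return true_positives / predicted_positives


p = precision(true_positives=30, false_positives=10)
# p == 0.75
```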
What is the range of sigmoid function output?
Between 0 and 1 (exclusive)
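A brief sketch of the sigmoid function, 1 / (1 + e^(-x)), in plain Python. The two-branch form is one common way to avoid overflow for large negative inputs, not the only possible implementation.

```python
import math


def sigmoid(x):
    """Logistic sigmoid: squashes any real number into a value between 0 and 1."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # Equivalent form for negative x that avoids computing exp of a large number.
    e = math.exp(x)
    return e / (1.0 + e)


low = sigmoid(-10.0)   # close to 0
mid = sigmoid(0.0)     # exactly 0.5
high = sigmoid(10.0)   # close to 1
```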
A technique to downsample feature maps, often used in convolutional models.
Pooling
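One common variant is max pooling. The sketch below assumes a 2x2 window with stride 2 over a feature map stored as a list of lists. The input values are illustrative.

```python
def max_pool_2x2(feature_map):
    """Downsample a 2-D feature map by taking the max of each 2x2 window."""
    rows = len(feature_map)
    cols = len(feature_map[0])
    pooled = []
    for r in range(0, rows - 1, 2):
        pooled_row = []
        for c in range(0, cols - 1, 2):
            window = (
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            )
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


feature_map = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 1],
    [3, 4, 5, 8],
]
pooled = max_pool_2x2(feature_map)
# pooled == [[6, 4], [7, 9]]
```

The 4x4 input becomes a 2x2 output, so each spatial dimension is halved.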
Clustering methods, autoencoders, latent variable models.
These are all techniques used in Unsupervised Learning
PyTorch was primarily developed by which company?
Facebook AI Research (FAIR).
In probability and statistics, a theorem stating that as the number of trials increases, the average of the results gets closer to the expected value.
The Law of Large Numbers
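A quick simulation can illustrate the idea, assuming a fair six-sided die whose expected value is 3.5. The function name and seed are illustrative choices.

```python
import random


def mean_of_die_rolls(num_rolls, seed=0):
    """Average of num_rolls simulated rolls of a fair six-sided die."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0
    for _ in range(num_rolls):
        total += rng.randint(1, 6)
    return total / num_rolls


few_rolls_mean = mean_of_die_rolls(10)
many_rolls_mean = mean_of_die_rolls(200_000)
# With many rolls, the average tends to land very close to 3.5.
```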
What are the two main components of a GAN?
Generator: This neural network takes random noise as input and generates new data samples. It aims to create data that is indistinguishable from real data.
Discriminator: This neural network takes either real data or generated data as input and classifies it as real or fake. It acts as a critic, evaluating the quality of the generated data.
A regularization method that works by removing a random selection of units in a network layer for a single gradient step.
Dropout
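A minimal sketch of dropout applied to one layer's activations for a single training step. Rescaling the surviving units by 1 / keep_prob ("inverted dropout") is an assumption here. It is a common practice that keeps the expected activation unchanged.

```python
import random


def dropout(activations, drop_prob, rng):
    """Zero each unit with probability drop_prob and rescale the survivors."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    keep_prob = 1.0 - drop_prob
    return [
        a / keep_prob if rng.random() < keep_prob else 0.0
        for a in activations
    ]


rng = random.Random(0)  # fixed seed for a repeatable mask
layer_output = [1.0] * 1000
dropped = dropout(layer_output, drop_prob=0.5, rng=rng)
# Roughly half of the units are zeroed. The rest are scaled up to 2.0.
```

A fresh random mask is drawn on every call, so a different set of units is removed at each gradient step.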
Linear Kernel, Polynomial Kernel, Radial Basis Function (RBF) Kernel.
Three types of kernels of support vector machines (SVM)
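The three kernels can be sketched as plain functions of two vectors. The default values for degree, coef0, and gamma below are illustrative assumptions, not values prescribed by any particular library.

```python
import math


def dot(x, y):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    return dot(x, y)


def polynomial_kernel(x, y, degree=3, coef0=1.0):
    return (dot(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=0.5):
    # Similarity decays exponentially with the squared Euclidean distance.
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


x = [1.0, 2.0]
y = [3.0, 4.0]
k_lin = linear_kernel(x, y)                  # 11.0
k_poly = polynomial_kernel(x, y, degree=2)   # (11 + 1)^2 = 144.0
k_rbf = rbf_kernel(x, y)                     # exp(-0.5 * 8) = exp(-4)
```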
Who is known as the founding father of convolutional nets?
Yann LeCun
In causal inference, a variable that influences both the dependent variable and the independent variable, causing a spurious association.
Confounding variable
In research, the logical error of concentrating on the people or things that made it past some selection process and overlooking those that did not.
Survivorship Bias (Occurs when researchers focus on individuals, groups, or cases that have passed some sort of selection process while ignoring those who did not)