This type of learning trains models on labeled data where inputs come with corresponding output labels.
Supervised Learning
A supervised learning technique that categorizes data into predefined classes or categories, used in medical diagnosis and fraud detection.
Classification
The simplest form of classification: each instance is assigned to one of two possible categories.
Binary Classification
A tree-like structure where each internal node is a decision based on a feature, and leaves are class labels; e.g., classifying animals by size/habitat.
Decision tree
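As a rough illustration of the card above, a decision tree can be written by hand as nested feature tests. The feature names (`size_cm`, `habitat`), the thresholds, and the animal labels are all made up for this sketch; a real tree would learn its splits from labeled data.

```python
def classify_animal(size_cm: float, habitat: str) -> str:
    """Hand-written decision tree; thresholds and labels are illustrative only."""
    # Internal node: decide on the habitat feature.
    if habitat == "water":
        # Internal node: decide on the size feature.
        if size_cm > 200:
            return "whale"  # leaf = class label
        return "fish"       # leaf = class label
    # Internal node: non-water animals, decide on the size feature.
    if size_cm > 100:
        return "horse"      # leaf = class label
    return "cat"            # leaf = class label


print(classify_animal(500, "water"))
print(classify_animal(40, "land"))
```

Each `if` is an internal node and each `return` is a leaf, which mirrors the structure the card describes.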
Table showing the counts of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), used to summarize classification performance.
Confusion Matrix
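A minimal sketch of how the four cells could be counted for a binary problem. The helper name `confusion_counts`, the choice of `1` as the positive class, and the sample labels are assumptions made for this example.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for a binary classification problem."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if actual == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, tn, fp, fn


# Made-up labels for illustration.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print("              pred +  pred -")
print(f"actual +      {tp:^6}  {fn:^6}")
print(f"actual -      {fp:^6}  {tn:^6}")
```

For these sample labels the counts are TP = 3, TN = 3, FP = 1, FN = 1.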
In this type of learning, the algorithm works with input data that does not have any associated output or target value.
Unsupervised Learning
The primary goal of classification
Predict the class label of new, unseen data
This type has more than two classes, and each instance is assigned to exactly one class.
Multiclass classification
This statistical method models the probability of a binary outcome (0 or 1), e.g., predicting email spam or disease presence.
Logistic regression
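To make the "probability of a binary outcome" concrete, here is a small sketch of the prediction step only. The weights, bias, and 0.5 decision threshold are hypothetical values chosen for illustration; in practice they would be fitted to labeled data.

```python
import math


def predict_proba(features, weights, bias):
    """Probability of class 1: the sigmoid of a weighted sum of the features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical, already-"trained" parameters (not fitted to real data).
weights = [1.5, -0.5]
bias = -1.0

p = predict_proba([2.0, 1.0], weights, bias)  # z = -1.0 + 3.0 - 0.5 = 1.5
label = 1 if p >= 0.5 else 0                  # assumed decision threshold of 0.5
print(round(p, 3), label)
```

The sigmoid squashes any real-valued score into the range (0, 1), which is why the output can be read as a probability.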
Formula: (TP + TN) / Total Instances; proportion of correctly classified instances.
Accuracy
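The formula on this card, written as a tiny function. The counts passed in are made-up numbers, and the function name is an assumption for this sketch.

```python
def accuracy(tp, tn, fp, fn):
    """(TP + TN) / total instances."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("accuracy is undefined with zero instances")
    return (tp + tn) / total


# Made-up counts: 90 of 100 instances are classified correctly.
print(accuracy(tp=40, tn=50, fp=5, fn=5))  # 0.9
```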
This supervised learning task predicts discrete labels or categories, like spam or not spam in emails.
Classification
In the training phase, the model learns the relationship between __________ and __________ on a labeled dataset.
input features and class labels
In this type, an instance can belong to multiple classes at once.
Multilabel classification
This model finds the maximum-margin hyperplane that separates classes in feature space, e.g., classifying images of dogs/cats/birds.
Support vector machine (SVM)
TP / (TP + FP); proportion of true positives among all positive predictions; key when false positives are costly (e.g., a legitimate email flagged as spam).
Precision
This unsupervised task groups similar data points together based on certain features, like customer segments by purchasing behavior.
Clustering
After training, the model uses learned patterns to ___________ the class label for new, unseen data.
Predict
This type has classes with a natural order or ranking.
Ordinal classification
Non-parametric method that classifies an instance by the majority class among its K nearest neighbors, e.g., classifying flowers or recommending products.
K-nearest neighbors (KNN)
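A minimal KNN sketch using only the standard library. The function name `knn_predict`, the Euclidean distance metric, and the tiny flower dataset (petal length, petal width) are assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_tuple, label) pairs.
    """
    # Sort training points by Euclidean distance to the query.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    # On a tied vote, Counter returns the label that was seen first,
    # which here is the label of the closer neighbor.
    return Counter(nearest_labels).most_common(1)[0][0]


# Made-up measurements: (petal length, petal width) in cm.
train = [
    ((1.4, 0.2), "setosa"),
    ((1.3, 0.2), "setosa"),
    ((1.5, 0.3), "setosa"),
    ((4.7, 1.4), "versicolor"),
    ((4.5, 1.5), "versicolor"),
    ((4.9, 1.5), "versicolor"),
]

print(knn_predict(train, (1.6, 0.25)))
print(knn_predict(train, (4.6, 1.4)))
```

There is no training step beyond storing the data, which is what "non-parametric" refers to on the card: all the work happens at prediction time.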
TP / (TP + FN); proportion of actual positives that are correctly identified; crucial when false negatives are costly (e.g., disease detection).
Recall (sensitivity)
This unsupervised technique reduces the number of features in a dataset while retaining important information, e.g., through PCA.
Dimensionality Reduction
Key difference between supervised and unsupervised learning
Supervised learning uses labeled input-output pairs; unsupervised learning uses unlabeled data (inputs only).
Three or more classes; the model predicts exactly one of them per instance (not several at once).
Multiclass classification
Probabilistic classifier based on Bayes' Theorem; assumes features are conditionally independent given the class; well suited to text classification, e.g., sorting documents by genre.
Naïve Bayes
Harmonic mean of precision and recall: 2 × (Precision × Recall) / (Precision + Recall); balances the two and is useful when classes are imbalanced.
F1 score
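A short sketch tying precision, recall, and F1 together. The counts are made up, and returning 0.0 when a denominator is zero is a convention assumed here, not something the cards specify.

```python
def precision(tp, fp):
    """TP / (TP + FP); returns 0.0 if there are no positive predictions (assumed convention)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """TP / (TP + FN); returns 0.0 if there are no actual positives (assumed convention)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p, r):
    """Harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Made-up counts: 8 true positives, 2 false positives, 8 false negatives.
tp, fp, fn = 8, 2, 8
p = precision(tp, fp)  # 8 / 10
r = recall(tp, fn)     # 8 / 16
f1 = f1_score(p, r)    # 2 * 0.8 * 0.5 / 1.3
print(round(p, 3), round(r, 3), round(f1, 3))
```

Here precision is 0.8 and recall is 0.5, and F1 comes out near 0.615. That is below their arithmetic mean of 0.65, because the harmonic mean is pulled toward the weaker of the two.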