Classification

Categories of Learning

Classification Basics

Types of Classification

Classification Techniques

Evaluation Metrics

100

This type of learning trains models on labeled data where inputs come with corresponding output labels.

Supervised Learning

100

A supervised learning technique that categorizes data into predefined classes or categories, used in medical diagnosis and fraud detection.

Classification

100

This simplest form of classification assigns each instance to one of two possible categories.

Binary Classification

100

A tree-like structure in which each internal node is a decision based on a feature and each leaf is a class label, e.g., classifying animals by size and habitat.

Decision tree
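As an illustration of the clue above, a decision tree can be sketched as nested feature tests. The function name, the thresholds, and the animal labels below are all hypothetical; a real tree would learn its splits from labeled data rather than have them written by hand.

```python
def classify_animal(size_kg: float, habitat: str) -> str:
    """Walk a tiny hand-built decision tree and return a class label.

    Thresholds and labels are made up for illustration only.
    """
    # Internal node: decision on the 'habitat' feature.
    if habitat == "water":
        # Internal node: decision on the 'size_kg' feature.
        if size_kg > 1000:
            return "whale"     # leaf = class label
        return "fish"          # leaf = class label
    # Internal node: decision on 'size_kg' for non-water habitats.
    if size_kg > 100:
        return "elephant"      # leaf = class label
    return "mouse"             # leaf = class label


print(classify_animal(5000, "water"))
print(classify_animal(0.02, "land"))
```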

100

A table of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) that summarizes a classifier's performance.

Confusion Matrix
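A minimal sketch of how the four cells might be counted for a binary problem. The helper name `confusion_counts` and the toy label lists are assumptions made for this example, and label 1 is taken to be the positive class.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for a binary classification problem."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1   # predicted positive, actually positive
        elif predicted == positive:
            fp += 1   # predicted positive, actually negative
        elif actual == positive:
            fn += 1   # predicted negative, actually positive
        else:
            tn += 1   # predicted negative, actually negative
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


# Toy labels, invented for illustration.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
counts = confusion_counts(y_true, y_pred)
print(counts)  # {'TP': 3, 'TN': 3, 'FP': 1, 'FN': 1}
```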

200

In this type of learning, the algorithm works with input data that does not have any associated output or target value.

Unsupervised Learning

200

The primary goal of classification

Predict the class label of new, unseen data

200

This type has more than two classes, and each instance is assigned to exactly one of them.

Multiclass classification

200

This statistical method models the probability of a binary outcome (0 or 1), e.g., predicting email spam or disease presence.

Logistic regression
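To make "models the probability of a binary outcome" concrete, the prediction step can be sketched as a sigmoid applied to a weighted sum of the features. The function names, weights, and bias below are hypothetical; in practice the weights would be fitted to labeled data.

```python
import math


def predict_proba(features, weights, bias):
    """Return the modeled probability that the outcome is 1."""
    # Linear combination of features, then squashed into (0, 1) by the sigmoid.
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def predict_label(features, weights, bias, threshold=0.5):
    """Turn the probability into a 0/1 label using a decision threshold."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0


# Hypothetical, hand-picked parameters (not learned from data).
weights, bias = [1.5, 0.5], -2.0
print(predict_proba([3.0, 1.0], weights, bias))
print(predict_label([0.0, 0.0], weights, bias))
```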

200

(TP + TN) / (TP + TN + FP + FN); proportion of all instances that are classified correctly.

Accuracy
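A short sketch of the formula in code. The function name and the counts are assumptions chosen for illustration.

```python
def accuracy(tp, tn, fp, fn):
    """(TP + TN) / total instances; returns 0.0 if there are no instances."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


# Toy counts: 3 TP, 3 TN, 1 FP, 1 FN, so 6 of 8 instances are correct.
print(accuracy(tp=3, tn=3, fp=1, fn=1))  # 0.75
```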

300

This supervised learning task predicts discrete labels or categories, like spam or not spam in emails.

Classification

300

In the training phase, the model learns the relationship between __________ and __________ on a labeled dataset.

input features and class labels

300

In this type, an instance can belong to multiple classes at once.

Multilabel classification

300

This model finds the optimal (maximum-margin) hyperplane to separate classes in feature space, e.g., classifying images of dogs, cats, or birds.

Support vector machine (SVM)

300

TP / (TP + FP); proportion of positive predictions that are actually positive; key when false positives are costly (e.g., spam filtering).

Precision
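A minimal sketch of the precision formula, with a guard for the case where the model makes no positive predictions. The function name and counts are illustrative assumptions.

```python
def precision(tp, fp):
    """TP / (TP + FP); returns 0.0 when there are no positive predictions."""
    predicted_positive = tp + fp
    return tp / predicted_positive if predicted_positive else 0.0


# Toy spam filter: 8 emails correctly flagged, 2 legitimate emails flagged by mistake.
print(precision(tp=8, fp=2))  # 0.8
```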

400

This unsupervised task groups similar data points together based on certain features, like customer segments by purchasing behavior.

Clustering

400

After training, the model uses learned patterns to ___________ the class label for new, unseen data.

Predict

400

This type has classes with a natural order or ranking.

Ordinal classification

400

A non-parametric method that assigns each point the majority class among its K nearest neighbors, e.g., classifying flowers or recommending products.

K-nearest neighbors (KNN)
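The majority-vote idea can be sketched in a few lines of standard-library Python. The function name, the Euclidean distance choice, and the toy flower measurements are assumptions for this example; `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points closest to query."""
    # Sort training examples by Euclidean distance to the query point.
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    # Majority vote among the k nearest neighbors.
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy two-feature flower data, invented for illustration.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["setosa", "setosa", "setosa", "virginica", "virginica", "virginica"]
print(knn_predict(points, labels, (1.1, 1.0)))
```

There is no training step beyond storing the data, which is why the method is often described as lazy: all the work happens at prediction time.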

400

TP / (TP + FN); proportion of actual positives that are correctly identified; crucial when false negatives are costly (e.g., disease detection).

Recall (sensitivity)
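A minimal sketch of the recall formula, with a guard for the case where there are no actual positives. The function name and counts are illustrative assumptions.

```python
def recall(tp, fn):
    """TP / (TP + FN); returns 0.0 when there are no actual positives."""
    actual_positive = tp + fn
    return tp / actual_positive if actual_positive else 0.0


# Toy screening test: 6 sick patients detected, 4 sick patients missed.
print(recall(tp=6, fn=4))  # 0.6
```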

500

This unsupervised technique reduces the number of features in a dataset while retaining important information, e.g., through PCA.

Dimensionality Reduction

500

The key difference between supervised and unsupervised learning

Supervised learning uses labeled input-output pairs; unsupervised learning uses unlabeled data (inputs only).

500

This type involves three or more classes; the model predicts exactly one of them per instance (not several at once).

Multiclass classification

500

A probabilistic classifier based on Bayes' theorem that assumes features are conditionally independent given the class; well suited to text tasks such as classifying documents by genre.

Naïve Bayes
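One way to sketch the idea for text is a word-count model: each class gets a log prior plus a sum of per-word log likelihoods, which is where the independence assumption enters. The function names, the add-one (Laplace) smoothing, and the tiny genre corpus are assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(documents, labels):
    """Collect per-class document counts, per-class word counts, and the vocabulary."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in zip(documents, labels):
        words = text.lower().split()
        word_counts[label].update(words)
        vocabulary.update(words)
    return class_counts, word_counts, vocabulary


def predict_naive_bayes(model, text):
    """Return the class with the highest log posterior score."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in class_counts.items():
        score = math.log(doc_count / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one smoothing avoids zero probabilities; words are treated
            # as independent given the class, so their log likelihoods add up.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Tiny made-up corpus of document genres.
docs = ["dragon wizard castle", "wizard spell dragon",
        "spaceship robot laser", "robot alien spaceship"]
genres = ["fantasy", "fantasy", "scifi", "scifi"]
model = train_naive_bayes(docs, genres)
print(predict_naive_bayes(model, "wizard castle"))
```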

500

Harmonic mean of precision and recall: 2 × (Precision × Recall) / (Precision + Recall); balances the two and is useful when classes are imbalanced.

F1 score
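A short sketch of the harmonic-mean formula. The function name and the input values are illustrative assumptions.

```python
def f1_score(precision, recall):
    """2 * P * R / (P + R); returns 0.0 when both inputs are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy values: precision 0.75 and recall 0.6.
print(round(f1_score(0.75, 0.6), 4))
```

Because the harmonic mean is pulled toward the smaller of the two values, a model cannot score well on F1 by maximizing only precision or only recall.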