Linear & Polynomial Regression
Classification (Logistic Regression & Naive Bayes)
KNN & Clustering (K-means, DBSCAN)
Model Performance
Dimensionality Reduction (PCA & LDA)
100

The slope coefficient in a regression line represents:
a) The intercept
b) The rate of change in Y per unit change in X
c) The average of X and Y
d) The residual value

b) The rate of change in Y per unit change in X
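To make this concrete, the least-squares slope is b1 = Σ(x−x̄)(y−ȳ) / Σ(x−x̄)², the change in Y per unit change in X. A minimal plain-Python sketch (names are illustrative, not from the game):

```python
def slope(xs, ys):
    """Least-squares slope: the change in Y per unit change in X."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den

# Data generated by y = 2x + 1, so the fitted slope is exactly 2.
print(slope([1, 2, 3, 4], [3, 5, 7, 9]))  # → 2.0
```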

100

Logistic regression predicts:
a) Continuous values
b) Probabilities between 0 and 1
c) Slopes and intercepts
d) Residuals

b) Probabilities between 0 and 1
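The sigmoid at the heart of logistic regression is what keeps outputs strictly between 0 and 1. A quick illustrative sketch:

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0.0))    # → 0.5
print(sigmoid(8.0))    # near 1, but never reaches it
print(sigmoid(-8.0))   # near 0, but never reaches it
```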

100

Which of the following distance measures cannot be used in KNN?
a) Cosine
b) Euclidean
c) Manhattan
d) Minkowski
e) Any of the above can be used

e) Any of the above can be used
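All of the listed metrics are valid for KNN; in particular, Minkowski distance generalizes both Manhattan (p=1) and Euclidean (p=2). A small plain-Python sketch (hypothetical names):

```python
def minkowski(a, b, p):
    """Minkowski distance; p=1 gives Manhattan, p=2 gives Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

a, b = (0, 0), (3, 4)
print(minkowski(a, b, 1))  # → 7.0 (Manhattan)
print(minkowski(a, b, 2))  # → 5.0 (Euclidean: the 3-4-5 triangle)
```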

100

If a model performs well on training data but poorly on test data, it’s:
a) Underfit
b) Overfit
c) Optimized
d) Balanced

b) Overfit

100

The main purpose of PCA is to:
a) Increase data size
b) Reduce dimensionality while keeping maximum variance
c) Cluster data
d) Remove dependent features

b) Reduce dimensionality while keeping maximum variance
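One way to see "keeping maximum variance" is to eigendecompose the covariance matrix of nearly one-dimensional 2-D data: the top principal component should retain almost all of the variance. A NumPy-based sketch (all names and data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
# 2-D data lying almost entirely along one direction
data = np.column_stack([x, 0.5 * x + 0.05 * rng.normal(size=200)])

cov = np.cov(data, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
explained = eigvals[::-1] / eigvals.sum()   # variance ratio, largest first
print(explained)  # the first principal component keeps nearly all the variance
```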

200

Simple linear regression uses a line to approximate the relationship between which of the following?
a) Coefficients and dependent variables
b) Independent variables and residuals
c) Independent and dependent variables
d) None of these

c) Independent and dependent variables

200

A Naive Bayes classifier assumes that all features are:
a) Dependent
b) Independent
c) Continuous
d) Nonlinear

b) Independent

200

K-means tries to minimize:
a) Inter-cluster distance
b) Sum of squared distances within clusters
c) The number of clusters
d) The model’s bias

b) Sum of squared distances within clusters
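This objective is often called inertia or WCSS (within-cluster sum of squares). A NumPy sketch of the quantity K-means minimizes (hypothetical data):

```python
import numpy as np

def wcss(data, labels, centers):
    """Within-cluster sum of squared distances: the objective K-means minimizes."""
    return sum(np.sum((data[labels == k] - c) ** 2)
               for k, c in enumerate(centers))

data = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
labels = np.array([0, 0, 1, 1])
centers = np.array([[0.0, 0.5], [10.0, 10.5]])  # each cluster's mean
print(wcss(data, labels, centers))  # → 1.0
```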

200

When a model is too simple and fails to capture patterns, it is:
a) Regularized
b) Overfit
c) Underfit
d) Optimized

c) Underfit

200

LDA differs from PCA because LDA:
a) Ignores class labels
b) Uses labels to maximize class separation
c) Minimizes variance
d) Removes all correlated features

b) Uses labels to maximize class separation

300

Simple linear regression shows the relationship between:
a) Independent and dependent variables
b) Coefficients and residuals
c) Predicted and actual features
d) Two dependent variables

a) Independent and dependent variables

300

Which of the following describes one advantage of Logistic Regression over Linear Regression?
a) Logistic Regression is less computationally complex than Linear Regression
b) Logistic Regression has better performance on continuous data than Linear Regression
c) Logistic Regression is less sensitive to outlier data than Linear Regression
d) Logistic Regression is more sensitive to outlier data than Linear Regression

c) Logistic Regression is less sensitive to outlier data than Linear Regression

300

In DBSCAN, epsilon (eps) defines:
a) The radius of a point’s neighborhood
b) The number of clusters
c) The total number of data points
d) The minimum distance between clusters

a) The radius of a point’s neighborhood
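A small plain-Python sketch of an eps-neighborhood query, the building block DBSCAN uses to decide which points are density-reachable (hypothetical names and data):

```python
import math

def eps_neighborhood(points, idx, eps):
    """Indices of all points within distance eps of points[idx] (itself included)."""
    px, py = points[idx]
    return [j for j, (qx, qy) in enumerate(points)
            if math.hypot(px - qx, py - qy) <= eps]

pts = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0)]
print(eps_neighborhood(pts, 0, eps=1.0))  # → [0, 1]; the far point is excluded
```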

300

Lasso regression differs from Ridge regression because:
a) Lasso can eliminate features by setting coefficients to zero
b) Lasso cannot handle linear data
c) Ridge regression doesn’t penalize coefficients
d) Lasso increases model variance

a) Lasso can eliminate features by setting coefficients to zero
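The mechanism behind this answer is the soft-thresholding step used in Lasso coordinate descent: small coefficients are snapped exactly to zero, whereas Ridge only shrinks them. A plain-Python sketch (hypothetical names):

```python
def soft_threshold(w, lam):
    """Lasso's proximal update: shrink toward 0, setting small weights exactly to 0."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0

print(soft_threshold(0.3, 0.5))   # → 0.0  (feature eliminated)
print(soft_threshold(2.0, 0.5))   # → 1.5  (shrunk but kept)
print(soft_threshold(-2.0, 0.5))  # → -1.5
```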

300

PCA is an ________ technique, while LDA is a ________ technique.
a) Supervised, Unsupervised
b) Unsupervised, Supervised
c) Regression, Classification
d) Linear, Nonlinear

b) Unsupervised, Supervised

400

Polynomial regression with a very high degree often leads to:
a) Better generalization
b) More bias
c) Overfitting
d) Simpler models  

c) Overfitting

400

Bayes’ Theorem helps ML models compute:
a) Residual variance
b) Conditional probabilities
c) Regression coefficients
d) Distance metrics

b) Conditional probabilities
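A quick plain-Python illustration of Bayes' theorem, using the classic diagnostic-test numbers (prevalence, sensitivity, and false-positive rate here are hypothetical example values):

```python
def bayes(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# Example: a test with 99% sensitivity and a 5% false-positive
# rate for a condition with 1% prevalence.
p_a = 0.01
p_b = 0.99 * 0.01 + 0.05 * 0.99  # total probability of a positive result
print(bayes(0.99, p_a, p_b))     # ≈ 0.167: most positives are false positives
```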

400

Why should training inputs be scaled (standardized and normalized) when using KNN?
a) The inputs do not need to be scaled or normalized for KNN
b) Because KNN is a density-based algorithm
c) To prevent features with larger scales from dominating the distance metric
d) To prevent overfitting if the inputs are not scaled and normalized

c) To prevent features with larger scales from dominating the distance metric
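To see why this matters, z-scoring puts features on a common scale; in the sketch below, income (tens of thousands) and age (tens) standardize to identical values, so neither dominates a distance computation (plain Python, hypothetical data):

```python
def standardize(column):
    """Z-score scaling: subtract the mean, divide by the standard deviation."""
    n = len(column)
    mean = sum(column) / n
    std = (sum((v - mean) ** 2 for v in column) / n) ** 0.5
    return [(v - mean) / std for v in column]

incomes = [30000, 60000, 90000]   # large raw scale
ages = [25, 40, 55]               # small raw scale
print(standardize(incomes))  # both become ≈ [-1.22, 0.0, 1.22]
print(standardize(ages))
```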

400

Of these combinations of train and test scores, which would represent the closest to an overfit model?
a) Train: 0.78, Test: 0.59
b) Train: 0.67, Test: 0.65
c) Train: 0.59, Test: 0.78
d) Train: 0.61, Test: 0.61

a) Train: 0.78, Test: 0.59

400

What do the eigenvectors represent in PCA?
a) The covariance of the features along the diagonal
b) The amount of variance attached to each PC
c) The direction of the one PC with the most variance
d) The directions of the new principal axes

d) The directions of the new principal axes

500

What does a high R-squared value indicate?
a) The model fits the data well
b) The model is overfit
c) There is no correlation
d) The slope is near zero  

a) The model fits the data well
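R-squared is 1 − SS_res/SS_tot: the fraction of the variance in Y that the model explains. A plain-Python sketch with hypothetical numbers:

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: fraction of variance explained by the model."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

print(r_squared([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8]))  # ≈ 0.98: a good fit
print(r_squared([1, 2, 3, 4], [2.5, 2.5, 2.5, 2.5]))  # → 0.0: just predicting the mean
```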

500

Logistic Regression cannot use a residual calculation (distance from a data point to a model classification boundary) due to which of the following?
a) The errors are too large for the sigmoid function
b) The translation of data from sigmoid to linear results in values of +/- infinity
c) The mean squared error results in divide-by-0 values from the sigmoid
d) Residuals cannot be defined for categorical outcomes

b) The translation of data from sigmoid to linear results in values of +/- infinity

500

Which of the following best describes how a cluster is formed in DBSCAN?
a) By iteratively minimizing the distances between points and their cluster centers
b) By connecting dense regions of points based on the epsilon (eps) neighborhood and the minimum-points criterion
c) By assigning every data point to the nearest cluster
d) By minimizing the number of noise points in a cluster, removing any point that falls within its neighborhood

b) By connecting dense regions of points based on the epsilon (eps) neighborhood and the minimum-points criterion

500

Increasing the regularization parameter (λ) in Lasso regression generally:
a) Decreases bias
b) Increases variance
c) Increases bias, reduces variance
d) Has no effect

c) Increases bias, reduces variance

500

Which of the following is the best definition of 'within-class scatter' in LDA?
a) Distance perpendicular to each class's center and the overall dataset center
b) Distance along the axis of maximum variation between all data in the dataset
c) Distances of each sample in a class to the mean of that same class
d) None of these
e) Distances from each class's center to the overall dataset center

c) Distances of each sample in a class to the mean of that same class
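The within-class scatter matrix S_W sums, per class, the squared deviations of samples from their own class mean; LDA seeks directions that make this small relative to between-class scatter. A NumPy sketch (hypothetical data):

```python
import numpy as np

def within_class_scatter(X, y):
    """S_W: sum over classes of outer products of deviations from the class mean."""
    S = np.zeros((X.shape[1], X.shape[1]))
    for c in np.unique(y):
        d = X[y == c] - X[y == c].mean(axis=0)
        S += d.T @ d
    return S

X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 0.0], [12.0, 0.0]])
y = np.array([0, 0, 1, 1])
print(within_class_scatter(X, y))  # → [[4. 0.], [0. 0.]]
```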
