Machine Learning Basics
Python Programming
Data Preprocessing
Model Evaluation
Advanced Concepts
100

This type of machine learning uses labeled data to predict an output.

What is supervised learning?

100

This pandas function reads data from a CSV file into a DataFrame.

What is read_csv()?

100

Removing or filling in missing values is part of this preprocessing step.

What is data cleaning?

100

This regression metric is the average of the squared differences between predicted and actual values.

What is mean squared error (MSE)?
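For study purposes, the calculation can be sketched in plain Python. This is a minimal illustration, not a library implementation; the function name `mean_squared_error` and the sample values are made up for the example.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical sample data: errors are 1, 0 and -2, so squared errors are 1, 0 and 4.
actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]
mse = mean_squared_error(actual, predicted)
print(mse)
```

Because the errors are squared, large misses count far more than small ones.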

100

In linear regression, this statistic measures the proportion of the variance in the dependent variable that is explained by the independent variables.

What is the coefficient of determination (R-squared)?
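One common way to compute it is 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below assumes that definition; the function name `r_squared` and the data are illustrative only.

```python
def r_squared(actual, predicted):
    """R-squared as 1 - SS_res / SS_tot.

    Assumes the actual values are not all identical (otherwise SS_tot is 0).
    """
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical sample data: predictions that track the actual values closely.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(r_squared(actual, predicted))
```

A value near 1 means the predictions account for most of the variation in the actual values.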

200

This type of machine learning finds structure in unlabeled data, for example by grouping it into clusters.

What is unsupervised learning?

200

This Python data structure stores key-value pairs.

What is a dictionary?

200

This technique reduces the number of features in a dataset while preserving as much of its variance as possible.

What is dimensionality reduction?

200

This matrix summarizes the performance of a classification model by showing true positives, true negatives, false positives, and false negatives.

What is a confusion matrix?
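The four cells can be counted directly from paired labels. This is a minimal sketch for a binary problem, assuming labels are 1 (positive) and 0 (negative); the helper name `confusion_counts` is hypothetical.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, actually positive
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif a == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn}


# Hypothetical labels for six examples.
actual = [1, 0, 1, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 0]
counts = confusion_counts(actual, predicted)
print(counts)
```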

200

This type of plot is used to show the relationship between two continuous variables.

What is a scatter plot?

300

This type of machine learning involves an agent taking actions to maximize cumulative reward.

What is reinforcement learning?

300

This pandas DataFrame method returns summary statistics for numeric columns.

What is describe()?

300

This process repeatedly splits data into training and testing sets, rotating which portion is held out, to evaluate a model’s performance.

What is cross-validation?
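The k-fold variant can be sketched as an index generator: each fold takes one turn as the test set while the rest is used for training. This sketch assumes no shuffling and contiguous folds; the name `k_fold_indices` is made up for the example.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


for train, test in k_fold_indices(6, 3):
    print("train:", train, "test:", test)
```

Every sample lands in a test set exactly once, so the averaged score uses all of the data for evaluation.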

300

This metric measures a model’s ability to correctly identify positive instances among all actual positives.

What is recall?

300

This method involves creating new uncorrelated features by combining original features.

What is Principal Component Analysis (PCA)?

400

In supervised learning, this is the term for the variable you want to predict.

What is the target variable?

400

This built-in function returns the number of items in a list.

What is len()?

400

A high correlation between two features might indicate this, which can affect a model’s performance.

What is multicollinearity?

400

This measure combines precision and recall, calculated as the harmonic mean of the two.

What is the F1 score?
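The harmonic mean can be worked out from the confusion-matrix counts. This is a minimal sketch; the function name `f1_from_counts` is hypothetical, and returning 0.0 when there are no true positives is an assumed convention.

```python
def f1_from_counts(tp, fp, fn):
    """F1 as the harmonic mean of precision and recall."""
    if tp == 0:
        return 0.0  # assumed convention: no true positives means F1 of 0
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that were found
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: precision is 0.8 and recall is 0.5.
print(f1_from_counts(tp=8, fp=2, fn=8))
```

The harmonic mean sits closer to the smaller of the two values, so a model cannot score well by maximizing only precision or only recall.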

400

The tradeoff between bias and variance helps to balance this in a model.

What is model complexity?

500

This problem occurs when a model performs well on training data but poorly on new data.

What is overfitting?

500

This string method combines a list of strings into a single string, using the string it is called on as the separator.

What is join()?
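A short illustration: the method is called on the separator string and takes the list as its argument. The variable names and words are made up for the example.

```python
words = ["data", "science", "rocks"]

# The separator comes first; the list of strings is the argument.
sentence = " ".join(words)
csv_line = ",".join(words)

print(sentence)
print(csv_line)
```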

500

This technique rescales each feature to have a mean of 0 and a standard deviation of 1.

What is standardization (z-score normalization)?
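Rescaling one feature to mean 0 and standard deviation 1 can be sketched with the standard library. This sketch assumes the population standard deviation (`pstdev`); the sample standard deviation is another common choice. The function name `standardize` is made up for the example.

```python
from statistics import mean, pstdev


def standardize(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumes the values are not all identical
    return [(v - mu) / sigma for v in values]


# Hypothetical feature column with mean 5 and population standard deviation 2.
feature = [2, 4, 4, 4, 5, 5, 7, 9]
scaled = standardize(feature)
print(scaled)
```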

500

This curve plots the true positive rate against the false positive rate, providing a visual measure of a classifier's performance.

What is the ROC curve?

500

This metric, often used in binary classification, measures the probability that a randomly chosen positive instance is ranked higher than a randomly chosen negative instance.

What is the area under the ROC curve (AUROC)?
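The ranking interpretation in the clue can be computed directly by comparing every positive score with every negative score. This is a brute-force sketch for small inputs, assuming labels of 1 and 0 and counting ties as half; the function name `auroc` is made up for the example.

```python
def auroc(labels, scores):
    """Fraction of (positive, negative) pairs where the positive scores higher."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and classifier scores: 3 of the 4 pairs are ranked correctly.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(auroc(labels, scores))
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of positives above negatives.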