This type of machine learning uses labeled data to predict an output.
What is supervised learning?
This function in Python is used to read data from a CSV file.
What is read_csv()?
Removing or filling in missing values is part of this preprocessing step.
What is data cleaning?
This metric is a single value used to describe the difference between predicted and actual values in regression.
What is mean squared error (MSE)?
In linear regression, this term refers to the degree to which the independent variables explain the variation in the dependent variable.
What is the coefficient of determination (R-squared)?
This learning technique groups data into clusters without labeled data.
What is unsupervised learning?
This Python data structure stores key-value pairs.
What is a dictionary?
This technique reduces the number of features in a dataset while preserving its variance.
What is dimensionality reduction?
This matrix summarizes the performance of a classification model by showing true positives, true negatives, false positives, and false negatives.
What is a confusion matrix?
This type of plot is used to show the correlation between two continuous variables.
What is a scatter plot?
This type of machine learning involves an agent taking actions to maximize cumulative reward.
What is reinforcement learning?
The Python keyword used to get summary statistics of numeric variables.
What is describe()?
This process involves splitting data into training and testing sets to evaluate a model’s performance.
What is cross-validation?
This metric measures a model’s ability to correctly identify positive instances among all actual positives.
What is recall?
This method involves creating new uncorrelated features by combining original features.
What is Principal Component Analysis (PCA)?
In supervised learning, this is the term for the variable you want to predict.
What is the target variable?
This function returns the number of items in a list.
What is len()?
A high correlation between two features might indicate this, which can affect a model’s performance.
What is multicollinearity?
A measure that combines precision and recall, calculated as the harmonic mean of the two.
What is F1 score?
The tradeoff between bias and variance helps to balance this in a model.
What is model complexity?
This problem occurs when a model performs well on training data but poorly on new data.
What is overfitting?
This function is used to combine a list of strings into a single string with a specified separator.
What is join()?
This technique standardizes data by transforming each feature to have a mean of 0 and a standard deviation of 1.
What is normalization or standardization?
This curve plots the true positive rate against the false positive rate, providing a visual measure of a classifier's performance.
What is the ROC curve?
This metric, often used in binary classification, measures the probability that a randomly chosen positive instance is ranked higher than a randomly chosen negative instance.
What is the area under the ROC curve (AUROC)?