This type of plot summarizes a dataset's distribution by its quartiles, with a "box" spanning the interquartile range and "whiskers" showing the spread beyond the lower and upper quartiles.
What is a box plot?
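The numbers a box plot draws can be computed directly. The sketch below assumes the common 1.5 × IQR whisker convention and NumPy's default (linear) percentile interpolation. `box_plot_stats` is a hypothetical helper name, not a library function.

```python
import numpy as np

def box_plot_stats(values):
    """Return the quantities a box plot draws, using the 1.5 * IQR whisker rule (an assumed convention)."""
    data = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(data, [25, 50, 75])
    iqr = q3 - q1  # the height of the "box"
    # Whiskers reach the most extreme points that lie within 1.5 * IQR of the box.
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = data[(data >= low_fence) & (data <= high_fence)]
    outliers = data[(data < low_fence) | (data > high_fence)]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "whisker_low": inside.min(), "whisker_high": inside.max(),
        "outliers": outliers.tolist(),
    }

stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 100])
```

Here the box runs from 3 to 7, the whiskers stop at 1 and 8, and 100 is drawn as an individual outlier point.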
This type of learning uses labeled datasets to train models.
What is supervised learning?
This tree-based algorithm makes predictions by recursively splitting data into smaller groups based on feature values.
What is a decision tree?
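One step of building a classification tree is choosing the split that makes the child groups purest. The sketch below assumes Gini impurity as the criterion (one common choice) and a single numeric feature; `gini` and `best_split` are hypothetical names. A full tree would apply `best_split` recursively to each child group.

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(x, y):
    """Threshold on one numeric feature that minimizes the weighted impurity of the two children."""
    best_threshold, best_impurity = None, gini(y)
    for threshold in np.unique(x)[:-1]:  # skip the max so the right child is never empty
        left, right = y[x <= threshold], y[x > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if weighted < best_impurity:
            best_threshold, best_impurity = threshold, weighted
    return best_threshold, best_impurity

x = np.array([1.0, 2.0, 3.0, 7.0, 8.0, 9.0])
y = np.array([0, 0, 0, 1, 1, 1])
threshold, impurity = best_split(x, y)
```

Splitting at `x <= 3` separates the two classes perfectly, so the weighted impurity drops from 0.5 to 0.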
This metric is the proportion of all predictions that the model gets right.
What is accuracy?
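A minimal sketch of the definition, assuming labels are given as two equal-length sequences; `accuracy` is a hypothetical helper name.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

score = accuracy([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])  # 3 of 5 correct
```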
This Pandas method drops rows (or columns) that contain missing values from a DataFrame.
What is dropna()?
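A short illustration of the common `dropna()` options on a small made-up DataFrame. By default it returns a new DataFrame and leaves the original unchanged.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [25, np.nan, 31, 40],
    "city": ["Oslo", "Lima", None, "Pune"],
    "score": [1, 2, 3, 4],
})

rows_kept = df.dropna()                  # drop any row with at least one missing value
cols_kept = df.dropna(axis=1)            # drop any column with at least one missing value
subset_kept = df.dropna(subset=["age"])  # only look for missing values in "age"
```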
This type of plot displays the relationship between two variables along with a fit line.
What is a regression plot?
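The fit line in a regression plot is typically an ordinary least-squares line (seaborn's `regplot`, for example, draws a scatter plot with such a line). The sketch below computes only that line with NumPy on made-up data; drawing it is left to the plotting library.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Least-squares line y ~ slope * x + intercept (polyfit returns highest degree first).
slope, intercept = np.polyfit(x, y, deg=1)
fitted = slope * x + intercept  # the points a regression plot would connect as its line
```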
This term refers to a machine learning model that performs exceptionally well on training data but poorly on unseen data.
What is overfitting?
This machine learning approach combines the predictions of multiple models to achieve better results than any single model.
What is ensemble learning?
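One of the simplest ways to combine models is hard (majority) voting on predicted labels. The sketch below assumes each model's predictions are already available as lists; `majority_vote` is a hypothetical name, and other combination schemes (averaging probabilities, stacking) exist.

```python
from collections import Counter

def majority_vote(*model_predictions):
    """Combine per-model label predictions by taking the most common label for each sample."""
    combined = []
    for labels in zip(*model_predictions):
        # most_common(1) returns [(label, count)]; ties go to the label encountered first.
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined

model_a = [1, 0, 1, 1]
model_b = [1, 1, 0, 1]
model_c = [0, 0, 1, 1]
ensemble = majority_vote(model_a, model_b, model_c)
```

Each individual model makes one mistake relative to `[1, 0, 1, 1]`, but because their mistakes fall on different samples, the vote recovers it exactly.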
This metric measures the average squared difference between predicted and actual values in a regression model.
What is Mean Squared Error (MSE)?
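Written out, MSE = (1/n) Σ (yᵢ − ŷᵢ)². A minimal hand-written version follows (scikit-learn ships its own `sklearn.metrics.mean_squared_error`; this sketch does not depend on it).

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    n = len(y_true)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n

mse = mean_squared_error([3.0, 5.0, 2.0], [2.5, 5.0, 4.0])  # (0.25 + 0 + 4) / 3
```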
This Pandas method returns the first n rows of a DataFrame (five by default).
What is head()?
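A quick illustration of `head()` with and without an explicit row count:

```python
import pandas as pd

df = pd.DataFrame({"x": range(10)})
first_five = df.head()    # n defaults to 5
first_three = df.head(3)
```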
This process fills in missing values with estimates derived from other observations, such as a column's mean.
What is data imputation?
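Two simple imputation strategies in Pandas, shown on made-up data: filling with the overall column mean, and filling with the mean of the row's own group. Which strategy is appropriate depends on the data; these are illustrations, not recommendations.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "height": [170.0, np.nan, 180.0, 150.0],
    "group": ["a", "a", "b", "b"],
})

# Strategy 1: replace missing values with the mean of all observed values.
df["height_mean_imputed"] = df["height"].fillna(df["height"].mean())

# Strategy 2: replace missing values with the mean of the same group (aligned on the index).
df["height_group_imputed"] = df["height"].fillna(
    df.groupby("group")["height"].transform("mean")
)
```

The missing row gets about 166.7 under the first strategy but 170 under the second, because group "a" has only the value 170 observed.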
This method rescales each feature to have a mean of 0 and a standard deviation of 1.
What is standardization?
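Standardization computes z = (x − mean) / std for each value. It shifts and rescales the data but does not change the shape of its distribution, so skewed data stays skewed. The sketch uses the population standard deviation (NumPy's default `ddof=0`); `standardize` is a hypothetical name.

```python
import numpy as np

def standardize(x):
    """Rescale to zero mean and unit (population) standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

z = standardize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # mean 5, std 2
```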
This popular machine learning algorithm, used for both classification and regression, trains many decision trees on random samples of the data and features and combines their predictions.
What is a random forest?
This technique evaluates a model by repeatedly training and testing it on different folds of the data and averaging the results.
What is cross-validation?
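A minimal k-fold sketch: each fold is held out once, the model is trained on the remaining folds, and the held-out scores are averaged. The least-squares line model and MSE score are hypothetical stand-ins, and the folds are contiguous (no shuffling), which is a simplification; real data is usually shuffled first.

```python
import numpy as np

def k_fold_scores(x, y, k, fit, score):
    """Hold out each of k folds once, train on the rest, and collect the held-out scores."""
    folds = np.array_split(np.arange(len(x)), k)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([fold for j, fold in enumerate(folds) if j != i])
        model = fit(x[train_idx], y[train_idx])
        scores.append(score(model, x[test_idx], y[test_idx]))
    return scores

def fit_line(x, y):
    # Hypothetical toy model: least-squares line coefficients.
    return np.polyfit(x, y, deg=1)

def mse_score(coeffs, x, y):
    # Mean squared error of the line's predictions on the held-out fold.
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

x = np.arange(10, dtype=float)
y = 3.0 * x + 1.0
fold_scores = k_fold_scores(x, y, k=5, fit=fit_line, score=mse_score)
cv_estimate = float(np.mean(fold_scores))  # the cross-validated estimate
```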
The Pandas function that merges two DataFrames based on specific keys.
What is merge()?
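An illustration of `merge()` on a shared key column with made-up tables, contrasting the default inner join with a left join:

```python
import pandas as pd

orders = pd.DataFrame({"customer_id": [1, 2, 2, 3], "amount": [50, 20, 35, 10]})
customers = pd.DataFrame({"customer_id": [1, 2, 4], "name": ["Ana", "Ben", "Dee"]})

inner = orders.merge(customers, on="customer_id")             # how="inner" is the default
left = orders.merge(customers, on="customer_id", how="left")  # keep every order
```

The inner join keeps only keys present in both tables (3 rows); the left join keeps all 4 orders and leaves `name` missing for customer 3.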
This type of data transformation scales features to fit within a specified range, such as [0, 1].
What is normalization?
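Min-max scaling maps the smallest value to the lower bound and the largest to the upper bound. (The word "normalization" is also used for other operations, such as scaling vectors to unit length.) `min_max_scale` is a hypothetical name, and the sketch assumes the input is not constant.

```python
import numpy as np

def min_max_scale(x, new_min=0.0, new_max=1.0):
    """Linearly map values into [new_min, new_max]; assumes max(x) > min(x)."""
    x = np.asarray(x, dtype=float)
    scaled = (x - x.min()) / (x.max() - x.min())  # now in [0, 1]
    return scaled * (new_max - new_min) + new_min

unit_range = min_max_scale([10, 15, 20, 30])
```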
This algorithm finds the decision boundary (a hyperplane) that separates the classes with the widest possible margin, determined by the training points closest to it, the support vectors.
What is a Support Vector Machine (SVM)?
This ensemble method trains multiple models on bootstrap samples of the training data and averages (or votes on) their predictions to improve robustness and accuracy.
What is bagging (bootstrap aggregating)?
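The sketch below shows the two ingredients of bagging: fitting copies of a model on bootstrap samples (drawn with replacement) and averaging their predictions. The base model here is a straight-line fit chosen only for brevity; in practice decision trees are a common base learner, and scikit-learn offers `BaggingRegressor` and `BaggingClassifier`. `bagged_predict` is a hypothetical name.

```python
import numpy as np

def bagged_predict(x_train, y_train, x_new, n_models=25, degree=1, seed=0):
    """Fit one polynomial per bootstrap sample and average their predictions."""
    rng = np.random.default_rng(seed)
    n = len(x_train)
    predictions = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)  # bootstrap: n row indices drawn with replacement
        coeffs = np.polyfit(x_train[idx], y_train[idx], deg=degree)
        predictions.append(np.polyval(coeffs, x_new))
    return np.mean(predictions, axis=0)  # aggregate by averaging

x_train = np.linspace(0.0, 10.0, 20)
y_train = 2.0 * x_train + 1.0
x_new = np.array([0.5, 3.0, 7.5])
pred = bagged_predict(x_train, y_train, x_new)
```

For classification, the averaging step would typically be replaced by a majority vote.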
This metric measures the proportion of actual negatives that a binary classifier incorrectly labels as positive.
What is the false positive rate?
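The false positive rate is FP / (FP + TN). A minimal sketch, assuming binary labels where 1 means positive; `false_positive_rate` is a hypothetical name.

```python
def false_positive_rate(y_true, y_pred):
    """FP / (FP + TN): share of actual negatives (label 0) predicted as positive (label 1)."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return fp / (fp + tn)

fpr = false_positive_rate([0, 0, 0, 0, 1, 1], [1, 0, 0, 1, 1, 0])  # 2 of 4 negatives flagged
```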
This Pandas method groups data by the values in one or more columns so that aggregated values can be calculated for each group.
What is groupby()?
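A short `groupby()` example on made-up sales data, computing one aggregate and then several at once:

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["north", "south", "north", "south", "east"],
    "revenue": [100, 80, 150, 120, 90],
})

total_by_region = sales.groupby("region")["revenue"].sum()
summary = sales.groupby("region")["revenue"].agg(["sum", "mean", "count"])
```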
This technique reduces the number of features in a dataset by projecting the data onto a smaller set of uncorrelated components that preserve as much of the variance as possible.
What is Principal Component Analysis (PCA)?
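A minimal PCA sketch using the singular value decomposition of the mean-centered data. `pca` is a hypothetical name; the sign of each component is arbitrary. In the example, the points lie exactly on the line y = 2x, so one component captures all of the variance.

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its top principal components via SVD of the centered data."""
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are orthonormal principal directions, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    projected = X_centered @ components.T
    explained_variance = S ** 2 / (len(X) - 1)
    ratio = explained_variance[:n_components] / explained_variance.sum()
    return projected, components, ratio

t = np.arange(5.0)
X = np.column_stack([t, 2.0 * t])
projected, components, ratio = pca(X, n_components=1)
```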
This clustering algorithm assigns each data point to the nearest centroid, then recomputes each centroid as the mean of its assigned points, repeating until the assignments stabilize.
What is K-means?
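A compact sketch of the standard K-means loop (Lloyd's algorithm): assign points to the nearest centroid, move each centroid to the mean of its points, and stop when the centroids no longer move. Initializing from randomly chosen data points is one simple choice among several; `k_means` is a hypothetical name.

```python
import numpy as np

def k_means(X, k, n_iter=100, seed=0):
    """Alternate nearest-centroid assignment and centroid (mean) updates."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # start from k random points
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n_points, k).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it in place if it has none).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centroids = k_means(X, k=2)
```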
This algorithm builds an ensemble sequentially, fitting each new weak learner to the residual errors (more generally, the negative gradient of the loss) of the models built so far.
What is gradient boosting?
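A from-scratch sketch for regression with squared-error loss, where the negative gradient is simply the residual. The weak learners are one-split regression "stumps" on a single numeric feature, and each stump's contribution is shrunk by a learning rate. All names and defaults are hypothetical, and the sketch assumes `x` has at least two distinct values.

```python
import numpy as np

def fit_stump(x, residuals):
    """Best single-threshold split on 1-D x, predicting the mean residual on each side."""
    best = None
    for threshold in np.unique(x)[:-1]:
        left = x <= threshold
        left_value, right_value = residuals[left].mean(), residuals[~left].mean()
        error = np.sum((residuals - np.where(left, left_value, right_value)) ** 2)
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    _, threshold, left_value, right_value = best
    return lambda z: np.where(z <= threshold, left_value, right_value)

def gradient_boost(x, y, n_rounds=50, learning_rate=0.1):
    """Squared-error boosting: each new stump is fit to the current residuals."""
    base = y.mean()
    prediction = np.full(len(y), base)
    stumps = []
    for _ in range(n_rounds):
        residuals = y - prediction  # negative gradient of 0.5 * (y - prediction)^2
        stump = fit_stump(x, residuals)
        prediction = prediction + learning_rate * stump(x)
        stumps.append(stump)
    return lambda z: base + learning_rate * sum(s(z) for s in stumps)

x = np.arange(10, dtype=float)
y = np.where(x < 5, 0.0, 10.0)
model = gradient_boost(x, y)
train_mse = float(np.mean((model(x) - y) ** 2))
```

In this example each round shrinks the remaining residuals by a factor of 0.9, so the training error decreases steadily rather than in one jump; a smaller learning rate trades more rounds for a more gradual fit.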
This metric, the harmonic mean of precision and recall, is used to evaluate classification models when precision and recall are equally important.
What is the F1 score?
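F1 = 2 · precision · recall / (precision + recall), equivalently 2TP / (2TP + FP + FN). A minimal sketch for binary labels where 1 is the positive class; `f1_score` is a hypothetical hand-written helper, and the zero-division handling (returning 0.0) is one common convention.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

f1 = f1_score([1, 1, 1, 0, 0, 1], [1, 0, 1, 1, 0, 1])  # precision 0.75, recall 0.75
```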
This Pandas function reshapes a DataFrame from wide to long format.
What is melt()?
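An illustration of `melt()` on a small made-up table: each subject column becomes rows of (student, subject, score). `DataFrame.pivot` can reverse the reshape.

```python
import pandas as pd

wide = pd.DataFrame({
    "student": ["Ana", "Ben"],
    "math": [90, 75],
    "physics": [85, 80],
})

long = wide.melt(id_vars="student", var_name="subject", value_name="score")
back_to_wide = long.pivot(index="student", columns="subject", values="score")
```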