Data Science Jeopardy Template

Data Science Basics

Machine Learning I

Machine Learning II

Evaluation Metrics

Python Essentials

100

This type of plot in data science displays the distribution of a dataset and divides it into quartiles, with a "box" representing the interquartile range and "whiskers" indicating variability outside the upper and lower quartiles.

What is a box plot?

100

This type of learning uses labeled datasets to train models.

What is supervised learning?

100

This tree-based algorithm splits data into smaller groups to make predictions.

What is a decision tree?

100

This metric is the proportion of a model's predictions that are correct.

What is accuracy?

100

This Pandas method removes rows or columns that contain missing values from a DataFrame.

What is dropna()?

200

This type of plot displays the relationship between two variables along with a fit line.

What is a regression plot?

200

This term refers to a machine learning model that performs exceptionally well on training data but poorly on unseen data.

What is overfitting?

200

This machine learning approach combines predictions from multiple models for better results.

What is ensemble learning?

200

This metric measures the average squared difference between predicted and actual values in a regression model.

What is Mean Squared Error (MSE)?
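To illustrate the answer above, here is a minimal sketch of MSE in plain Python. The function name and the sample values are assumptions made up for illustration, not taken from any real dataset.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between paired values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sum(squared_errors) / len(squared_errors)


# Illustrative values only (hypothetical).
actual = [3.0, 5.0, 2.0]
predicted = [2.0, 5.0, 4.0]
mse = mean_squared_error(actual, predicted)
print(mse)
```

Squaring the errors means large misses are penalized more heavily than small ones.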

200

This Pandas method returns the first rows of a DataFrame (five by default).

What is head()?

300

This process fills in missing data by estimating values from other observations.

What is data imputation?

300

This method rescales features to have zero mean and unit variance.

What is standardization?
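As a rough illustration of the answer above, the sketch below standardizes one feature with the standard library only. The function name and sample values are hypothetical, and using the population standard deviation is an assumption of this sketch.

```python
import statistics


def standardize(values):
    """Rescale values to zero mean and unit variance (z-scores)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation (assumption)
    if std == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(v - mean) / std for v in values]


# Illustrative values only (hypothetical).
heights = [150.0, 160.0, 170.0, 180.0]
scaled = standardize(heights)
print(scaled)
```

Standardization shifts and rescales the values; it does not change the shape of their distribution.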

300

This popular machine learning algorithm is used for both classification and regression tasks and constructs many decision trees during training.

What is Random Forest?

300

This technique evaluates a model on multiple folds of the data and averages the results.

What is cross-validation?

300

This Pandas function combines two DataFrames by joining them on specific keys.

What is merge()?

400

This type of data transformation scales features to fit within a specified range, such as [0, 1].

What is normalization?
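A minimal sketch of the min-max form of normalization described above, in plain Python. The function name, the `new_min`/`new_max` parameters, and the sample values are assumptions for illustration.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so they span [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot rescale a constant feature")
    span = new_max - new_min
    return [new_min + (v - lo) * span / (hi - lo) for v in values]


# Illustrative values only (hypothetical).
scaled = min_max_scale([10, 20, 40])
print(scaled)
```

The smallest value maps to the lower bound and the largest to the upper bound, so a single outlier can compress the rest of the range.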

400

This algorithm finds the decision boundary that maximizes the margin between classes instead of fitting every data point directly.

What is a Support Vector Machine (SVM)?

400

This ensemble method trains multiple models on bootstrap samples of the data and averages their predictions to improve robustness and accuracy.

What is Bagging?

400

This metric measures the proportion of actual negatives that a binary classifier incorrectly predicts as positive.

What is the false positive rate?
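The false positive rate is usually computed as FP / (FP + TN), that is, false positives divided by all actual negatives. A minimal sketch, assuming labels are encoded as 0 (negative) and 1 (positive); the function name and sample labels are hypothetical.

```python
def false_positive_rate(actual, predicted):
    """FP / (FP + TN) for binary labels encoded as 0 and 1."""
    pairs = list(zip(actual, predicted))
    false_positives = sum(1 for a, p in pairs if a == 0 and p == 1)
    true_negatives = sum(1 for a, p in pairs if a == 0 and p == 0)
    negatives = false_positives + true_negatives
    if negatives == 0:
        raise ValueError("no actual negatives in the data")
    return false_positives / negatives


# Illustrative labels only (hypothetical).
actual = [0, 0, 0, 0, 1, 1]
predicted = [1, 0, 0, 0, 1, 0]
fpr = false_positive_rate(actual, predicted)
print(fpr)
```

A raw count of false positives depends on dataset size; dividing by the number of actual negatives makes the value comparable across datasets.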

400

This Pandas method groups rows by a column so that aggregated values can be calculated.

What is groupby()?
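A short sketch of the answer above, assuming pandas is installed. The DataFrame contents and column names are invented for illustration.

```python
import pandas as pd

# Illustrative data only (hypothetical).
sales = pd.DataFrame({
    "region": ["east", "west", "east", "west"],
    "revenue": [100, 200, 150, 60],
})

# Group rows by region, then sum the revenue within each group.
totals = sales.groupby("region")["revenue"].sum()
print(totals)
```

Other aggregations such as `mean()` or `count()` can be applied to the grouped column in the same way.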

500

This technique in data science involves reducing the number of features in a dataset while preserving as much variability as possible.

What is Principal Component Analysis (PCA)?

500

This algorithm, often used in clustering tasks, assigns data points to the nearest centroid.

What is K-means?
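The sketch below shows only the assignment step named in the clue, using the standard library. Full K-means also recomputes each centroid from its assigned points and repeats until the assignments stop changing. The function name, points, and centroids are hypothetical.

```python
import math


def assign_to_nearest_centroid(points, centroids):
    """Return, for each point, the index of its closest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))
        for point in points
    ]


# Illustrative values only (hypothetical).
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
centroids = [(1.0, 1.5), (8.5, 9.0)]
labels = assign_to_nearest_centroid(points, centroids)
print(labels)
```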

500

This algorithm adds weak learners iteratively, each one trained to reduce the residual errors of the models before it.

What is gradient boosting?

500

This metric, the harmonic mean of precision and recall, is used to evaluate classification models when precision and recall are equally important.

What is the F1 score?
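The F1 score is the harmonic mean of precision and recall. A minimal sketch, assuming labels are encoded as 0 (negative) and 1 (positive); the function name and sample labels are hypothetical, and returning 0.0 when there are no true positives is a convention chosen for this sketch.

```python
def f1_score(actual, predicted):
    """Harmonic mean of precision and recall for 0/1 labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    if tp == 0:
        return 0.0  # convention for this sketch: avoids division by zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative labels only (hypothetical).
actual = [1, 1, 1, 0, 0]
predicted = [1, 1, 0, 1, 0]
f1 = f1_score(actual, predicted)
print(f1)
```

Because it is a harmonic mean, the score stays low unless both precision and recall are reasonably high.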

500

This Pandas method reshapes data from wide to long format.

What is melt()?
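A short sketch of the answer above, assuming pandas is installed. The DataFrame contents and the column names `subject` and `score` are invented for illustration.

```python
import pandas as pd

# Wide format: one column per subject (illustrative data only).
wide = pd.DataFrame({
    "student": ["Ana", "Ben"],
    "math": [90, 75],
    "science": [85, 80],
})

# Long format: one row per (student, subject) pair.
long = wide.melt(id_vars="student", var_name="subject", value_name="score")
print(long)
```

Each subject column in the wide table becomes a set of rows in the long table, with the former column name stored in `subject`.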