Data Cleaning
Data Visualization
Statistics Basics
Machine Learning
Real-World Applications
100

This is the process of filling missing values in a dataset.

What is imputation?
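
One simple form of imputation is replacing each missing value with the mean of the observed values. The sketch below assumes missing entries are represented as None; the function name impute_mean and the sample ages are hypothetical, and mean imputation is only one of several possible strategies.

```python
from statistics import mean


def impute_mean(values):
    """Return a copy of values with None entries replaced by the mean of the rest."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical sample data with two missing entries.
ages = [20, None, 30, 40, None]
filled_ages = impute_mean(ages)
print(filled_ages)
```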

100

This chart type is best for showing the distribution of a single numeric variable.

What is a histogram?

100

This is the most frequently occurring value in a dataset.

What is the mode?

100

This type of learning involves labeled data.

What is supervised learning?

100

This term describes firsthand data collected for a specific purpose.

What is primary data?

200

This term describes values in a dataset that are very different from the rest. 

What are outliers?

200

This type of plot shows the relationship between two numeric variables. 

What is a scatterplot?

200

This statistical measure describes the average spread of data around the mean. 

What is the standard deviation?
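
As a rough illustration, Python's standard library can compute the standard deviation directly. The scores below are made-up sample data; pstdev treats them as a full population, while stdev treats them as a sample and divides by n - 1.

```python
from statistics import mean, pstdev, stdev

# Hypothetical scores, chosen so the arithmetic is easy to check by hand.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

average = mean(scores)
population_sd = pstdev(scores)  # divides by n
sample_sd = stdev(scores)       # divides by n - 1

print(average, population_sd, round(sample_sd, 3))
```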

200

This algorithm predicts a data point's label from the labels of the training points closest to it, measured by distance. It is often used for classification problems.

What is K-Nearest Neighbors (KNN)?
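
A minimal sketch of the idea, assuming Euclidean distance and a simple majority vote: the query point takes the most common label among its k closest training points. The function name knn_predict and the toy points are hypothetical.

```python
from collections import Counter
from math import dist


def knn_predict(train_points, train_labels, query, k=3):
    """Label the query with the majority label of its k nearest training points."""
    # Sort training examples by Euclidean distance to the query.
    neighbors = sorted(
        zip(train_points, train_labels),
        key=lambda pair: dist(pair[0], query),
    )
    top_labels = [label for _, label in neighbors[:k]]
    return Counter(top_labels).most_common(1)[0][0]


# Hypothetical two-dimensional training data with two classes.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.5), (6.5, 8.0)]
labels = ["small", "small", "small", "large", "large", "large"]

print(knn_predict(points, labels, (1.2, 1.4)))
print(knn_predict(points, labels, (6.8, 7.0)))
```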

200

This visualization is often used to show trends in sales or revenue over time.

What is a time series plot?

300

A dataset has repeated rows. You would use this Pandas command to remove duplicates. 

What is drop_duplicates()?
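
A short sketch of the call on a small, made-up table. The column names and values are hypothetical; by default drop_duplicates() compares all columns and keeps the first copy of each repeated row.

```python
import pandas as pd

# Hypothetical orders table in which order 2 was entered twice.
orders = pd.DataFrame(
    {"order_id": [1, 2, 2, 3], "amount": [10.0, 25.0, 25.0, 7.5]}
)

# Remove fully identical rows, then renumber the index.
deduped = orders.drop_duplicates().reset_index(drop=True)
print(deduped)
```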

300

This chart type visualizes proportions of a whole. 

What is a pie chart?

300

A dataset with a "bell-shaped" curve is described as having this type of distribution.

What is a normal distribution?

300

When building a regression model, this value measures the proportion of variance in the dependent variable that is explained by the model.

What is R-Squared?
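
One common way to compute it is 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below assumes the predictions come from some already-fitted regression model; the function name r_squared and the numbers are hypothetical.

```python
def r_squared(actual, predicted):
    """Return 1 - SS_res / SS_tot for paired actual and predicted values."""
    mean_actual = sum(actual) / len(actual)
    # Variation left unexplained by the model.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total variation of the actual values around their mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical observed values and model predictions.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 7.0, 9.0]

score = r_squared(actual, predicted)
print(round(score, 3))
```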

300

This process is used to extract, clean, and store data for analysis. 

What is ETL (Extract, Transform, Load)?

400

This metric is often used to detect data entry errors in numeric columns. It also summarizes the typical value or spread of a numeric column.

What is the mean or standard deviation?

400

This library in Python is commonly used for creating static, interactive, and animated visualizations. 

What is Matplotlib or Seaborn?

400

This method is used to statistically determine if data supports a claim.

What is hypothesis testing?

400

This practice evaluates model performance by dividing data into multiple sets. 

What is splitting data into training, validation, and test sets?

400

This type of analysis is used to determine key factors in customer behavior.

What is logistic regression?

500

When merging two datasets, this type of join is used to keep only the rows that match in both datasets. 

What is an inner join?
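
In Pandas, an inner join can be written with merge and how="inner". The two tables below are made up for illustration; only customer_id values present in both tables survive the merge.

```python
import pandas as pd

# Hypothetical tables: customer 1 has no orders, and order 4 has no customer record.
customers = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Ana", "Ben", "Cy"]})
orders = pd.DataFrame({"customer_id": [2, 3, 4], "total": [50, 20, 75]})

# Keep only the rows whose customer_id appears in both tables.
matched = customers.merge(orders, on="customer_id", how="inner")
print(matched)
```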

500

This chart type is typically used to visualize time series data. 

What is a line chart?

500

This statistical method compares two versions of a product to determine the better performer.

What is A/B testing?
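
One possible way to analyze an A/B test on conversion rates is a two-proportion z-test, sketched below with only the standard library. This is an assumption about the analysis, not the only valid choice; the function name and the visitor and conversion counts are hypothetical.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates B - A."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both versions convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF expressed through the error function.
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - normal_cdf)
    return z, p_value


# Hypothetical experiment: 1000 visitors saw each version.
z, p_value = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
print(round(z, 2), p_value < 0.05)
```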

500

This phenomenon occurs when a trend appears within separate groups but reverses when the groups are combined.

What is Simpson's Paradox?
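
The reversal is easiest to see with numbers. In the made-up counts below, option A has the higher success rate inside each group, yet option B has the higher rate once the groups are pooled, because the two options are spread unevenly across the groups.

```python
# (successes, trials) per option within each group; all counts are hypothetical.
results = {
    "group_1": {"A": (9, 10), "B": (80, 100)},
    "group_2": {"A": (30, 100), "B": (2, 10)},
}


def rate(successes, trials):
    """Success rate for one (successes, trials) pair."""
    return successes / trials


def combined_rate(option):
    """Success rate for an option after pooling both groups."""
    successes = sum(results[g][option][0] for g in results)
    trials = sum(results[g][option][1] for g in results)
    return successes / trials


for group, options in results.items():
    print(group, {name: round(rate(*counts), 2) for name, counts in options.items()})
print("combined", {name: round(combined_rate(name), 2) for name in ("A", "B")})
```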

500

This predictive modeling technique is often used in healthcare to classify diseases based on patient data.

What is a decision tree or random forest?