This is the process of filling missing values in a dataset.
What is imputation?
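As a quick illustration, one common form is mean imputation. The sketch below assumes pandas is installed; the `age` column and its values are made up for the example, and mean imputation is only one of several possible strategies.

```python
import pandas as pd

# Hypothetical column with one missing value (None becomes NaN).
df = pd.DataFrame({"age": [25.0, None, 35.0, 40.0]})

# mean() skips missing values by default, so this is the mean of 25, 35, 40.
column_mean = df["age"].mean()

# fillna returns a new Series with every missing entry replaced by the mean.
filled = df["age"].fillna(column_mean)
```

Median or mode imputation follows the same pattern by swapping `mean()` for `median()` or `mode()[0]`.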
This chart type is best for showing the distribution of a single numeric variable.
What is a histogram?
This is the most frequently occurring value in a dataset.
What is the mode?
This type of learning involves labeled data.
What is supervised learning?
This term describes firsthand data collected for a specific purpose.
What is primary data?
This term describes values in a dataset that are very different from the rest.
What are outliers?
This type of plot shows the relationship between two numeric variables.
What is a scatterplot?
This statistical measure describes the average spread of data around the mean.
What is the standard deviation?
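For a concrete feel, here is a small sketch using Python's standard `statistics` module. The numbers are invented for the example. Note the two variants: `pstdev` treats the data as a whole population, while `stdev` treats it as a sample and divides by n - 1.

```python
import statistics

# Made-up values for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)

# Population standard deviation: square root of the mean squared deviation.
population_sd = statistics.pstdev(data)

# Sample standard deviation: same idea, but divides by n - 1 instead of n.
sample_sd = statistics.stdev(data)
```

The sample version is always slightly larger than the population version for the same data.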
This algorithm predicts the label of a data point from the labels of the training points closest to it, measured by distance. It is often used for classification problems.
What is K-Nearest Neighbors (KNN)?
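The idea can be sketched in plain Python: find the k labeled points nearest to a query point and take a majority vote. The function name `knn_predict` and the toy points are hypothetical, and this is a minimal sketch rather than a tuned implementation (it assumes Python 3.8+ for `math.dist`).

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # Sort labeled points by Euclidean distance to the query, keep the k closest.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Majority vote over the neighbors' labels.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy labeled data: two well-separated groups (values are made up).
train = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]

prediction_near_a = knn_predict(train, (1.1, 1.0))
prediction_near_b = knn_predict(train, (5.0, 5.1))
```

Because KNN needs labeled training data, it is a supervised method, unlike clustering algorithms such as k-means.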
This visualization is often used to show trends in sales or revenue over time.
What is a time series plot?
A dataset has repeated rows. You would use this Pandas command to remove duplicates.
What is drop_duplicates()?
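A short sketch of the call, assuming pandas is installed. The column names and rows are made up for the example.

```python
import pandas as pd

# Hypothetical data with one fully repeated row (id 2, Lima).
df = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "city": ["Oslo", "Lima", "Lima", "Pune"],
})

# By default, drop_duplicates compares all columns and keeps the first occurrence.
# reset_index(drop=True) renumbers the rows after the duplicate is removed.
deduped = df.drop_duplicates().reset_index(drop=True)
```

Passing `subset=["id"]` would instead treat rows as duplicates when only the listed columns match.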
This chart type visualizes proportions of a whole.
What is a pie chart?
A dataset with a "bell-shaped" curve is described as having this type of distribution.
What is a normal distribution?
What is R-Squared?
This process is used to extract, clean, and store data for analysis.
What is ETL (extract, transform, load)?
This summary statistic is often used to flag data entry errors in numeric columns. It also gives a quick overview of the numeric columns in a dataset.
What is the mean or standard deviation?
This library in Python is commonly used for creating static, interactive, and animated visualizations.
What is Matplotlib or Seaborn?
What is hypothesis testing?
This practice evaluates model performance by dividing data into multiple sets.
What is a train/validation/test split?
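One way to picture this is a shuffled three-way split. The sketch below uses only the standard library; the function name `split_dataset` and the 60/20/20 fractions are assumptions chosen for illustration, not a fixed rule.

```python
import random

def split_dataset(rows, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle rows and split them into train, validation, and test lists."""
    shuffled = rows[:]                      # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)   # seeded shuffle for repeatability
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]       # whatever remains goes to test
    return train, val, test

rows = list(range(10))  # stand-in for ten records
train, val, test = split_dataset(rows)
```

The model is fit on the training set, tuned against the validation set, and scored once on the test set.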
This type of analysis is used to determine key factors in customer behavior.
What is logistic regression?
When merging two datasets, this type of join is used to keep only the rows that match in both datasets.
What is an inner join?
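In pandas this can be sketched with `merge` and `how="inner"`. The table names, columns, and values below are invented for the example.

```python
import pandas as pd

# Hypothetical tables: customer 1 has no order, and order 4 has no customer.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "name": ["Ana", "Ben", "Cy"],
})
orders = pd.DataFrame({
    "customer_id": [2, 3, 4],
    "total": [20.0, 35.0, 50.0],
})

# An inner join keeps only the keys present in both tables (here 2 and 3).
joined = customers.merge(orders, on="customer_id", how="inner")
```

Switching `how` to `"left"`, `"right"`, or `"outer"` would keep the unmatched rows from one or both tables, with missing values filled in as NaN.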
This chart type is typically used to visualize time series data.
What is a line chart?
What is A/B testing?
This phenomenon occurs when a trend appears in groups, but when the groups are combined, the trend reverses.
What is Simpson's Paradox?
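The reversal is easier to believe with numbers. The counts below are hypothetical, chosen only to make the effect visible: treatment A has the higher success rate within each group, yet B looks better once the groups are pooled, because A was mostly tried on the harder group.

```python
# Hypothetical (successes, trials) per group and treatment.
counts = {
    "small": {"A": (9, 10), "B": (80, 100)},
    "large": {"A": (30, 100), "B": (2, 10)},
}

def rate(successes, trials):
    """Success rate as a fraction between 0 and 1."""
    return successes / trials

# Success rate of each treatment within each group.
within = {
    group: {arm: rate(*pair) for arm, pair in arms.items()}
    for group, arms in counts.items()
}

# Success rate of each treatment after pooling both groups.
combined = {}
for arm in ("A", "B"):
    successes = sum(counts[group][arm][0] for group in counts)
    trials = sum(counts[group][arm][1] for group in counts)
    combined[arm] = rate(successes, trials)
```

Here A wins in both groups (0.9 vs 0.8 and 0.3 vs 0.2), but the pooled rates are 39/110 for A and 82/110 for B.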
This predictive modeling technique is often used in healthcare to classify diseases based on patient data.
What is a decision tree or random forest?