Statistics
Machine Learning
Programming
Data Wrangling
Data Gathering
100

The middle value of a dataset when its values are sorted in order

What is a median?
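
A quick illustration of the answer above, as a minimal sketch using `statistics.median` from Python's standard library. The sample values are made up for the example.

```python
import statistics

# Hypothetical sample values, chosen only for illustration.
values = [7, 1, 5, 3, 9]

# median() sorts the data internally and returns the middle value.
middle = statistics.median(values)

# With an even number of values, the median is the mean of the two middle values.
even_middle = statistics.median([1, 3, 5, 7])

print(middle, even_middle)
```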

100

In this type of learning, a computer learns by looking at examples with answers, like learning to tell if a photo has a cat or dog

What is supervised learning?

100

This Python library is commonly used for data manipulation and analysis

What is pandas?

100

This term describes missing data values.

What are null values?

100

This method of data gathering involves directly asking people questions to collect their opinions or experiences.

What is a survey?

200

A measure of how dispersed the data is in relation to the mean

What is standard deviation?
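
One way to see the answer above in action, assuming Python's standard `statistics` module. The data values are hypothetical; `pstdev` treats them as a whole population, while `stdev` treats them as a sample.

```python
import statistics

# Hypothetical data, chosen only for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(values)

# Population standard deviation: divides the squared deviations by n.
population_sd = statistics.pstdev(values)

# Sample standard deviation: divides by n - 1, so it is slightly larger.
sample_sd = statistics.stdev(values)

print(mean, population_sd, sample_sd)
```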

200

This type of learning groups things that are similar, like putting all red toys together and all blue toys together

What is clustering?

200

Name the package manager commonly used to install Python libraries

What is pip?

200

The term for combining two datasets side by side, often used to create one larger dataset

What is merging or concatenating?

200

In surveys, this refers to the group of individuals selected to participate and provide data

What is a sample?

300

This statistical test is commonly used to compare the means of two independent groups

What is a t-test?

300

This is when a computer learns too many details from examples and makes mistakes with new things it hasn’t seen

What is overfitting?

300

This function in R returns the structure of an object, giving a quick summary of its contents and data types

What is str()?

300

In data wrangling, this process helps to standardize data by adjusting values to a common scale.

What is normalization?
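
A minimal sketch of one common form of normalization, min-max scaling to the range 0 to 1. The function name `min_max_normalize` and the height values are hypothetical, introduced only for this example; other scaling methods exist.

```python
def min_max_normalize(values):
    """Rescale values linearly so the smallest becomes 0.0 and the largest 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical heights in centimeters.
heights_cm = [150, 160, 170, 180, 190]
scaled = min_max_normalize(heights_cm)

print(scaled)
```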

300

A subset of a statistical population chosen so that each member of the population is equally likely to be selected

What is a random sample?

400

This measures the strength and direction of a linear relationship between two variables.

What is the correlation coefficient?
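
A sketch of how the Pearson correlation coefficient can be computed by hand in Python. The helper name `pearson_r` and the data are assumptions made for this example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: ys rises exactly in step with xs, zs falls in step.
xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]
zs = [8, 6, 4, 2]

r_positive = pearson_r(xs, ys)  # close to +1: strong positive linear relationship
r_negative = pearson_r(xs, zs)  # close to -1: strong negative linear relationship

print(r_positive, r_negative)
```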

400

In this type of learning, a computer finds patterns on its own without being told the right answers.

What is unsupervised learning?

400

In Python, this built-in function is used to sort a list of numbers in ascending order

What is sorted()?
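
A short example of the answer above. The list contents are made up; note that `sorted()` returns a new list and leaves the original unchanged.

```python
# Hypothetical list of numbers.
numbers = [42, 7, 19, 3]

# sorted() returns a new list in ascending order by default.
ascending = sorted(numbers)

# Passing reverse=True gives descending order instead.
descending = sorted(numbers, reverse=True)

print(ascending, descending, numbers)
```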

400

Name the Python library used to handle large, multi-dimensional arrays and matrices

What is NumPy?

400

This term describes the practice of using existing data from previous studies or databases for new analyses

What is secondary data analysis?

500

Rejecting the null hypothesis when it is actually true

What is a Type I error?

500

This type of model learns from past examples and tries to guess something about new examples, like guessing if an email is "spam" or "not spam"

What is a predictive model?

500

In SQL, this clause is used to filter records based on a specified condition.

What is the WHERE clause?
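
To illustrate the answer above, here is a minimal sketch that runs a `WHERE` filter from Python using the standard `sqlite3` module and an in-memory database. The `students` table, its columns, and its rows are hypothetical.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [("Ana", 91), ("Ben", 68), ("Cal", 85)],  # hypothetical rows
)

# WHERE keeps only the rows that satisfy the condition.
rows = conn.execute(
    "SELECT name, score FROM students WHERE score >= ? ORDER BY name",
    (80,),
).fetchall()
conn.close()

print(rows)
```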

500

This process involves filling in missing values in a dataset to ensure completeness and accuracy before analysis

What is imputation?
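
A sketch of one simple imputation strategy, filling missing values with the mean of the observed ones. The helper `impute_mean` and the ages are assumptions for this example, and mean imputation is only one of several possible approaches.

```python
import statistics

def impute_mean(values):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]

# Hypothetical ages, with None marking missing entries.
ages = [20, None, 30, None, 40]
filled = impute_mean(ages)

print(filled)
```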

500

This term refers to the systematic error introduced in survey results due to the way questions are worded or how the survey is administered

What is response bias?