The middle value of a dataset when its values are sorted in order
What is a median?
In this type of learning, a computer learns from examples that come with answers, like learning to tell whether a photo shows a cat or a dog
What is supervised learning?
This Python library is commonly used for data manipulation and analysis
What is pandas?
This term describes missing data values.
What are null values?
This method of data gathering involves directly asking people questions to collect their opinions or experiences.
What is a survey?
A measure of how dispersed the data is in relation to the mean
What is standard deviation?
This type of learning groups things that are similar, like putting all red toys together and all blue toys together
What is clustering?
Name the package manager commonly used to install Python libraries
What is pip?
The term for combining two datasets side by side, often used to create one larger dataset
What is merging or concatenating?
In surveys, this refers to the group of individuals selected to participate and provide data
What is a sample?
This statistical test is commonly used to compare the means of two independent groups
What is a t-test?
This is when a computer learns too many details from examples and makes mistakes with new things it hasn’t seen
What is overfitting?
This function in R returns the structure of an object, giving a quick summary of its contents and data types
What is str()?
In data wrangling, this process helps to standardize data by adjusting values to a common scale.
What is normalization?
A subset of a statistical population chosen so that each member of the population is equally likely to be selected
What is a simple random sample?
This measures the strength and direction of a linear relationship between two variables.
What is the correlation coefficient?
In this type of learning, a computer finds patterns on its own without being told the right answers.
What is unsupervised learning?
In Python, this built-in function is used to sort a list of numbers in ascending order
What is sorted()?
Name the Python library used to handle large, multi-dimensional arrays and matrices
What is NumPy?
This term describes the practice of using existing data from previous studies or databases for new analyses
What is secondary data analysis?
Rejecting the null hypothesis when it is actually true
What is a Type I error?
This type of model learns from past examples and tries to guess something about new examples, like guessing if an email is "spam" or "not spam"
What is a prediction model?
In SQL, this clause is used to filter records based on a specified condition.
What is the WHERE clause?
This process involves filling in missing values in a dataset with estimated or substituted values so that the dataset is complete before analysis
What is imputation?
This term refers to the systematic error introduced in survey results due to the way questions are worded or how the survey is administered
What is response bias?