This type of statistic summarizes data but makes no conclusions about a population.
What is descriptive statistics?
The hypothesis that states “no difference” or “no effect.”
What is the null hypothesis?
This type of file is read into python when using pd.read_csv()
What is a csv file?
This type of plot is best for showing the distribution of a numerical variable.
What is a histogram?
The correlation value(s) that indicates the strongest possible linear relationship.(BE CAREFUL)
+ and - 1 (MUST HAVE BOTH)
This type of statistic uses sample data to make conclusions about a population.
What is inferential statistics?
If your p-value is less than 0.05, this is usually the decision you make.
What is reject the null hypothesis?
This function removes rows that contain missing values.
df.dropna() or dropna() as df is the dataset
This plot displays median, quartiles, and outliers.
What is a boxplot?
The regression type used for predicting a binary outcome.
What is logistic regression?
In a study with no researcher intervention, this type of design is used.
What is an observational study?
This test compares the means of 3 or more groups.
What is ANOVA?
This function prints column names, data types, number of non-null entries, and memory usage.
what is .info()
This straight line plot checks whether data follow a normal distribution.
What is a QQ plot?
These values must be removed before correlation because correlation cannot compute with these.
What is missing or NaNs
The variable that is being predicted in a model. (NOT just a letter)
What is the response or dependent variable?
This test is used to evaluate the association between two categorical variables.
What is Chi-square test?
This function gives summary statistics for numeric columns by default.
What is .describe()?
A scatterplot requires that both variables be of this data type.
What is continuous variables?
This metric measures the proportion of variation explained by a regression model.
What is the r-squared coefficient?
This variable type can distort the relationship between the explanatory and response variables if not controlled. We can eliminate issues caused by this variable in a designed experiment with randomization.
What is a confounding variable?
This test is used to compare two proportions.
What is a two-sample z-test?
This function provides frequency counts for each category in a categorical column.
What is .value_counts()?
This argument in sns.scatterplot() is used to color points by category.
What is hue?
This metric penalizes unnecessary predictors and is preferred over regular R² in multiple regression.
What is Adjusted R²?