Data Types & Foundations
Hypotheses & Statistical Tests
Snakes on a DataFrame
See the graph be the graph
Modeling but not that kind
100

This type of statistic summarizes data but makes no conclusions about a population.

What is descriptive statistics?

100

The hypothesis that states “no difference” or “no effect.”

What is the null hypothesis?

100

This type of file is read into Python when using pd.read_csv().

What is a csv file?
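A minimal sketch of reading CSV data with pd.read_csv(). The CSV text and column names are made up for illustration, and io.StringIO stands in for a file on disk.

```python
import io
import pandas as pd

# Hypothetical CSV content; io.StringIO lets us treat a string like a file.
csv_text = "name,score\nAda,90\nGrace,85\n"

df = pd.read_csv(io.StringIO(csv_text))  # first row becomes the column names
print(df)
```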

100

This type of plot is best for showing the distribution of a numerical variable.

What is a histogram? 

100

The correlation value(s) that indicate the strongest possible linear relationship. (BE CAREFUL)

What are +1 and -1? (MUST HAVE BOTH)

200

This type of statistic uses sample data to make conclusions about a population.

What is inferential statistics?

200

If your p-value is less than 0.05, this is usually the decision you make.

What is rejecting the null hypothesis?

200

This function removes rows that contain missing values.

What is df.dropna()? (dropna() alone is also accepted; df is the dataset.)
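A small sketch of dropna() on a made-up DataFrame. The column names and values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column.
df = pd.DataFrame({
    "height": [160.0, np.nan, 175.0],
    "weight": [55.0, 70.0, np.nan],
})

clean = df.dropna()  # returns a new DataFrame; df itself is unchanged
print(clean)
```

Only the first row survives, because it is the only row with no missing values.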

200

This plot displays median, quartiles, and outliers.

What is a boxplot?

200

The regression type used for predicting a binary outcome.

What is logistic regression?

300

In a study with no researcher intervention, this type of design is used.

What is an observational study?

300

This test compares the means of 3 or more groups.

What is ANOVA?

300

This function prints column names, data types, number of non-null entries, and memory usage.

What is .info()?

300

This plot checks whether data follow a normal distribution by comparing the points to a straight line.

What is a QQ plot?

300

These values must be removed or otherwise handled before computing a correlation, because the calculation cannot use them.

What are missing values (NaNs)?
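A sketch of what goes wrong, using np.corrcoef on invented data. Some tools, such as pandas .corr(), skip missing pairs on their own, but a plain NumPy calculation does not.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing x value.
df = pd.DataFrame({
    "x": [1.0, 2.0, np.nan, 4.0, 5.0],
    "y": [2.0, 4.1, 6.0, 8.2, 9.9],
})

r_raw = np.corrcoef(df["x"], df["y"])[0, 1]  # NaN: the missing value propagates

clean = df.dropna()  # drop the row with the missing value
r_clean = np.corrcoef(clean["x"], clean["y"])[0, 1]
print(r_raw, r_clean)
```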

400

The variable that is being predicted in a model. (NOT just a letter)

What is the response or dependent variable? 

400

This test is used to evaluate the association between two categorical variables.

What is the chi-square test?
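A by-hand sketch of the Pearson chi-square statistic for a two-way table, with no continuity correction. The counts are invented for illustration.

```python
# Hypothetical 2x2 table of counts: rows are groups, columns are outcomes.
observed = [[30, 10],
            [20, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Expected count if the two variables were independent.
        expected = row_totals[i] * col_totals[j] / total
        chi2 += (obs - expected) ** 2 / expected

dof = (len(row_totals) - 1) * (len(col_totals) - 1)  # degrees of freedom
print(chi2, dof)
```

A large statistic relative to its degrees of freedom suggests the two variables are associated.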

400

This function gives summary statistics for numeric columns by default.

What is .describe()?
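A short sketch on a made-up DataFrame showing that .describe() skips the text column by default.

```python
import pandas as pd

# Hypothetical data: one numeric column and one text column.
df = pd.DataFrame({"age": [20, 30, 40], "city": ["A", "B", "A"]})

summary = df.describe()  # count, mean, std, min, quartiles, max for "age" only
print(summary)
```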

400

A scatterplot requires that both variables be of this data type.

What are continuous variables?

400

This metric measures the proportion of variation explained by a regression model.

What is R² (the coefficient of determination)?
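A worked sketch of the calculation as 1 minus the ratio of residual to total sum of squares. The observed values and predictions are invented; y_hat stands in for the output of some fitted model.

```python
# Hypothetical observed values and model predictions.
y = [2.0, 4.0, 6.0, 8.0]
y_hat = [2.5, 3.5, 6.5, 7.5]

mean_y = sum(y) / len(y)
ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))  # unexplained variation
ss_tot = sum((a - mean_y) ** 2 for a in y)            # total variation

r_squared = 1 - ss_res / ss_tot
print(r_squared)
```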

500

This variable type can distort the relationship between the explanatory and response variables if not controlled. We can eliminate issues caused by this variable in a designed experiment with randomization.

What is a confounding variable?

500

This test is used to compare two proportions. 

What is a two-sample (two-proportion) z-test?
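A by-hand sketch of the test using the pooled proportion and a two-sided p-value from the standard normal distribution. The success counts and sample sizes are invented.

```python
import math

# Hypothetical successes and sample sizes for two groups.
x1, n1 = 45, 100
x2, n2 = 30, 100

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)  # pooled proportion under the null hypothesis
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

z = (p1 - p2) / se
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
print(z, p_value)
```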

500

This function provides frequency counts for each category in a categorical column.

What is .value_counts()?
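A short sketch on a made-up categorical column. By default the counts come back sorted from most to least frequent.

```python
import pandas as pd

# Hypothetical categorical data.
colors = pd.Series(["red", "blue", "red", "green", "red", "blue"])

counts = colors.value_counts()  # one count per category
print(counts)
```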

500

This argument in sns.scatterplot() is used to color points by category.

What is hue?

500

This metric penalizes unnecessary predictors and is preferred over regular R² in multiple regression. 

What is Adjusted R²?
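A sketch of the usual formula, where n is the number of observations and p is the number of predictors. The function name adjusted_r2 and the input values are assumptions for illustration.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical model: R-squared of 0.80 from 50 observations and 5 predictors.
adj = adjusted_r2(0.80, n=50, p=5)
print(adj)
```

The adjusted value comes out below 0.80 here, and the gap grows as predictors are added without improving the fit.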