Be the Data
This is the process of gathering raw data from various sources.
What is data collection?
This branch of statistics utilizes measures of center, measures of spread, and some data visualizations to summarize a dataset.
What is descriptive statistics?
This sampling method gives everyone an equal chance of being selected.
What is random sampling?
This graph is utilized to show counts for each category on the y-axis and categories on the x-axis.
What is a bar chart?
This symbol should never be in an alternative hypothesis.
What is =?
What is the margin of error?
This assumption of correlation and regression can be checked using a qq-plot.
What is normality?
This term describes masking personal data, for example by removing an exact location while still providing a general idea of the location.
What is de-identification?
This measure of center is typically not found when using the .describe() function.
What is the mode?
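As a quick check of this answer, here is a minimal sketch that assumes pandas is installed; the scores are made-up values for illustration. For a numeric column, .describe() reports the count, mean, standard deviation, minimum, quartiles, and maximum, so the mode has to be requested separately with .mode().

```python
import pandas as pd

# Hypothetical quiz scores, invented for illustration only.
scores = pd.Series([7, 8, 8, 9, 10, 8, 6])

summary = scores.describe()   # count, mean, std, min, 25%, 50%, 75%, max
modes = scores.mode()         # returned as a Series because ties are possible

print(summary.index.tolist())
print(modes.tolist())
```

The clue says "typically" for a reason: on a text (object) column, .describe() does report a "top" value, which is the most frequent entry.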
This sampling method requires the selection of every kth person for participation.
What is systematic sampling?
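A minimal sketch of the idea, using only the standard library. The function name systematic_sample and the roster of IDs are hypothetical; the random starting point within the first k positions is a common convention, assumed here.

```python
import random

def systematic_sample(population, k, seed=None):
    """Pick a random start among the first k items, then take every kth item."""
    rng = random.Random(seed)
    start = rng.randrange(k)      # random starting position from 0 to k-1
    return population[start::k]

# Hypothetical roster of 20 student IDs.
roster = list(range(1, 21))
sample = systematic_sample(roster, k=5, seed=1)
print(sample)
```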
This visualization is most helpful when identifying outliers.
What is a boxplot?
These are the null and alternative hypotheses for a test of whether two categorical variables are related.
What are
Null: No association
Alternative: Association
This is the symbol representing the point estimate when you are calculating a 95% confidence interval to estimate the true proportion of Clemson students that play Fortnite.
What is $\hat{p}$?
When checking your residual plot, you hope to see no pattern; if you do see one, this assumption might be violated.
What is homoscedasticity?
This term describes the type of study from which we can make causal claims.
What is an experiment?
This branch of statistics describes taking information about a sample and extending it to the population.
What is inferential statistics?
This sampling method utilizes individuals that are easily accessible for the researcher.
What is convenience sampling?
This visualization is used to show the distribution of numerical variables and is often confused with its qualitative counterpart.
What is a histogram?
When compared to each other, these two values help us make a decision about whether or not to reject the null hypothesis.
What are the level of significance and the p-value?
This describes what it means when we say, "We are 95% confident..."
What is: if we take 100 samples of the same size and build a confidence interval from each one, about 95% of the intervals will capture the population parameter and roughly 5% will not?
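The repeated-sampling idea can be simulated. This is a sketch under stated assumptions: the population is normal with a hypothetical mean of 50 and standard deviation of 10, each interval is a z-based interval for the mean, and 1,000 samples are used instead of 100 so the percentage is more stable.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

TRUE_MEAN, TRUE_SD = 50, 10     # hypothetical population parameters
N, NUM_SAMPLES = 40, 1000       # sample size and number of repeated samples
Z = 1.96                        # critical value for 95% confidence

captured = 0
for _ in range(NUM_SAMPLES):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    center = statistics.mean(sample)
    margin = Z * statistics.stdev(sample) / N ** 0.5
    if center - margin <= TRUE_MEAN <= center + margin:
        captured += 1

coverage = captured / NUM_SAMPLES
print(f"{coverage:.1%} of the intervals captured the true mean")
```

The printed percentage should land near 95%, not exactly on it, because each run is itself a random experiment.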
This graphic can help us visualize if two quantitative variables are linearly related.
What is a scatter plot?
This term describes data that was collected directly by the researcher using the data.
What is primary data?
This term refers to the true population value that we are interested in.
What is a parameter?
This sampling method utilizes natural groupings and then selects whole groups for the sample.
What is cluster sampling?
These two graphs are not as commonly used today as they cannot handle large amounts of data.
What are dot plots and stem-and-leaf plots?
This test should be utilized when trying to determine if the true average score on Overcooked for Clemson students is higher than the true average score on Overcooked for USC students.
What is a two-sample t-test?
This value is always contained within the confidence interval, and in our class it is always the center.
What is the point estimate?
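To see why the point estimate sits at the center, here is a small sketch for a 95% interval for a proportion. The survey counts are invented for illustration, and the interval uses the usual point estimate plus or minus margin of error form with a critical value of 1.96.

```python
import math

# Hypothetical survey: 120 of 300 sampled students answer "yes".
successes, n = 120, 300

p_hat = successes / n                                  # point estimate
margin = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)     # 95% margin of error
lower, upper = p_hat - margin, p_hat + margin

midpoint = (lower + upper) / 2   # equals p_hat, up to rounding error
print(f"p_hat = {p_hat:.3f}, interval = ({lower:.3f}, {upper:.3f})")
```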
This graphic allows us to see the strength of a linear relationship as a numeric value that is color coded.
What is a heat map?
When discussing these terms we learned that how the data is organized, or unorganized, can affect the processing of the data.
What are structured and unstructured data?
This term says that as you increase the number of trials, the observed probability will approach the true probability.
What is the law of large numbers?
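A short simulation can illustrate this. The sketch below assumes a fair coin (true probability of heads 0.5) and a handful of trial counts chosen only for illustration.

```python
import random

random.seed(42)  # fixed seed so the run is repeatable

def observed_proportion(num_flips):
    """Proportion of heads in num_flips simulated fair coin flips."""
    heads = sum(random.random() < 0.5 for _ in range(num_flips))
    return heads / num_flips

results = {n: observed_proportion(n) for n in (10, 100, 1_000, 100_000)}
for n, prop in results.items():
    print(f"{n:>7} flips: observed proportion of heads = {prop:.4f}")
```

With few flips the observed proportion can be far from 0.5; with many flips it should settle close to it.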
This sampling method divides the population into groups before drawing a sample from each group.
What is stratified sampling?
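A minimal sketch of the idea, using only the standard library. The function name stratified_sample, the class-year groups, and the choice of an equal sample size from every group are assumptions made for illustration.

```python
import random

def stratified_sample(strata, per_group, seed=None):
    """Draw a simple random sample of size per_group from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, per_group)
            for name, members in strata.items()}

# Hypothetical strata: students grouped by class year.
strata = {
    "freshman": ["F1", "F2", "F3", "F4", "F5"],
    "sophomore": ["S1", "S2", "S3", "S4"],
    "junior": ["J1", "J2", "J3", "J4", "J5", "J6"],
}
sample = stratified_sample(strata, per_group=2, seed=3)
print(sample)
```

Every group contributes to the sample here, which is the contrast with cluster sampling, where whole groups are selected and the rest are left out.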
This graph is used to view the relationship between two quantitative variables.
What is a scatter plot?
This describes what a p-value actually means.
What is the probability of finding a sample as extreme as, or more extreme than, what we observed, given that the null hypothesis is true?
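The definition can be computed directly in a simple case. This sketch assumes a hypothetical coin-flip experiment (15 heads in 20 flips), a null hypothesis that the coin is fair, and a one-sided alternative that heads is more likely; the function name is made up for illustration.

```python
from math import comb

def binomial_upper_p_value(n, observed, p_null=0.5):
    """P(X >= observed) when X ~ Binomial(n, p_null)."""
    return sum(comb(n, k) * p_null**k * (1 - p_null)**(n - k)
               for k in range(observed, n + 1))

# Hypothetical data: 15 heads in 20 flips; the null says the coin is fair.
p_value = binomial_upper_p_value(n=20, observed=15)
print(f"p-value = {p_value:.4f}")
```

The result, about 0.021, is the chance of seeing 15 or more heads if the coin really is fair. It is not the probability that the null hypothesis is true.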
This is the generic interpretation of a 95% confidence interval.
What is "We are 95% confident that the true parameter is between the lower bound and the upper bound"?
This is the correct model for the following output:
What is
$\hat{y} = -5872.09 + 50.15x$
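Once the model is written down, using it is a matter of plugging in x. The sketch below takes the intercept and slope from the answer above; the variable names and the input x = 200 are hypothetical, since the regression output itself is not reproduced here.

```python
INTERCEPT = -5872.09
SLOPE = 50.15

def predict(x):
    """Predicted response y-hat for a given explanatory value x."""
    return INTERCEPT + SLOPE * x

# x = 200 is a hypothetical input chosen for illustration.
print(round(predict(200), 2))
```

The slope means that each one-unit increase in x raises the predicted y by 50.15, and the intercept is the predicted y when x is 0.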