This type of graph, which compares two quantitative variables, is used in regression analysis
Scatterplot
A subset of data from a larger dataset that is used for inferential statistics is known as this
Sample
The way humans interact with Generative AI is called this
Prompt
When telling a story with data, you need to have these things
Data, Narrative, and Visualizations
The most important findings of your analysis should be explained in this portion of the narrative
Climax
This value, denoted by the letter r, measures the strength and direction of a linear relationship
Correlation Coefficient
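
As a rough illustration of the card above, the correlation coefficient can be computed by hand. This is a minimal sketch: the `pearson_r` helper and the hours/scores data are hypothetical and not part of the original card set.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: hours studied vs. exam score
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 65, 70, 79]
r = pearson_r(hours, scores)
print(round(r, 3))
```

Values of r near 1 or -1 indicate a strong linear relationship; values near 0 indicate a weak or absent one.
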
This type of test allows us to see if our sample is significantly different from the population
Hypothesis Test
A finite sequence of well-defined instructions used to solve a computational problem is known as this
Algorithm
The "So-What" statement and the thesis together make up this element of a data story
Main Point
The "So-What" statement and thesis should appear in this portion of the narrative
Initiating Event
Points on a scatterplot that are far away from the line of best fit are called these
Outliers
This quantity is used to assess statistical significance
P-value
This type of algorithm looks for natural patterns in data rather than fitting a model to labeled outcomes
Unsupervised Learning
This phenomenon occurs when two unrelated things happen to be correlated purely by chance
Spurious Correlation
We make final recommendations to stakeholders in this portion of the narrative
Resolution / Conclusion
Mike is an economist who collected data on the age and net worth of a group of people. He calculated the correlation as 0.63 and concluded that a higher net worth is directly caused by increased age. Is he correct? Why or why not?
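No, he is incorrect because correlation does not imply causation. A correlation of 0.63 shows an association between age and net worth, not that one causes the other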
We compare the p-value to this quantity when we determine statistical significance
Significance level
Rich is creating an AI model that aims to predict with a high degree of accuracy the time it takes to drive from his house to work on any given day. What type of model should he use?
Neural Network
When we try to understand how the audience perceives our explanations, we are studying this
Psychology of Data
We perform descriptive analytics in this section of the narrative
Exposition
Stephanie, a Formula-1 superfan, collected data on how quickly each driver accelerated from their starting position. She graphed the distance from start versus the time for each driver on a scatterplot and calculated the correlation to be 0. What can she conclude?
Distance and time have no linear relationship
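
A correlation of 0 rules out a linear relationship only, not every relationship. The sketch below uses made-up data (not the Formula-1 data from the card) in which y is fully determined by x, yet r is 0. The `pearson_r` helper is a hypothetical name.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: y is exactly x squared, a perfect but nonlinear pattern
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]
r_curve = pearson_r(xs, ys)
print(r_curve)  # the positive and negative co-deviations cancel out
```

This is why plotting the scatterplot matters before interpreting r.
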
Eveline is a doctor who recently read about a study on a new cancer medication that reported a p-value of 0.23 when compared to the existing cancer drug. If the significance level is 0.05, what should she conclude?
Fail to reject the null hypothesis. Since 0.23 is greater than 0.05, there is not enough evidence to say the new cancer drug is better than the existing one
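
The decision rule behind this card can be sketched in a few lines. The `decide` function is a hypothetical helper, and it assumes the common convention of rejecting when the p-value is at or below the significance level.

```python
def decide(p_value, alpha=0.05):
    """Compare a p-value to the significance level and return the decision."""
    if p_value <= alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"

# Values from the card: p-value of 0.23, significance level of 0.05
decision = decide(0.23, alpha=0.05)
print(decision)
```

Failing to reject the null does not prove the two drugs are equivalent; it only means the study did not provide enough evidence of a difference.
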
Kermit is deciding between creating either a random forest or linear regression model to predict next year's sales for his business. If he wants to know the impact of specific factors on sales, which model should he use and why?
Linear regression, because it is more interpretable: each coefficient estimates the impact of a specific factor on sales
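
To show what interpretability looks like in practice, here is a minimal one-factor least-squares sketch. The `fit_line` helper and the ad-spend and sales figures are hypothetical and chosen only for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: ad spend (in $1,000s) vs. sales (in $1,000s)
ad_spend = [1, 2, 3, 4]
sales = [3, 5, 7, 9]
slope, intercept = fit_line(ad_spend, sales)
print(f"Each extra $1,000 of ad spend is associated with {slope:.1f} more in sales")
```

The slope reads directly as the estimated change in sales per unit change in the factor, which a random forest does not provide in such a direct form.
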
Neil is writing an article in his hometown newspaper about the recent closure of a large factory. Since he is unhappy about the closure, he only cited sources that agreed with his viewpoint in the article. What type of bias does this article have?
Confirmation Bias
One function of this section of the narrative is to build the scaffolding
Rising Action