REG. ANALYSIS
INF. STATISTICS
PRED. ANALYTICS
PRESC. ANALYTICS
DATA NARRATIVE
100

This type of graph, which compares two quantitative variables, is used in regression analysis

Scatterplot

100

A subset of data from a larger dataset that is used for inferential statistics is known as this

Sample

100

The way humans interact with Generative AI is called this

Prompt

100

When telling a story with data, you need these three things

Data, Narrative, and Visualizations

100

The most important findings of your analysis should be explained in this portion of the narrative

Climax

200

This value, denoted by the letter r, measures the strength of a linear relationship

Correlation Coefficient
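
For reference, the correlation coefficient can be computed by hand. The sketch below assumes the Pearson definition (covariance divided by the product of the spreads); the function name `pearson_r` and the study-hours data are hypothetical and made up for illustration only.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (numerator of r).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: hours studied vs. exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]
r = pearson_r(hours, scores)
print(f"r = {r:.3f}")
```

Values of r near 1 or -1 indicate a strong linear relationship; values near 0 indicate a weak or absent linear relationship.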

200

This type of test allows us to see if our sample is significantly different from the population

Hypothesis Test

200

A finite sequence of well-defined instructions used to solve a computational problem is known as this

Algorithm

200

The "So-What" and thesis portions of a data story both comprise this element of the story

Main Point

200

The "So-What" statement and thesis of the narrative should be in this portion of the narrative

Initiating Event

300

Points on a scatterplot that are far away from the line of best fit are called these

Outliers

300

This quantity is used to show statistical significance

P-value

300

This type of algorithm mainly looks for natural patterns in the data, as opposed to fitting a model

Unsupervised Learning

300

This phenomenon occurs when two unrelated things happen to be correlated with each other purely by chance

Spurious Correlation

300

We make final recommendations to stakeholders in this portion of the narrative

Resolution / Conclusion

400

Mike is an economist who collected data on the age and net worth of a group of people.  He calculated the correlation as 0.63 and concluded that a higher net worth is directly caused by increased age.  Is he correct?  Why or why not?

No, he is incorrect because correlation does not imply causation

400

We compare the p-value to this quantity when we determine statistical significance

Significance level

400

Rich is creating an AI model that aims to predict with a high degree of accuracy the time it takes to drive from his house to work on any given day.  What type of model should he use?

Neural Network

400

When we try to understand how the audience perceives our explanations, we are studying this

Psychology of Data

400

We perform descriptive analytics in this section of the narrative

Exposition

500

Stephanie, a Formula-1 superfan, collected data on how quickly each driver accelerated from their starting position.  She graphed the distance from start versus the time for each driver on a scatterplot and calculated the correlation to be 0.  What can she conclude?

Distance and time have no linear relationship
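
A correlation of 0 only rules out a linear relationship; a curved relationship can still be present. The sketch below illustrates this with made-up numbers (not Stephanie's data), where y depends perfectly on x through a parabola and the Pearson correlation is still 0. The helper name `correlation` is hypothetical.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient, computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up data: y is completely determined by x, but not linearly.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]

r_curved = correlation(xs, ys)
print(f"r = {r_curved:.3f}")  # no linear relationship, despite a perfect curve
```

This is why the conclusion is worded as "no linear relationship" rather than "no relationship".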

500

Eveline is a doctor who recently read about a study on a new cancer medication that reported a p-value of 0.23 when compared to the existing cancer drug.  If the significance level is 0.05, what should she conclude?

Fail to reject the null hypothesis. There is not enough evidence to say the new cancer drug is better than the existing one
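
The decision rule behind this answer can be written out as a small sketch. It assumes the common convention of rejecting the null hypothesis when the p-value is strictly less than the significance level; the function name `decide` is hypothetical.

```python
def decide(p_value, alpha=0.05):
    """Compare a p-value to the significance level and state the decision."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    # Assumed convention: reject only when p is strictly below alpha.
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"


# The numbers from the clue: p-value 0.23, significance level 0.05.
decision = decide(0.23, alpha=0.05)
print(decision)
```

Failing to reject the null hypothesis does not show that the two drugs are equally effective; it only means the study did not provide enough evidence of a difference.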

500

Kermit is deciding between creating either a random forest or linear regression model to predict next year's sales for his business.  If he wants to know the impact of specific factors on sales, which model should he use and why?

Linear regression, because it is the more interpretable model: each coefficient estimates the impact of one factor on sales
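
To illustrate what interpretability means here, the sketch below fits a one-factor least-squares line and reads the slope as the estimated change in sales per unit of the factor. The function name `fit_line` and the advertising and sales figures are hypothetical, not taken from Kermit's business.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("the predictor must vary to estimate a slope")
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: advertising spend vs. sales, both in thousands of dollars.
ad_spend = [1, 2, 3, 4]
sales = [3, 5, 7, 9]

slope, intercept = fit_line(ad_spend, sales)
print(f"each extra unit of ad spend is associated with {slope:.1f} more units of sales")
```

A random forest may predict well, but it does not produce a single coefficient per factor that can be read this way.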

500

Neil is writing an article in his hometown newspaper about the recent closure of a large factory. Since he is unhappy about the closure, he only cited sources that agreed with his viewpoint in the article.  What type of bias does this article have?

Confirmation Bias

500

One function of this section is to build the scaffolding

Rising Action
