DATA SCIENCE! Jeopardy Template

PS 1.1 Collect Data

PS 1.2 Interrogate Data

PS 1.3 Interpret and Communicate Results

PS 2.1 Use a Data Structure to Represent Data

PS 2.2 Analyze and Visualize Data

100

The topic of this data.

What are school success rates?

100

There are this many data points in the data set.

What is 1000?

100

The typical value in the histogram from grad %>% ggplot(aes(x=act)) + geom_histogram(bins=16) tells us this about this group of students.

These students score above the national ACT average (around 19 points).

100

This is how to look at the data.

What is > grad?

100

> grad %>% ggplot(aes(x=p_income)) + geom_histogram()

Show me a histogram!

200

This is a categorical variable.

What is the parental level of education?

200

This is the mean SAT score.

What is 1999.9 (or 2000)?

200

The variable that shows this is a diverse population.

What is the parents' education?

200

This is the collection of students that would be on the honor roll in high school.

What is filtering on GPA?
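A minimal sketch of this filter in dplyr, assuming the high school GPA column is hs_gpa (the name used in the boxplot clue) and an honor-roll cutoff of 3.5, which is a hypothetical threshold. The small grad_toy data frame is invented to stand in for grad.

```r
library(dplyr)

# Hypothetical toy data standing in for grad; all values are invented for illustration
grad_toy <- tibble(
  student = c("A", "B", "C", "D", "E"),
  hs_gpa  = c(3.9, 2.8, 3.5, 3.1, 3.7)
)

# Assumed honor-roll cutoff (real schools set their own thresholds)
honor_cutoff <- 3.5

# Keep only the students whose high school GPA meets the cutoff
honor_roll <- grad_toy %>% filter(hs_gpa >= honor_cutoff)
honor_roll
```

On the real data the same pattern would be grad %>% filter(hs_gpa >= 3.5), with whatever cutoff the class agrees on.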

200

> grad %>% ggplot(aes(x=yrs_to_grad, y=un_gpa,group=yrs_to_grad)) + geom_boxplot()

SHOW ME A BOXPLOT!

300

This is a question that this data could answer.

What is a question such as:

How effective are entrance requirements at predicting GPA and graduation rate?
What would be the best way to improve graduation rates?
...

300

This is how to describe the distribution of a variable.

What are the shape, typical value, and spread of the data?
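For a numeric variable, the typical value and spread can be computed directly in base R. This sketch uses a short, made-up vector of ACT scores rather than the real grad data; the median and IQR are one common choice of center and spread, not the only one.

```r
# Hypothetical ACT scores invented for illustration (not from the grad data set)
act <- c(18, 21, 22, 22, 23, 24, 24, 25, 26, 31)

typical <- median(act)   # typical value (center), robust to outliers
spread  <- IQR(act)      # spread of the middle 50% of the scores
rng     <- range(act)    # smallest and largest score

# Rough shape check: a mean far above the median hints at right skew,
# far below hints at left skew, close together suggests rough symmetry
skew_hint <- mean(act) - median(act)

c(median = typical, IQR = spread, min = rng[1], max = rng[2],
  mean_minus_median = skew_hint)
```

The shape is still best judged from a histogram of the variable; the numbers above back up what the picture shows.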

300

The conclusion from this graph is this.

grad %>% ggplot(aes(x=hs_gpa, y=un_gpa,group=hs_gpa)) + geom_boxplot()

What is the conclusion?

300

These are two (or more) subsets of the data separated by a variable.

What are two variables with filtered data?
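One way to build two data sets separated by a variable is to filter on that variable and summarize each piece. The grad_toy values below are invented, and splitting at 4 years is an assumed cutoff; yrs_to_grad and un_gpa are the column names used in the boxplot clue.

```r
library(dplyr)

# Hypothetical toy data; column names follow the clues, values are invented
grad_toy <- tibble(
  yrs_to_grad = c(4, 4, 4, 5, 5, 6),
  un_gpa      = c(3.4, 3.2, 3.6, 2.9, 3.1, 2.6)
)

# Split the data on one variable: graduated in 4 years vs. took longer
on_time <- grad_toy %>% filter(yrs_to_grad == 4)
longer  <- grad_toy %>% filter(yrs_to_grad > 4)

# Compare the typical un_gpa in each subset
c(on_time = median(on_time$un_gpa), longer = median(longer$un_gpa))
```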

300

The graph that shows a correlation between two variables.

A two-variable graph!

400

The supporting/contradicting evidence for this data (find evidence and discuss!).

What did you find?

400

The anomalies in this data (which ones?) could indicate this.

What are the outliers?

400

The students in this group are this.

What do we learn?

400

This is a way to present data as percentages.

What is a frequency table?
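A frequency table with percentages can be sketched with dplyr's count() followed by mutate(). The column name p_educ and its categories are hypothetical, since the data set's actual parental-education column is not named here.

```r
library(dplyr)

# Hypothetical parental-education values invented for illustration;
# p_educ is an assumed column name, not necessarily the one in grad
grad_toy <- tibble(
  p_educ = c("HS", "BA", "BA", "MA", "HS", "BA", "PhD", "HS")
)

# Frequency table: count each category, then express each count as a percent of the total
freq <- grad_toy %>%
  count(p_educ) %>%
  mutate(percent = 100 * n / sum(n))

freq
```

In base R, prop.table(table(x)) gives the same proportions (multiply by 100 for percentages).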

400

This multiple-variable graph shows this.

What is a multi-variable graph?

500

These are other data sources that could correspond to this one; each has its own ups and downs!

Show me the other data sources!

500

The overarching distributions of the data (with evidence).

Describe the data broadly!

500

The impacts of 'this' variable on 'that' variable.

What is the correlation?
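The strength of a linear relationship between two numeric variables can be measured with cor(), which returns the Pearson correlation by default. The paired values below are invented for illustration and only borrow the hs_gpa and un_gpa names from the boxplot clue.

```r
# Hypothetical paired values invented for illustration
hs_gpa <- c(2.5, 3.0, 3.2, 3.6, 4.0)
un_gpa <- c(2.2, 2.9, 3.0, 3.3, 3.8)

# Pearson correlation: +1 is a perfect increasing linear relationship,
# 0 is no linear relationship, -1 is a perfect decreasing one
r <- cor(hs_gpa, un_gpa)
round(r, 2)
```

A strong correlation on its own does not show that one variable impacts the other; other factors may drive both.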

500

A simulation of this variable would demonstrate this.

How might one create a simulation of percentages?
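One way to simulate a percentage is to draw many random samples with rbinom() and record the percent of successes in each. The true rate of 60%, the 100 students per sample, and the 1000 repetitions are all assumed values chosen for illustration.

```r
set.seed(42)  # make the simulation reproducible

# Assumed inputs, all hypothetical
true_rate  <- 0.60   # chance that any one student graduates
n_students <- 100    # students per simulated class
n_sims     <- 1000   # number of simulated classes

# Each simulation draws graduate (1) / not (0) outcomes and records the percent who graduate
sim_pct <- replicate(n_sims, 100 * mean(rbinom(n_students, size = 1, prob = true_rate)))

# The middle 95% of simulated percentages shows how much a percent can vary by chance alone
quantile(sim_pct, c(0.025, 0.975))
```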

500

The spread of this data informs us of this.

What does it say?