Speaking Like a Pirate
Analytics Toolbox
CASE: Used Cars
CASE: Salary
CASE: Credit Scoring
Class of 85 AfterParty
100

This is best function in R to quickly find the mean of all of your continuous variables.

What is summary(df)?

100

In the Titanic data, pclass was this kind of data.

What is ordinal?

100

This was the best way to demonstrate a relationship between Price and Miles.

What was a scatterplot?

100

From the salary dataset, this was the best graphic to represent the relationship between education level and salary.

What is a series bar chart?

100

Using the Credit data, this is the best graphic to represent the relationship between UTIL6 (utilization average over the last 6 months) and RBAL (revolving balance).

What is a scatterplot?

100

Name the artist or the song...

200

This is the best function in R to find the mode for a single categorical variable.

what is table(df$variable)?

200

In the Titanic data, this is the best graphical option to demonstrate the relationship between survived (yes/no) and gender (male/female).

What is a 100% Stacked Bar Chart?

200

Based on this output from Excel, this is the model to predict car price:

What is 

Price = $23,727 - $209(Age) - $0.075(Miles)

200

From the salary data, this is the best way to represent the relationship between Gender and Education.

What is a 100% stacked bar chart?

200

Using R, report the model that was generated to predict credit default using UTIL6: 

What is:

P(default) = 

e^(-3.36 + 3.67*UTIL6)/

(1 + e^(-3.36 + 3.67*UTIL6)) 

200

Name the artist or the song.

300

This is the one line of code needed to run a simple linear model.

What is

model<-lm(dependent~independent, data = df)

300

In the USED CAR dataset, "Fuel Type" (e.g., gas, electric, diesel) is a categorical variable.  This is the best graphical option to display this data.

What is a bar chart?

300

Based on this output, this is the price of a new car:

What is $23,727.49?

300

From the salary case analysis, this variable was found to NOT be a significant predictor of salary.

What is gender?

300

Using this output, what is the probability of default for someone with a UTIL6 of .6?


What is 23.93%?
300

Name the Artist or the Song.

400

This is the function that you used to find the mean values of all of the continuous variables against the binary dependent (outcome) variable in the credit dataset. 

what is tapply?

tapply(util6,outcome,mean)

400

In the USED CAR dataset, both "Price" and "Miles" were continuous variables.  This is the best graphical option to demonstrate their relationship.

What is a scatterplot?

400

Based on this snapshot below, this is the code needed to recode transmission to be used in the model.  Be specific.

In cell J2: 

=IF(G2 = "automatic" , 1,0)

400

Based on the output below, write out the model:

What is:

SALARY = -$49,320.40 + $2579.50 (AGE) + $3160.9 (EXP) + $14,088.1 (EDUCATION)

400

Consider the matrix below.  This is the accuracy rate of the credit scoring model:

What is 74%? (523 + 221)/1000

400

Name the Artist or the Song

500
To view the results of your model, you would run this line of code (assume you titled the output of your model "model").

what is summary(model)?

500

In the CREDITFILE - the variable UTIL6 represented the utilization rate over the last 6 months and was continuous.  You created a variable "outcome" that was binary.  This is the best graphical option to represent their relationship.

What is a series bar chart?

500

Based on the snapshot of the data below, this is the code that you would use to create a new variable called "TRUCK" that would classify a pickup or a truck as a 1 and all others as a 0. Be specific.

What is:

=IF(H2="pickup",1,IF(H2 = "truck",1,0))


500
This was the accuracy of the Salary Model - within 5%.

What was 89%?

500

Consider the output below.  If the cutoff for default is .2, what is the reported accuracy of the model?

What is 90%?


500

Name the Artist or the Song

M
e
n
u