Data Types
Data Structures
General Python
Statistics Concepts
Scientific Method
100

What is a data type?

Data types are a classification of data

100

What is a data structure?

Data Structures are a format for organizing and storing data

100

What is the most common library used in data science to structure data like a table with rows and columns?

Pandas

100

What is a null hypothesis?

The hypothesis asserting no statistical significance (dull)

100

What is the first step when approaching a question as a data scientist? (Hint: think about the Data Science Reasoning Framework)

Define the problem and question

200

Which data type is most commonly used to express characters and words?

String

200

What is the most common data structure used to order a series of elements?

List

200

What is a Python library? What are three examples?

A python library is a collection of pre-written code (functions, methods, etc.)

Examples: Pandas, Lumpy, Matplotlib, ...

200

How do we say that the alternative hypothesis is wrong?

We ‘fail to reject’ the null hypothesis

200

What is a proxy variable?

A proxy variable is a variable that is used as a substitute or stand-in for another variable

300

Which data type is used when for a number that has decimal points?

Float
300

What data type stores a series of key-value pairs?

Dictionary
300
Suppose we have a variable named myuni with a value of "UChicago".


How would I print "My favorite university is UChicago" using the myuni variable?

print(f'My favorite university is {myuni}')

300

What is a p-value?

A number describing how likely it is that your data/result would have occurred by random chance

300

What makes a strong hypothesis?

Testability

400

In Python, what are the two common representations of a false boolean value?

False or 0

400

Suppose we have a dictionary named md (short for my dictionary). How would you first return only the keys, then return only the values, and then return the key-value pairs from mydictionary?

md.keys()

md.values()

md.items()

400

Suppose I have the following code to make a bar graph using MatPlotLib:

plt.bar(x, y)


How would you set the title to "My Title", the x-axis label to "x axis", and the y-axis label to "y axis"?

plt.title("My Title")

plt.xlabel("x axis")

plt.ylabel("y axis")

plt.show()

400

Which of the following p-values would reject the null with 90% confidence?

1. 0.07

2. 0.03

3. 0.12

4. 9e-14

1, 2, 4

400

In an experiment, which is the name of the variable that we are trying to find? What are the name of the variables that explain that variable?

1. Response or dependent variable

2. Explanatory or independent variable

500
Is the set data type in Python considered mutable or immutable? What is an example of a mutable and immutable data type?

Lists are considered mutable.

Mutable Examples: list, dictionary, set

Immutable Examples: numeric types, string, boolean, tuple

500
Suppose we have list named mylist. How do you find only the third-to-last and second-to-last elements only?
mylist[-3:-1]
500

Suppose I hav a csv file with columns "Name", "Age" named "https://mycsv".

Using pandas, how would I load the data into a dataframe that is set equal to variable df, drop rows with empty values, set the column types (you tell me the correct types), and show the first 13 lines?

Assume pandas has NOT yet been imported.

Import pandas as pd

df = pd.read_csv("https://mycsv")

df = df.dropna()

df['name'].astype(str)

df['age'].astype(int)

df.head(13)

500

What statistical test would be best to determine whether two categorical variables are related in a contingency table (two-way) table?

Chi-Square

500

What does this graph indicate, and not indicate?

There is a correlation between ice cream and AC sales, likely becuase of the temperature. But we cannot say that ice cream sales cause more ACs to be sold.