What is a data type?
Data types are a classification of data
What is a data structure?
Data Structures are a format for organizing and storing data
What is the most common library used in data science to structure data like a table with rows and columns?
Pandas
What is a null hypothesis?
The hypothesis asserting no statistical significance (dull)
What is the first step when approaching a question as a data scientist? (Hint: think about the Data Science Reasoning Framework)
Define the problem and question
Which data type is most commonly used to express characters and words?
String
What is the most common data structure used to order a series of elements?
List
What is a Python library? What are three examples?
A python library is a collection of pre-written code (functions, methods, etc.)
Examples: Pandas, Lumpy, Matplotlib, ...
How do we say that the alternative hypothesis is wrong?
We ‘fail to reject’ the null hypothesis
What is a proxy variable?
A proxy variable is a variable that is used as a substitute or stand-in for another variable
Which data type is used when for a number that has decimal points?
What data type stores a series of key-value pairs?
How would I print "My favorite university is UChicago" using the myuni variable?
print(f'My favorite university is {myuni}')
What is a p-value?
A number describing how likely it is that your data/result would have occurred by random chance
What makes a strong hypothesis?
Testability
In Python, what are the two common representations of a false boolean value?
False or 0
Suppose we have a dictionary named md (short for my dictionary). How would you first return only the keys, then return only the values, and then return the key-value pairs from mydictionary?
md.keys()
md.values()
md.items()
Suppose I have the following code to make a bar graph using MatPlotLib:
plt.bar(x, y)
How would you set the title to "My Title", the x-axis label to "x axis", and the y-axis label to "y axis"?
plt.title("My Title")
plt.xlabel("x axis")
plt.ylabel("y axis")
plt.show()
Which of the following p-values would reject the null with 90% confidence?
1. 0.07
2. 0.03
3. 0.12
4. 9e-14
1, 2, 4
In an experiment, which is the name of the variable that we are trying to find? What are the name of the variables that explain that variable?
2. Explanatory or independent variable
Lists are considered mutable.
Mutable Examples: list, dictionary, set
Immutable Examples: numeric types, string, boolean, tuple
Suppose I hav a csv file with columns "Name", "Age" named "https://mycsv".
Using pandas, how would I load the data into a dataframe that is set equal to variable df, drop rows with empty values, set the column types (you tell me the correct types), and show the first 13 lines?
Assume pandas has NOT yet been imported.
df = pd.read_csv("https://mycsv")
df = df.dropna()
df['name'].astype(str)
df['age'].astype(int)
df.head(13)
What statistical test would be best to determine whether two categorical variables are related in a contingency table (two-way) table?
Chi-Square
What does this graph indicate, and not indicate?
There is a correlation between ice cream and AC sales, likely becuase of the temperature. But we cannot say that ice cream sales cause more ACs to be sold.