Foundations of Data Science
Data Types
Working with Data
Statistics and Graphs
Data Ethics
100

The entire group you want information about.

What is a population? 

100

“72, 85, 90, 66” are examples of this.

What is raw data?

100

This is the delimiter in a CSV file.

What is a comma?
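A minimal sketch of the comma acting as the delimiter, using Python's built-in csv module. The names and scores in csv_text are made up for illustration.

```python
import csv
import io

# Hypothetical CSV text; the names and scores are invented for this example.
csv_text = "name,score\nAva,72\nBen,85\n"

# csv.reader splits each line wherever the delimiter (a comma here) appears.
rows = list(csv.reader(io.StringIO(csv_text), delimiter=","))

for row in rows:
    print(row)
```

Note that every value comes back as text, including the scores.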

100

Mean, median, and mode are measures of this.

What is central tendency?
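As a quick illustration, the three measures can be computed with Python's built-in statistics module. The scores list is hypothetical: it reuses the scores from the raw data clue and repeats 85 so that a mode exists.

```python
import statistics

# Hypothetical scores; 85 is repeated so the list has a single mode.
scores = [72, 85, 90, 66, 85]

mean_score = statistics.mean(scores)      # sum of the values divided by the count
median_score = statistics.median(scores)  # middle value once the list is sorted
mode_score = statistics.mode(scores)      # most frequent value

print(mean_score, median_score, mode_score)
```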

100

One of the five principles of data ethics.

What is transparency, fairness, privacy, accountability, or consent?

200

A smaller group taken from a population to study. 

What is a sample?

200

A statement like “The average test score is 78%” is an example of this.

What is information?

200

.shape tells us this about a dataset.

What are the numbers of rows and columns?
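A small sketch of .shape in pandas. The DataFrame df and its columns are made up for illustration.

```python
import pandas as pd

# Hypothetical dataset with 3 rows and 2 columns.
df = pd.DataFrame({
    "student": ["Ava", "Ben", "Cy"],
    "score": [72, 85, 90],
})

# .shape is an attribute (no parentheses) holding a (rows, columns) tuple.
n_rows, n_cols = df.shape
print(n_rows, n_cols)
```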

200

A graph shows sales rising dramatically, but the y-axis begins at 98 instead of 0. This is an example of this type of misleading technique.

What is axis manipulation (or misleading visualization)?
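To see why a truncated axis misleads, compare the real change with the change the bars appear to show. The sales figures below are hypothetical; only the baseline of 98 comes from the clue.

```python
# Hypothetical sales values; only the y-axis baseline of 98 comes from the clue.
sales_before = 98.5
sales_after = 99.5
baseline = 98

# Ratio of the real values: the actual growth is about 1%.
true_ratio = sales_after / sales_before

# Ratio of the drawn bar heights when the axis starts at the baseline.
apparent_ratio = (sales_after - baseline) / (sales_before - baseline)

print(round(true_ratio, 3), apparent_ratio)
```

With these numbers the second bar is drawn three times as tall as the first, even though sales rose by roughly 1%.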

200

Data ethics applies to these stages of data work.

What are collection, storage, analysis, and data use?

300

The three stages of the data science life cycle.

What are data preparation, data analysis, and data storytelling?

300

Eye color is this type of variable.

What is categorical (nominal)?

300

.describe() provides this type of summary. (Looking for two specific things.)

What is a statistical summary of numeric columns?
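A minimal sketch of .describe() on a small, made-up DataFrame. By default pandas summarizes only the numeric columns, so the text column is left out.

```python
import pandas as pd

# Hypothetical dataset: one text column and one numeric column.
df = pd.DataFrame({
    "student": ["Ava", "Ben", "Cy", "Dee"],
    "score": [72, 85, 90, 66],
})

# Returns count, mean, std, min, quartiles, and max for numeric columns only.
summary = df.describe()
print(summary)
```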

300

A graph used to show the relationship between two numeric variables.

What is a scatterplot? 

300

A dataset was collected legally, but later used to target vulnerable individuals with manipulative advertising. This shows the difference between these two concepts.

What are ethics and law?

400

This term describes a type of thinking where you question whether a sample truly represents the population before accepting conclusions.

What is statistical thinking?

400

Number of siblings is numeric and falls into this category.

What is discrete?

400

This issue will arise if .info() tells you that a variable "body_mass_g" is stored as an object.

What is a variable that cannot be analyzed numerically because Python has not stored it as a numeric type?
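One possible way to handle this, sketched with made-up values: pd.to_numeric with errors="coerce" converts the text to numbers and turns anything unreadable into a missing value. The "unknown" entry is an assumption about why the column might have been read as an object.

```python
import pandas as pd

# Hypothetical values stored as text; "unknown" stands in for a bad entry.
df = pd.DataFrame({
    "body_mass_g": pd.Series(["3750", "3800", "unknown", "4200"], dtype="object"),
})
dtype_before = df["body_mass_g"].dtype

# Convert to numbers; entries that cannot be parsed become NaN.
df["body_mass_g"] = pd.to_numeric(df["body_mass_g"], errors="coerce")
dtype_after = df["body_mass_g"].dtype

# The mean now works and skips the missing value.
mean_mass = df["body_mass_g"].mean()
print(dtype_before, dtype_after, mean_mass)
```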

400

When a distribution has two peaks.

What is bimodal? 

400

This principle of data ethics tells us that storage must be secure.

What is privacy?

500

A poll of 25 volunteers from a gym concludes that “90% of Americans exercise weekly.” This flaw weakens the conclusion.

What is sampling bias?

500

This issue is highlighted by a researcher calculating and interpreting the average of all numeric variables in the dataset including student ID numbers.

What is incorrectly treating an ID as a numeric variable?
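A short sketch of the problem and one possible fix, using a made-up DataFrame. Storing the ID as text is an assumption about how to keep it out of numeric summaries; other approaches would work too.

```python
import pandas as pd

# Hypothetical dataset in which the ID happens to be stored as an integer.
df = pd.DataFrame({
    "student_id": [1001, 1002, 1003],
    "score": [72, 85, 90],
})

# The naive summary averages the IDs too, which has no real meaning.
naive_means = df.mean(numeric_only=True)

# Storing the ID as text marks it as a label, so it drops out of the summary.
df["student_id"] = df["student_id"].astype(str)
clean_means = df.mean(numeric_only=True)

print(naive_means)
print(clean_means)
```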

500

This type of plot is produced by the following code:

plt.scatter(df_front["sleep"], df_front["screen_time"])

plt.show()

What is a scatter plot?

500

A distribution in which the mean is greater than the median typically has this shape.

What is a right-skewed distribution?
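A small illustration of how one large value pulls the mean above the median, using Python's built-in statistics module. The values are hypothetical.

```python
import statistics

# Hypothetical values: most are close together, with one very large value.
values = [30, 32, 35, 38, 40, 200]

mean_val = statistics.mean(values)      # pulled upward by the large value
median_val = statistics.median(values)  # barely affected by the large value

print(mean_val, median_val)
```

The long tail on the right is what drags the mean past the median.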

500

A health app collects detailed location and medical data but fails to explain how it will be used in the future. This violates at least two ethical principles. Name both.

What are transparency and informed consent?