The entire group you want information about.
What is a population?
“72, 85, 90, 66” are examples of this.
What is raw data?
This is the delimiter in a csv.
What is a comma?
Mean, median, and mode are measures of this.
What is central tendency?
One of the five principles of data ethics.
What is transparency, fairness, privacy, accountability, consent?
A smaller group taken from a population to study.
A statement like “The average test score is 78%” is an example of this.
What is information?
.shape tells us this about a dataset.
What are the number of rows and columns?
A graph shows sales rising dramatically, but the y-axis begins at 98 instead of 0. This is an example of this type of misleading technique.
What is axis manipulation (or misleading visualization)?
Data ethics applies to these stages of data work.
What are collection, storage, analysis, and data use?
The three stages of the data science life cycle.
What are data preparation, data analysis, and data storytelling?
Eye color is this type of variable.
What is categorical (nominal)?
.describe provides this type of summary. (looking for two specific things)
What is a statistical summary of numeric columns?
A graph used to show relationship between two numeric variables.
What is a scatterplot?
A dataset was collected legally, but later used to target vulnerable individuals with manipulative advertising. This shows the difference between these two concepts.
What is ethics vs. law?
This term describes a type of thinking where you question whether a sample truly represents the population before accepting conclusions.
What is statistical thinking?
Number of siblings is numeric and falls into this category.
What is discrete?
This issue will arise if .info tells you that a variable "body_mass_g" is stored as an object.
When a distribution has two peaks.
What is bimodal?
This principle of data ethics tell us that storage must be secure.
A poll of 25 volunteers from a gym concludes that “90% of Americans exercise weekly.” This flaw weakens the conclusion.
What is sampling bias?
This issue is highlighted by a researcher calculating and interpreting the average of all numeric variables in the dataset including student ID numbers.
What is incorrectly classifying id as numeric?
This type of plot is produced by the following code
plt.scatter(df_front["sleep"], df_front["screen_time"])
plt.show()
What is scatter plot?
What is a skewed right distribution?
A health app collects detailed location and medical data but fails to explain how it will be used in the future. This violates at least two ethical principles. Name both.
What are transparency and informed consent?