What are the core skills of an ideal data scientist (the "unicorn")?
Statistics, programming (computer science), and domain expertise
What data collection method should be used if a researcher wants to make a causal claim?
Experiment
What are the five C's of data ethics?
Consent, clarity, consistency, control & transparency, and consequences
What is structured data?
Data that is highly organized and easily searchable, typically stored in tables or databases.
What are the three measures of central tendency?
Mean, median, and mode
A shoe company wants to know the average shoe size of adults in the U.S. They randomly select 2,000 adults and record their shoe sizes.
Who is the population of interest?
All adults in the U.S.
What data collection method should be used to capture natural behavior?
Observational Study
Sharing how data is collected, analyzed, and used is an example of which principle of data ethics? (Hint: this is from your presentations.)
Transparency
What type of data includes emails, social media posts, and audio recordings, which lack a fixed format?
Unstructured data
What type of data visualization is appropriate for qualitative data?
Bar chart or pie chart
Which phase of the data science life cycle is the following an example of?
A political scientist uses statistical models to identify whether age and income predict voter turnout.
What is the name of the phenomenon that implies your analysis is only as good, or bad, as the data you collected?
Garbage in, garbage out
The GDPR, the Equality Act 2010, HIPAA, and FERPA might need consideration to address this aspect of ethics in data science.
Legal compliance
What does CSV stand for?
Comma-Separated Values
If the mean is lower than the median, what best describes the shape of the distribution?
Skewed left
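A quick check of the mean-versus-median rule, as a minimal sketch. The scores below are made-up values chosen so that a single low value forms a left tail; they are not from the class materials.

```python
from statistics import mean, median

# Hypothetical left-skewed sample: most values are high, one low value forms the tail.
scores = [20, 70, 80, 85, 90, 92, 95]

sample_mean = mean(scores)
sample_median = median(scores)

# The low value pulls the mean toward the tail, while the median stays near the bulk of the data.
print("mean:", sample_mean)
print("median:", sample_median)
print("mean < median:", sample_mean < sample_median)
```

The comparison is a rule of thumb for spotting skew, so it is worth confirming with a histogram when the data are available.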
What term describes the end result of developing a mindset to think critically about data in business and everyday life?
Statistical Thinking
What were the five types of survey questions we talked about in class?
Multiple choice, rank-order, Likert scale, open-ended, and dichotomous
During which stage of the data science process should data ethics be addressed?
EVERY STAGE!
What does TSV stand for?
Tab-Separated Values
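To see how CSV and TSV differ only in the delimiter, here is a small sketch using Python's built-in csv module. The names and shoe sizes are invented for illustration.

```python
import csv
import io

# The same hypothetical table written in both formats.
csv_text = "name,shoe_size\nAva,8.5\nBen,11\n"
tsv_text = "name\tshoe_size\nAva\t8.5\nBen\t11\n"

# csv.reader splits on commas by default; pass delimiter="\t" for tab-separated data.
csv_rows = list(csv.reader(io.StringIO(csv_text)))
tsv_rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))

print(csv_rows)
print(tsv_rows)
```

Both calls yield the same rows, which is why the two formats are usually interchangeable once the delimiter is known.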
What type of data is displayed in a histogram?
Continuous
Give an example of the following:
Nominal, Ordinal, Discrete, and Continuous
Answers will vary
What type of data is most cost-effective?
Secondary data (data collected by others)
What is the issue with having oaths instead of laws?
Oaths are symbolic and lack daily impact
What are ways to handle missing data?
Removing observations with missing values
Imputation (using internal or external data)
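Both approaches can be sketched in a few lines of Python. The shoe sizes are hypothetical, and filling with the mean of the observed values is just one assumed example of imputation from internal data.

```python
from statistics import mean

# Hypothetical shoe sizes; None marks a missing observation.
sizes = [8.0, None, 10.0, 9.0, None, 11.0]

# Option 1: remove observations with missing values.
complete_cases = [s for s in sizes if s is not None]

# Option 2: impute from internal data, here the mean of the observed values.
fill_value = mean(complete_cases)
imputed = [fill_value if s is None else s for s in sizes]

print(complete_cases)
print(imputed)
```

Removal shrinks the sample, while imputation keeps every row but adds values that were never observed, so the choice depends on how much data is missing and why.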
What measure of spread would be most appropriate to find for the following dataset?
0, 1, 1, 3, 15
IQR
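The sketch below computes several measures of spread for this dataset to show why the IQR is the better fit when an outlier (15) is present. Quartile conventions differ between textbooks and software; the "inclusive" method used here is one assumed choice, so the exact quartiles may not match a hand calculation done with another convention.

```python
from statistics import quantiles, stdev

data = [0, 1, 1, 3, 15]

# Range uses only the two extremes, so the outlier 15 determines it.
data_range = max(data) - min(data)

# Quartiles with the "inclusive" convention (an assumption; other methods give other cut points).
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Sample standard deviation, which is also inflated by the outlier.
spread_sd = stdev(data)

print("range:", data_range)
print("IQR:", iqr)
print("standard deviation:", spread_sd)
```

Because the IQR looks only at the middle half of the data, the value 15 has little influence on it, unlike the range or the standard deviation.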