Intro to Data Science
Data Collection
Data Ethics
Data Cleaning
Statistics

100

What are the core skills of an ideal data scientist (the "Unicorn")?

Statistics, programming (computer science), and domain expertise

100

What data collection method should be used if a researcher wants to make a causal claim? 

Experiment

100

What are the five C's of data ethics?

Consent, clarity, consistency, control & transparency, and consequences

100

What is structured data?

Data that is highly organized and easily searchable, typically stored in tables or databases.

100

What are the three measures of central tendency?

Mean, median, and mode

200

A shoe company wants to know the average shoe size of adults in the U.S. They randomly select 2,000 adults and record their shoe sizes.

Who is the population of interest? 

All adults in the U.S.

200

What data collection method should be used to capture natural behavior?

Observational Study

200

Sharing how data is collected, analyzed, and used is an example of which principle of data ethics? (Hint: this is from your presentations.)

Transparency

200

What type of data includes emails, social media posts, and audio recordings, which lack a fixed format?

Unstructured data

200

What type of data visualization is appropriate for qualitative data? 

Bar chart or pie chart

300

Which phase of the data science life cycle is the following an example of?

A political scientist uses statistical models to identify whether age and income predict voter turnout.

Data Analysis

300

What is the name of the phenomenon that implies your analysis is only as good, or bad, as the data you collected?

Garbage in, garbage out

300

The GDPR, the Equality Act 2010, HIPAA, and FERPA might need consideration to address this aspect of ethics in data science.

Legal compliance

300

What does CSV stand for?

Comma-Separated Values

300

If the mean is lower than the median, what best describes the shape of the distribution?

Skewed left
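
As a quick check of this rule of thumb, here is a minimal Python sketch. The scores are made up for illustration, chosen so that one unusually low value pulls the mean below the median.

```python
import statistics

# Hypothetical scores: one unusually low value creates a long left tail.
scores = [1, 7, 8, 8, 9]

mean = statistics.mean(scores)      # pulled toward the low value
median = statistics.median(scores)  # middle value, resistant to the tail

# Compare mean and median to guess the direction of the skew.
if mean < median:
    shape = "skewed left"
elif mean > median:
    shape = "skewed right"
else:
    shape = "roughly symmetric"

print(mean, median, shape)
```

The comparison is only a heuristic; a histogram of the data is still the safer way to judge shape.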

400

What term describes the end result of developing a mindset to think critically about data in business and everyday life?

Statistical Thinking

400

What were the five types of survey questions we talked about in class?

Multiple choice, rank-order, Likert scale, open-ended, and dichotomous

400

During which stage of the data science process should data ethics be addressed?

EVERY STAGE!

400

What does TSV stand for?

Tab-Separated Values
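
To see how CSV and TSV differ in practice, here is a small sketch using Python's standard `csv` module. The two text snippets are hypothetical; the only difference between them is the delimiter.

```python
import csv
import io

# Hypothetical two-row datasets; the only difference is the delimiter.
csv_text = "name,shoe_size\nAna,8\nBen,10\n"
tsv_text = "name\tshoe_size\nAna\t8\nBen\t10\n"

# csv.reader splits on commas by default; pass delimiter="\t" for TSV.
csv_rows = list(csv.reader(io.StringIO(csv_text)))
tsv_rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))

# Both formats yield the same rows once parsed.
print(csv_rows == tsv_rows)
```

Note that every parsed value is a string, so numeric columns still need converting before analysis.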

400

What type of data is displayed in a histogram? 

Continuous

500

Give an example of the following:

Nominal, Ordinal, Discrete, and Continuous

Will vary

500

What type of data is most cost-effective?

Secondary data (data collected by others)

500

What is the issue with having oaths instead of laws? 

Oaths are symbolic and lack daily impact

500

What are ways to handle missing data?

Removing observations with missing values

Imputation (internal or external data)
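
The two approaches above can be sketched in a few lines of Python. The column of values is hypothetical, and the sketch assumes "internal" imputation means filling gaps from the dataset's own values (here, the mean of the observed entries).

```python
import statistics

# Hypothetical column with two missing entries marked as None.
values = [4.0, None, 6.0, 8.0, None]

# Option 1: remove observations with missing values.
observed = [v for v in values if v is not None]

# Option 2: internal imputation (assumed here to mean filling each gap
# with the mean of the observed values from the same column).
fill = statistics.mean(observed)
imputed = [fill if v is None else v for v in values]

print(observed)
print(imputed)
```

Removal shrinks the dataset, while imputation keeps every row at the cost of adding estimated values.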

500

What measure of spread would be most appropriate to find for the following dataset? 

 0, 1, 1, 3, 15

IQR
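
A short Python sketch shows why the IQR suits this dataset: the outlier 15 inflates the range and the standard deviation but leaves the IQR untouched. The choice of the "inclusive" quartile method is an assumption; other quartile conventions can give slightly different values.

```python
import statistics

data = [0, 1, 1, 3, 15]

# Quartiles using the "inclusive" method (one of several conventions).
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Measures of spread that are sensitive to the outlier 15.
data_range = max(data) - min(data)
std_dev = statistics.stdev(data)  # sample standard deviation

print(iqr, data_range, std_dev)
```

With this method the IQR is 2, far smaller than the range of 15, because it only measures the spread of the middle half of the data.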