Graphs
Data Science Facts
Miscellaneous
Types of Data
100

Which graph is used to display the frequency of numeric values by dividing data into bins along the x-axis?

Histogram
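
To make the binning idea concrete, here is a minimal sketch of how a histogram counts values per bin. The function name `histogram_counts`, the exam scores, and the bin width of 10 are all hypothetical choices for illustration, not part of the original question.

```python
def histogram_counts(values, bin_width, start):
    """Count how many values fall into each bin [left_edge, left_edge + bin_width)."""
    counts = {}
    for v in values:
        k = int((v - start) // bin_width)      # index of the bin containing v
        left_edge = start + k * bin_width      # x-axis position of that bin
        counts[left_edge] = counts.get(left_edge, 0) + 1
    return dict(sorted(counts.items()))

# Hypothetical exam scores, grouped into bins of width 10 starting at 50.
scores = [52, 58, 61, 64, 67, 69, 71, 73, 75, 88]
bins = histogram_counts(scores, bin_width=10, start=50)

# Draw a simple text histogram: one '#' per value in the bin.
for left_edge, count in bins.items():
    print(f"{left_edge}-{left_edge + 10}: {'#' * count}")
```

The height of each bar is just the count of values that landed in that bin, which is why changing the bin width can change the shape of the graph.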

100

What is the purpose of data science, and what are some examples in our everyday lives?

To turn raw information into insights that guide decisions and predict outcomes. E.g., Tesla and Meta train algorithms that shape real-world behavior.

100

How would you print “I love data science so much” in R?

print("I love data science so much")

100

What type of data represents categories with no inherent order, like colors or types of fruit?

Nominal/Categorical Data

200

When data values range across several orders of magnitude, what scale helps make small and large values easier to compare on the same axis?

Logarithmic Scale
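
A small sketch of why this works, assuming a base-10 log scale and a made-up list of values: each value is placed by its exponent, so 1 and 1,000,000 end up only six steps apart instead of a million.

```python
import math

# Hypothetical values spanning several orders of magnitude.
values = [1, 10, 1_000, 1_000_000]

# On a log10 axis, each value is positioned by its power of ten.
log_positions = [math.log10(v) for v in values]

for v, p in zip(values, log_positions):
    print(f"{v:>9,} -> axis position {p:.0f}")
```

On a linear axis the first three values would be squashed together near zero; on the log axis they are spread out evenly by order of magnitude.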

200

What is algorithmic bias, and what is an example?

Systematic unfairness that arises when algorithms trained on human data unintentionally learn human prejudice. E.g., a chatbot such as ChatGPT can pick up biases from the human-written text and feedback it learns from, no matter how carefully it is built.

200

In what decade was R first released? (Extra point for the correct year.)

1990s

200

What type of data involves categories that can be ranked, like “poor,” “average,” and “excellent”?

Ordinal Data

300

Which graph compares the distribution of a numerical feature across multiple groups and includes medians, quartiles, and potential outliers?

Box-and-Whisker Plot
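
The numbers behind a box plot can be sketched as follows. This assumes the common 1.5 × IQR rule for flagging potential outliers and the "inclusive" quartile method from Python's `statistics` module; other tools may compute quartiles slightly differently. The helper `box_plot_summary` and the two groups are hypothetical.

```python
import statistics

def box_plot_summary(values):
    """Return the quartiles and potential outliers a box plot would display."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1                      # interquartile range: the height of the box
    low_fence = q1 - 1.5 * iqr         # assumed 1.5 * IQR outlier rule
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}

# Two hypothetical groups to compare side by side.
groups = {
    "A": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "B": [2, 4, 4, 5, 6, 7, 8, 9, 30],
}
summaries = {name: box_plot_summary(vals) for name, vals in groups.items()}

for name, summary in summaries.items():
    print(name, summary)
```

Each group gets its own box, so differences in median, spread, and outliers between groups can be read off at a glance.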

300

This company/platform famously said, “We’re not a streaming service; we’re a data company that happens to stream movies.” Bonus Points: Explain.

Netflix. It uses data science to predict what a user will watch next by analyzing the kinds of movies the user watches, as well as their age ratings, and then organizes the content the user is most likely to check out.

300

In what country was Python first created?

Netherlands

300

What type of data can take on any numerical value within a range, like height or temperature?

Continuous Data

400

Which graph uses color intensity to represent a third variable on a grid defined by two other variables?

Contour Plot or Heatmap

400

What is clustering?

An unsupervised machine learning technique that groups similar, unlabeled data points together based on shared characteristics
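
As an illustration, here is a minimal sketch of one common clustering method, k-means, on one-dimensional data. The answer above does not name a specific algorithm, so k-means, the function `k_means_1d`, the starting centers, and the height values are all assumptions made for this example.

```python
def k_means_1d(points, centers, iterations=10):
    """Tiny k-means for 1-D data.

    Repeats two steps: assign each point to its nearest center,
    then move each center to the mean of the points assigned to it.
    """
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Keep a center where it is if no points were assigned to it.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical unlabeled heights in centimeters.
heights = [150, 152, 155, 180, 183, 185]
centers, clusters = k_means_1d(heights, centers=[150, 185])

print(clusters)
```

No labels are given to the algorithm; the two groups emerge only from how close the values are to each other.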

400

Who is credited with creating the term/job title “data scientist”? (Hint: Harker parent!)

DJ Patil

400

What data type has numbers that represent counts and can’t include fractions, such as the number of cars in a parking lot?

Discrete Data

500

A box plot shows a long whisker extending upward from the box, where upward represents increasing values. Is the data right skewed or left skewed?

Right Skewed

500

What is the Law of Large Numbers?

The principle that large random samples start to look like the population they came from: as the sample size increases, the sample average gets closer to the true expected value of the entire population.

  • E.g., flipping a fair coin 1,000 times gives close to a 50/50 split of heads and tails, even though the first few results may look lopsided
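
The coin-flip example can be simulated with a short sketch. The helper `heads_fraction`, the seed, and the sample sizes are hypothetical choices; the exact fractions printed depend on the random number generator, so no specific output is shown.

```python
import random

def heads_fraction(num_flips, rng):
    """Simulate num_flips fair coin flips and return the fraction that were heads."""
    heads = sum(rng.random() < 0.5 for _ in range(num_flips))
    return heads / num_flips

# A seeded generator makes the run repeatable (the seed value is arbitrary).
rng = random.Random(42)

# As the number of flips grows, the fraction of heads tends toward 0.5.
for n in (10, 100, 1_000, 100_000):
    print(f"{n:>7,} flips -> fraction of heads = {heads_fraction(n, rng):.4f}")
```

Small runs can land far from 0.5, while the largest run should sit very close to it, which is the Law of Large Numbers in action.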

500

How many bytes of data exist for every grain of sand on Earth?

400,000

500

What type of variable can take on both numerical and categorical properties? (For example, ZIP codes are numbers but represent locations rather than quantities)

Categorical Variable with Numerical Labels