Which graph is used to display the frequency of numeric values by dividing the data into bins along the x-axis?
Histogram
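To make the binning idea concrete, here is a minimal Python sketch of the counting a histogram does before anything is drawn. It assumes equal-width bins. The function name `histogram_counts` and the sample ages are hypothetical and only for illustration.

```python
def histogram_counts(values, num_bins):
    """Count how many values fall into each of num_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    if width == 0:
        # Every value is identical, so they all share the first bin.
        return [len(values)] + [0] * (num_bins - 1)
    counts = [0] * num_bins
    for v in values:
        # The maximum value would map to index num_bins, so clamp it into the last bin.
        index = min(int((v - lo) / width), num_bins - 1)
        counts[index] += 1
    return counts


ages = [21, 22, 25, 27, 30, 31, 33, 38, 45, 60]
print(histogram_counts(ages, 4))  # one count per bar
```

Each count becomes the height of one bar. Plotting tools such as matplotlib's `hist` do this binning for you.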
What is the purpose of data science, and what are some examples of it in our everyday lives?
To turn raw data into insights that guide decisions and predict outcomes. E.g. Tesla and Meta train algorithms on data that shape real-world behavior, from how cars drive to which posts people see.
How would you print “I love data science so much” in R?
print("I love data science so much")
What type of data represents categories with no inherent order, like colors or types of fruit?
Nominal/Categorical Data
When data values range across several orders of magnitude, what scale helps make small and large values easier to compare on the same axis?
Logarithmic Scale
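A short Python sketch of what a base-10 logarithmic axis does to the numbers: each value is placed at its log10, so every factor of 10 becomes one equal step along the axis. The helper name `log_positions` and the sample values are hypothetical.

```python
import math


def log_positions(values):
    """Return where each positive value sits on a base-10 log axis (rounded to 2 decimals)."""
    # log10 is only defined for positive numbers, so zero or negative values are rejected.
    if any(v <= 0 for v in values):
        raise ValueError("log scales need strictly positive values")
    return [round(math.log10(v), 2) for v in values]


values = [3, 40, 500, 7_000, 80_000]
print(log_positions(values))
```

Values from 3 to 80,000 end up between roughly 0 and 5 on the axis, so small and large values can be compared side by side.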
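What is algorithmic bias, and what are some examples?
A phenomenon in which algorithms trained on human-generated data unintentionally learn and reproduce human prejudices. E.g. ChatGPT is trained on large amounts of human-written text, so it can absorb the biases in that text, and user feedback folded into later rounds of training can introduce more bias over time, no matter how carefully the model is built.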
In what decade was R first released? (Extra point for the correct year.)
1990s (R first appeared in 1993)
What type of data involves categories that can be ranked, like “poor,” “average,” and “excellent”?
Ordinal Data
Which graph compares the distribution of a numerical feature across multiple groups and includes medians, quartiles, and potential outliers?
Box-and-Whisker Plot
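The numbers behind a box plot can be computed directly. This Python sketch uses the standard library's `statistics.quantiles` for the quartiles and the common 1.5 × IQR rule (Tukey's convention) to flag potential outliers. The function name `box_plot_summary` and the scores are hypothetical, and other tools may use a different quartile method, so exact box edges can differ slightly.

```python
import statistics


def box_plot_summary(data):
    """Return the quartiles and the potential outliers a box plot would show."""
    # With n=4, quantiles returns the three cut points Q1, Q2 (the median) and Q3.
    # Its default "exclusive" method is one of several quartile conventions.
    q1, median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1  # interquartile range = height of the box
    # Common convention: points more than 1.5 * IQR beyond the box are flagged as outliers.
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}


scores = [55, 61, 64, 66, 68, 70, 71, 73, 75, 98]
print(box_plot_summary(scores))
```

Computing the summary separately for each group and drawing one box per group gives the side-by-side comparison the question describes.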
This company/platform famously said, “We’re not a streaming service; we’re a data company that happens to stream movies.” Bonus point: explain.
Netflix. It uses data science to predict what each user will watch next: it analyzes the kinds of movies the user watches, along with their age ratings, and uses that to organize the content the user is most likely to check out.
In what country was Python first created?
Netherlands
What type of data can take on any numerical value within a range, like height or temperature?
Continuous Data
Which graph uses color intensity to represent a third variable on a grid defined by two other variables?
Heatmap (a filled contour plot also works)
What is clustering?
An unsupervised machine learning technique that groups similar, unlabeled data points together based on their characteristics
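Clustering covers many algorithms; this Python sketch uses k-means, one of the simplest, as a stand-in. It is a minimal sketch rather than a library implementation: the function name `k_means`, the naive first-k-points initialization, the fixed iteration count, and the sample points are all assumptions for illustration. The points carry no labels; the algorithm finds the groups from distances alone.

```python
import math


def k_means(points, k, iterations=10):
    """Group 2-D points into k clusters with the basic k-means loop."""
    # Naive initialization (an assumption for this sketch): the first k points are the centroids.
    centroids = [points[i] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points.
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return clusters


points = [(1, 1), (8, 8), (1.5, 2), (2, 1), (8.5, 9), (9, 8)]
for cluster in k_means(points, 2):
    print(cluster)
```

Libraries such as scikit-learn provide a `KMeans` class with smarter initialization (k-means++) and convergence checks; the loop here only shows the assign-then-update idea.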
Who is credited with coining the term/job title “data scientist”? (Hint: Harker parent!)
DJ Patil (together with Jeff Hammerbacher)
What data type has numbers that represent counts and can’t include fractions, such as the number of cars in a parking lot?
Discrete Data
A box plot shows a long whisker extending upward from the box, where upward means increasing values. Is the data right-skewed or left-skewed?
Right-skewed (the long tail stretches toward higher values)
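One way to see why a long upper whisker signals right skew: a tail of large values pulls the mean above the median. This Python sketch applies that comparison as a rough rule of thumb (a heuristic, not a formal skewness test). The function name `skew_direction` and the income figures are hypothetical.

```python
import statistics


def skew_direction(data):
    """Rough rule of thumb: compare the mean with the median to guess the skew direction."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    if mean > median:
        return "right-skewed"  # a tail of large values drags the mean upward
    if mean < median:
        return "left-skewed"  # a tail of small values drags the mean downward
    return "roughly symmetric"


incomes = [30, 32, 35, 36, 38, 40, 42, 45, 120, 250]  # in thousands
print(skew_direction(incomes))
```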
What is the Law of Large Numbers?
The principle that large random samples start to look like the population they came from: as the sample size increases, the sample average gets closer to the true expected value of the entire population.
E.g. flipping a fair coin 1,000 times gives close to 50/50 heads and tails, even though the first few results may look lopsided.
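A small Python simulation of the coin example: it flips a simulated fair coin and tracks the running proportion of heads, which wanders early on and settles near 0.5 as the number of flips grows. The function name `running_heads_proportion` and the seed value are arbitrary choices for this sketch.

```python
import random


def running_heads_proportion(num_flips, seed=0):
    """Simulate fair coin flips and return the proportion of heads after each flip."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    heads = 0
    proportions = []
    for flip in range(1, num_flips + 1):
        heads += rng.random() < 0.5  # True counts as one head
        proportions.append(heads / flip)
    return proportions


props = running_heads_proportion(1000)
for n in (1, 10, 100, 1000):
    print(f"after {n:>4} flips: {props[n - 1]:.3f} heads")
```

Changing the seed changes the early values a lot, but the later proportions stay close to 0.5, which is the Law of Large Numbers at work.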
How many bytes of data exist for every grain of sand on Earth?
400,000
What type of variable can take on both numerical and categorical properties? (For example, ZIP codes are numbers but represent locations rather than quantities.)
Categorical Variable with Numerical Labels