Abbreviations
Definitions
Coding
Statistical Modelling
Random / Fun Facts
100

ANOVA

What is 'Analysis of Variance'?


100

An interdisciplinary field that uses data, analytical and scientific methods, algorithms, and modelling to solve real-world problems

What is data science?


100

fizzbuzz <- function(num) {
  if (num %% 3 == 0 && num %% 5 == 0) {
    print("fizzbuzz")
  } else if (num %% 3 == 0) {
    print("fizz")
  } else if (num %% 5 == 0) {
    print("buzz")
  } else {
    print(num)
  }
}

fizzbuzz(87)


What is "fizz"?

100

The name of the distribution plot you see on your screen.


What is the normal distribution?


100

The programming language most used by data scientists.

What is Python?

200

R&D

What is 'Research and Development'?


200

Outlier

What is a data point that differs significantly from the rest of the data?

200

The most widely used R library for building visualizations for exploratory data analysis (an alternative to base R graphics)

What is ggplot2? (Also accept tidyverse, since it includes ggplot2.)

200

The situation in which a model achieves very high accuracy on the training dataset but poor accuracy when predicting on new data

What is "overfitting"?

200

This is the share of their time that data scientists are often said to spend on data cleaning (think percentage).

A. 95% B. 80% C. 75% D. 50% E. 20%

What is B, 80%? 

300

SWOT

What is 'Strengths, Weaknesses, Opportunities, Threats'?


300

A hypothesis stating that there is no effect or no relationship between variables

What is the "null hypothesis"?

300

Rank these 3 algorithms by time efficiency. Use best-case time complexity and order from worst to best.

  • Merge sort

  • Selection sort 

  • Insertion sort

What is:

1. Selection Sort O(n^2) (in any case)

2. Merge Sort O(n log n) (best case)

3. Insertion Sort O(n) (best case)
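
The gap between the O(n) and O(n^2) best cases can be checked by counting comparisons on already-sorted input. The following is a minimal R sketch, not part of the original board; the function names `insertion_comparisons` and `selection_comparisons` are hypothetical, and merge sort is left out for brevity.

```r
# Count the comparisons insertion sort makes on a vector.
insertion_comparisons <- function(x) {
  count <- 0
  n <- length(x)
  if (n < 2) return(count)
  for (i in 2:n) {
    key <- x[i]
    j <- i - 1
    while (j >= 1) {
      count <- count + 1           # one comparison of key against x[j]
      if (x[j] <= key) break       # already in place: stop scanning left
      x[j + 1] <- x[j]             # shift the larger element right
      j <- j - 1
    }
    x[j + 1] <- key
  }
  count
}

# Count the comparisons selection sort makes on a vector.
selection_comparisons <- function(x) {
  count <- 0
  n <- length(x)
  if (n < 2) return(count)
  for (i in 1:(n - 1)) {
    min_idx <- i
    for (j in (i + 1):n) {
      count <- count + 1           # always scans the whole unsorted tail
      if (x[j] < x[min_idx]) min_idx <- j
    }
    tmp <- x[i]; x[i] <- x[min_idx]; x[min_idx] <- tmp
  }
  count
}

sorted_input <- 1:100                      # best case: already sorted
insertion_comparisons(sorted_input)        # 99, i.e. n - 1
selection_comparisons(sorted_input)        # 4950, i.e. n * (n - 1) / 2
```

On sorted input, insertion sort stops after one comparison per element, while selection sort still scans the entire remaining tail on every pass.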



300

The algorithm that has been applied to the data shown on your screen.

What is K-means clustering?

300

Harvard Business Review referred to Data Science as…

What is "the sexiest job of the 21st century"?

400

NLP

What is 'Natural Language Processing'?

400

A statistical method that fits a line through the data by minimizing the squared residuals (equivalently, the root-mean-squared error)

What is "Linear Regression"?
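
As an illustration of the clue, here is a minimal R sketch, assuming the built-in `mtcars` dataset and an arbitrary choice of `mpg` as the response and `wt` as the predictor (neither comes from the original board). It fits a line with `lm()` and checks that nudging the fitted line makes the root-mean-squared error worse.

```r
# Fit a straight line by ordinary least squares.
fit <- lm(mpg ~ wt, data = mtcars)

# RMSE of the fitted line, computed from its residuals.
rmse <- sqrt(mean(residuals(fit)^2))

# Shift the intercept by 1 (an arbitrary perturbation) and recompute RMSE.
shifted <- coef(fit) + c(1, 0)
pred_shifted <- shifted[1] + shifted[2] * mtcars$wt
rmse_shifted <- sqrt(mean((mtcars$mpg - pred_shifted)^2))

rmse_shifted > rmse   # TRUE: the least-squares line has the smaller error
```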

400

Which part of this query is executed first? SELECT * FROM customers WHERE first_name = 'Maryam' ORDER BY last_name DESC

What is the FROM clause?

400

The most typical train-test split (think percentages).

What is 80/20?
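
An 80/20 split might look like the following in base R. This is a sketch only, assuming the built-in `iris` dataset and an arbitrary seed, neither of which appears in the original board.

```r
set.seed(1)                                   # arbitrary seed, for reproducibility
n <- nrow(iris)                               # 150 rows in the built-in dataset

# Randomly pick 80% of the row indices for training.
train_idx <- sample(seq_len(n), size = round(0.8 * n))

train <- iris[train_idx, ]                    # 80% of the rows
test  <- iris[-train_idx, ]                   # the remaining 20%

c(train = nrow(train), test = nrow(test))
```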

400

An 'old' programming language that evolved into, and was largely replaced by, R

What is S?

500

JSON

What is ‘JavaScript Object Notation’?

500

A machine learning task in which a model is trained on labeled data to make predictions, so that its accuracy can be measured against known answers.

What is "Supervised Learning"?

500

This command displays the state of the working directory (think version control)

What is “git status”?

500

This resampling algorithm or method involves dividing the observations into k groups or folds and estimating a model's ability to predict new observations.

What is "k-fold cross-validation"?
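
The procedure in the clue can be sketched in base R as follows. The dataset (`mtcars`), the model (`mpg ~ wt`), the choice of k = 5, and the use of RMSE as the score are all assumptions made for illustration, not part of the original board.

```r
set.seed(42)                                  # arbitrary seed, for reproducibility
k <- 5                                        # number of folds (assumed)
n <- nrow(mtcars)

# Assign each observation to one of k folds at random.
folds <- sample(rep(1:k, length.out = n))

rmse <- numeric(k)
for (f in 1:k) {
  train <- mtcars[folds != f, ]               # fit on the other k - 1 folds
  test  <- mtcars[folds == f, ]               # hold out fold f
  fit   <- lm(mpg ~ wt, data = train)
  pred  <- predict(fit, newdata = test)
  rmse[f] <- sqrt(mean((test$mpg - pred)^2))  # error on unseen observations
}

cv_rmse <- mean(rmse)                         # average held-out error across folds
cv_rmse
```

Each observation is held out exactly once, so the averaged score estimates how the model performs on data it was not fitted to.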

500

This American statistician defined data science in 1962

Who is John W. Tukey?
