ANOVA
What is 'Analysis of Variance'?
An interdisciplinary field that uses data, analytic and scientific methods, algorithms, and modelling to solve real-world problems
What is data science?
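# Print "fizzbuzz" for multiples of both 3 and 5, "fizz" for multiples of 3,
# "buzz" for multiples of 5, and the number itself otherwise.
fizzbuzz <- function(num) {
  if ((num %% 3 == 0) && (num %% 5 == 0)) {
    print("fizzbuzz")
  } else if (num %% 3 == 0) {
    print("fizz")
  } else if (num %% 5 == 0) {
    print("buzz")
  } else {
    print(num)
  }
}
fizzbuzz(87)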
What is "fizz"?
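The answer can be checked by evaluating the two divisibility tests directly. This is a minimal sketch; the variable names are illustrative and not part of the original card.

```r
# 87 = 3 * 29, so it is divisible by 3 but not by 5
divisible_by_3 <- 87 %% 3 == 0   # TRUE
divisible_by_5 <- 87 %% 5 == 0   # FALSE, because 87 %% 5 is 2
print(c(divisible_by_3, divisible_by_5))
```

Only the second branch of the function matches, so "fizz" is printed.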
The name of the distribution plot you see on your screen.
What is a normal distribution?
The programming language most used by data scientists.
What is Python?
R&D
What is 'Research and Development'?
Outlier
What is a data point that differs significantly from the rest of the data?
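One common convention for flagging such points (an assumption here, since the card does not name a rule) is the 1.5 x IQR rule: anything more than 1.5 interquartile ranges outside the quartiles is treated as an outlier. A minimal sketch with made-up values and a hypothetical helper `find_outliers`:

```r
# Flag values beyond 1.5 * IQR from the quartiles (one convention among several)
find_outliers <- function(x) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  spread <- q[2] - q[1]                      # interquartile range
  x[x < q[1] - 1.5 * spread | x > q[2] + 1.5 * spread]
}

values <- c(10, 12, 11, 13, 12, 95)          # made-up data for illustration
outliers <- find_outliers(values)
print(outliers)                              # 95
```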
The most widely used R package for building visualizations for exploratory data analysis (an alternative to base R graphics)
What is ggplot2? (Also accept tidyverse, since it includes ggplot2.)
The result of achieving very high accuracy on the training dataset but poor accuracy when predicting on a new dataset
What is "overfitting"?
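The effect can be sketched by fitting a simple and a very flexible model to the same small training set and comparing errors on fresh data. Everything below is an illustrative assumption: the simulated data, the seed, and the polynomial degrees are not from the original card.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Hypothetical data: a linear trend plus noise
make_data <- function(n) {
  x <- runif(n, 0, 1)
  data.frame(x = x, y = 2 * x + rnorm(n, sd = 0.3))
}
train <- make_data(20)
test  <- make_data(20)

# Root-mean-squared error of a fitted model on a data set
rmse <- function(model, data) {
  sqrt(mean((data$y - predict(model, newdata = data))^2))
}

simple   <- lm(y ~ poly(x, 1),  data = train)   # straight line
flexible <- lm(y ~ poly(x, 10), data = train)   # degree-10 polynomial

results <- data.frame(
  model      = c("degree 1", "degree 10"),
  train_rmse = c(rmse(simple, train), rmse(flexible, train)),
  test_rmse  = c(rmse(simple, test),  rmse(flexible, test))
)
print(results)
```

The flexible model cannot have a higher training error than the straight line, because the straight line is a special case of it. Its test error is typically worse, although the exact numbers depend on the random draw.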
This is the share of a data scientist's time commonly said to be spent on data cleaning (think percentage).
A. 95% B. 80% C. 75% D. 50% E. 20%
What is B, 80%?
SWOT
What is 'Strengths, Weaknesses, Opportunities, and Threats'?
A hypothesis that states there is no effect or no relationship between variables
What is "null hypothesis"?
Rank these 3 algorithms by time efficiency. Use best-case time complexity and order from worst to best.
Merge sort
Selection sort
Insertion sort
What is:
1. Selection sort, O(n^2) (in every case)
2. Merge sort, O(n log n) (best case)
3. Insertion sort, O(n) (best case)
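The best-case gap between insertion sort and selection sort can be checked by counting comparisons on input that is already sorted. This is a minimal sketch with hypothetical helper names; it counts comparisons only and does not measure running time.

```r
# Count element comparisons made by insertion sort
insertion_comparisons <- function(x) {
  n <- length(x)
  count <- 0
  if (n < 2) return(count)
  for (i in 2:n) {
    key <- x[i]
    j <- i - 1
    while (j >= 1) {
      count <- count + 1            # compare key against x[j]
      if (x[j] <= key) break        # already in place: stop early
      x[j + 1] <- x[j]              # shift the larger element right
      j <- j - 1
    }
    x[j + 1] <- key
  }
  count
}

# Count element comparisons made by selection sort
selection_comparisons <- function(x) {
  n <- length(x)
  count <- 0
  if (n < 2) return(count)
  for (i in 1:(n - 1)) {
    min_idx <- i
    for (j in (i + 1):n) {
      count <- count + 1            # always scans the whole unsorted tail
      if (x[j] < x[min_idx]) min_idx <- j
    }
    tmp <- x[i]; x[i] <- x[min_idx]; x[min_idx] <- tmp
  }
  count
}

sorted_input <- 1:10
print(insertion_comparisons(sorted_input))  # 9, i.e. n - 1
print(selection_comparisons(sorted_input))  # 45, i.e. n * (n - 1) / 2
```

On sorted input, insertion sort makes one comparison per element, while selection sort makes the same quadratic number of comparisons regardless of input order.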
The algorithm that has been applied here
What is K-means clustering?
Harvard Business Review referred to Data Science as…
What is "the sexiest job of the 21st century"?
NLP
What is 'Natural Language Processing'?
A statistical method that fits a line through the data by minimizing the squared residuals (equivalently, the Root-Mean-Squared Error)
What is "Linear Regression"?
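A minimal sketch of this idea using base R's `lm()`. The data values and the helper `rmse_of_line` are made up for illustration. The least-squares line should have an RMSE no larger than any other line, such as the arbitrary guess y = 1 + 2x used here.

```r
# Made-up data that roughly follows a straight line
x <- c(1, 2, 3, 4, 5)
y <- c(3.1, 4.9, 7.2, 8.8, 11.0)

fit <- lm(y ~ x)                      # ordinary least-squares fit

# RMSE of an arbitrary line y = intercept + slope * x on this data
rmse_of_line <- function(intercept, slope) {
  sqrt(mean((y - (intercept + slope * x))^2))
}

fitted_rmse <- rmse_of_line(unname(coef(fit)[1]), unname(coef(fit)[2]))
guess_rmse  <- rmse_of_line(1, 2)     # an arbitrary alternative line

print(coef(fit))
print(c(fitted = fitted_rmse, guess = guess_rmse))
```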
Which part of this query is executed first? e.g. SELECT * FROM customers WHERE first_name = 'Maryam' ORDER BY last_name DESC
What is the FROM clause?
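The logical processing order (FROM, then WHERE, then SELECT, then ORDER BY) can be mimicked step by step on a data frame. This is only a sketch: the `customers` rows below are hypothetical, and a real database engine may physically execute the query differently after optimization.

```r
# Hypothetical table contents, for illustration only
customers <- data.frame(
  first_name = c("Maryam", "John", "Maryam"),
  last_name  = c("Ali", "Smith", "Zaman"),
  stringsAsFactors = FALSE
)

from_step   <- customers                                         # 1. FROM customers
where_step  <- from_step[from_step$first_name == "Maryam", ]     # 2. WHERE first_name = 'Maryam'
select_step <- where_step                                        # 3. SELECT * (keep all columns)
result      <- select_step[order(select_step$last_name,
                                 decreasing = TRUE), ]           # 4. ORDER BY last_name DESC
print(result)
```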
The most typical train-test split (think percentages).
What is 80/20?
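A minimal sketch of an 80/20 split in base R, using the built-in `mtcars` data set as a stand-in (the choice of data set and seed is an assumption, not part of the original card).

```r
set.seed(123)                       # arbitrary seed, only for reproducibility
n <- nrow(mtcars)                   # 32 rows

# Randomly pick 80% of the row indices for training
train_idx <- sample(seq_len(n), size = floor(0.8 * n))

train <- mtcars[train_idx, ]        # 25 rows
test  <- mtcars[-train_idx, ]       # the remaining 7 rows
print(c(train = nrow(train), test = nrow(test)))
```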
An 'old' programming language that evolved into, and was largely replaced by, R
What is S?
JSON
What is ‘JavaScript Object Notation’?
A machine learning task in which a model is trained on labeled data to make predictions, so that its accuracy can be measured against known answers
What is "Supervised Learning"?
This command displays the state of the working directory (think version control)
What is “git status”?
This resampling algorithm or method involves dividing the observations into k groups or folds and estimating a model's ability to predict new observations.
What is "k-fold cross-validation"?
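The procedure can be sketched in base R without extra packages. The choices below (k = 5, the built-in `mtcars` data, and the model `mpg ~ wt`) are assumptions made for illustration only.

```r
set.seed(42)                         # arbitrary seed, only for reproducibility
k <- 5
n <- nrow(mtcars)

# Assign each observation to one of k folds at random
fold_id <- sample(rep(seq_len(k), length.out = n))

fold_rmse <- numeric(k)
for (fold in seq_len(k)) {
  held_out <- mtcars[fold_id == fold, ]    # fold used for evaluation
  training <- mtcars[fold_id != fold, ]    # remaining k - 1 folds used for fitting
  fit  <- lm(mpg ~ wt, data = training)
  pred <- predict(fit, newdata = held_out)
  fold_rmse[fold] <- sqrt(mean((held_out$mpg - pred)^2))
}

cv_rmse <- mean(fold_rmse)           # average out-of-fold error
print(fold_rmse)
print(cv_rmse)
```

Each observation is held out exactly once, so the averaged error estimates how well the model predicts observations it has not seen.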
This American statistician is credited with outlining data science in 1962
Who is John W. Tukey?