APIS / Scraping
SQL
¯\_(ツ)_/¯
Ensembling
Formulas
100

What is the HTTP status code for OK?

200

100

Which SQL statement is used to return only different values?

SELECT DISTINCT

100

What provides the structuring, styling, and interactivity for a website?

HTML (structure), CSS (styling), JavaScript (interactivity)

100

Which of the following algorithms is not an example of an ensemble learning algorithm?

A) Random Forest
B) AdaBoost
C) Extra Trees
D) Gradient Boosting
E) Decision Trees

E

100

What is the formula for Precision?

True Positives / (True Positives + False Positives)
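
As a quick check of the formula, here is a minimal sketch in Python. The function name and the example counts are made up for illustration, and returning 0.0 when there are no positive predictions is just one possible convention.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of predicted positives that are actually positive."""
    predicted_positives = true_positives + false_positives
    if predicted_positives == 0:
        # No positive predictions at all; 0.0 is a convention chosen for this sketch.
        return 0.0
    return true_positives / predicted_positives


# Hypothetical counts: 8 correct positive predictions and 2 incorrect ones.
example_precision = precision(true_positives=8, false_positives=2)
```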

200

What is the DOM?

The DOM, or Document Object Model -- as the name suggests -- is a model of the HTML document. This model allows us to interface with the document so that we can manipulate it as needed.
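
To make "a model we can manipulate" concrete, here is a small sketch using Python's standard-library `xml.dom.minidom`, which implements a DOM-style API. It assumes a tiny, well-formed snippet (minidom is an XML parser, so real-world HTML may not parse); the markup and attribute values are invented for the example.

```python
from xml.dom import minidom

# A tiny, well-formed document (hypothetical content).
doc = minidom.parseString("<html><body><p id='greeting'>Hello</p></body></html>")

# Look up a node in the tree, then change its text and attributes.
p = doc.getElementsByTagName("p")[0]
p.firstChild.data = "Goodbye"        # replace the text node's content
p.setAttribute("class", "updated")   # add a new attribute

# Serialize the modified tree back to markup.
updated_markup = doc.documentElement.toxml()
```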

200

With SQL, how do you select all the records from a table named "Persons" where the "FirstName" is "Peter" and the "LastName" is "Jackson"?

SELECT * FROM Persons WHERE FirstName='Peter' AND LastName='Jackson'
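
The query can be tried out with Python's built-in `sqlite3` module. This is only a sketch: the in-memory table and its rows are invented, and the values are passed as `?` parameters rather than written into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Persons (FirstName TEXT, LastName TEXT)")
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?)",
    [("Peter", "Jackson"), ("Peter", "Parker"), ("Janet", "Jackson")],  # made-up rows
)

# Both conditions must hold for a row to be returned.
rows = conn.execute(
    "SELECT * FROM Persons WHERE FirstName = ? AND LastName = ?",
    ("Peter", "Jackson"),
).fetchall()
conn.close()
```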

200

Which of these can be run as parallel processes?

A) Boosted Models
B) Bagging Models
C) Single Level Stacked Models
D) API Calls (not contingent on retrieving previous page data)
E) KNN

B, C, D
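
What these answers have in common is that each unit of work does not need another unit's result. A rough sketch of that idea with independent API calls, using the standard-library `concurrent.futures`; `fetch_page` is a hypothetical stand-in that does no real network I/O.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_page(page_number: int) -> dict:
    # Hypothetical stand-in for an API call; a real version would send an HTTP request here.
    return {"page": page_number, "items": [page_number * 10 + i for i in range(3)]}


page_numbers = [1, 2, 3, 4]
with ThreadPoolExecutor(max_workers=4) as pool:
    # No call depends on another call's result, so they can run concurrently.
    # map() returns results in input order even if calls finish out of order.
    pages = list(pool.map(fetch_page, page_numbers))
```

If each request needed a cursor or token from the previous page, the calls would have to run one after another instead.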

200

Which hyperparameter, max_features or max_samples, will likely change a bagging classifier's score the most?

max_features

200

What is the formula for Recall?

True Positives / (True Positives + False Negatives)
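
Recall measures how many of the actual positives were found, so its denominator counts every actual positive: the true positives plus the false negatives. A minimal sketch with invented counts (the function name and the zero-denominator convention are assumptions of this example):

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of actual positives that the model found."""
    actual_positives = true_positives + false_negatives
    if actual_positives == 0:
        # No actual positives in the data; 0.0 is a convention chosen for this sketch.
        return 0.0
    return true_positives / actual_positives


# Hypothetical counts: 6 positives found, 2 positives missed.
example_recall = recall(true_positives=6, false_negatives=2)
```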

300

Name 3 HTTP clients

  • Browsers - Chrome, Firefox, Safari
  • Command line programs - curl, wget
  • Application code - Python Requests, Scrapy, Mechanize

300

With SQL, how can you return the number of records in the "Persons" table?

SELECT COUNT(*) FROM Persons

300

What is the name of a single-level decision tree?

And in which ensemble method will you often see it?

Decision Stump, in Boosting.

300

Which is likely to have more variance: Boosted or Bagged models?


Boosted models should have higher variance.

300

What is the formula for Information Gain?

Parent node impurity (Gini or entropy) - weighted average of the child nodes' impurity (Gini or entropy)
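
A small worked sketch of this calculation, using Gini impurity and weighting each child by its share of the parent's samples. The function names and the toy labels are made up, and the code assumes every node is non-empty.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def information_gain(parent, children):
    """Parent impurity minus the size-weighted average impurity of the children."""
    n = len(parent)
    weighted_child_impurity = sum(len(child) / n * gini(child) for child in children)
    return gini(parent) - weighted_child_impurity


parent = [1, 1, 0, 0]
perfect_split = information_gain(parent, [[1, 1], [0, 0]])  # children are pure
useless_split = information_gain(parent, [[1, 0], [1, 0]])  # children look like the parent
```

Swapping `gini` for an entropy function gives the entropy-based version of the same quantity.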

400

What does a status code of 500 mean?

Internal Server Error
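
Python's standard library ships these codes in `http.HTTPStatus`, which is handy for checking a code's meaning. A short sketch (the helper name `describe_status` is invented for this example):

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return the code together with its standard reason phrase."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase}"


ok_line = describe_status(200)
error_line = describe_status(500)

# Codes in the 5xx range signal a problem on the server side.
is_server_error = 500 <= HTTPStatus.INTERNAL_SERVER_ERROR < 600
```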


400

With SQL, how do you select all the records from a table named "Persons" where the value of the column "FirstName" starts with an "a"?

SELECT * FROM Persons WHERE FirstName LIKE 'a%'
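
A quick way to try this is Python's built-in `sqlite3`. One caveat worth knowing: in SQLite, `LIKE` is case-insensitive for ASCII letters by default, so `'a%'` also matches "Adam"; other databases may treat case differently. The table contents below are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Persons (FirstName TEXT)")
conn.executemany(
    "INSERT INTO Persons VALUES (?)",
    [("anna",), ("Adam",), ("Bob",), ("Maria",)],  # made-up rows
)

# '%' matches any run of characters, so 'a%' means "starts with a".
rows = conn.execute("SELECT * FROM Persons WHERE FirstName LIKE 'a%'").fetchall()
matching_names = sorted(name for (name,) in rows)
conn.close()
```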

400

Verbally describe the difference in the formula between Lasso and Ridge regularizations

In the penalty term: Lasso sums the absolute values of the coefficients, Ridge sums the squares of the coefficients.
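
The two penalty terms side by side, as a minimal sketch. The `alpha` strength parameter and the coefficient values are assumptions for illustration; some formulations also scale the penalty by a constant such as 1/2.

```python
def lasso_penalty(coefficients, alpha=1.0):
    """L1 penalty: alpha times the sum of absolute coefficient values."""
    return alpha * sum(abs(c) for c in coefficients)


def ridge_penalty(coefficients, alpha=1.0):
    """L2 penalty: alpha times the sum of squared coefficient values."""
    return alpha * sum(c ** 2 for c in coefficients)


coefficients = [3.0, -4.0, 0.5]  # hypothetical model coefficients
l1 = lasso_penalty(coefficients)  # 3 + 4 + 0.5
l2 = ridge_penalty(coefficients)  # 9 + 16 + 0.25
```

Squaring makes large coefficients far more costly under Ridge than small ones, while Lasso charges every unit of coefficient size the same.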

400

What models does stacking use?

Stacking can combine any models that solve the same type of problem, i.e. any classification models or any regression models.

400

What is the formula to calculate R-square?

1 - (Residual Sum of Squares/ Total Sum of Squares)
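
A direct translation of the formula into Python, as a sketch with invented data. It assumes the true values are not all identical, since the total sum of squares would then be zero.

```python
def r_squared(y_true, y_pred):
    """1 - RSS / TSS for two equal-length sequences of numbers."""
    mean_y = sum(y_true) / len(y_true)
    rss = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # residual sum of squares
    tss = sum((y - mean_y) ** 2 for y in y_true)             # total sum of squares
    return 1 - rss / tss


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 1.5, 3.5, 3.5]  # hypothetical predictions
model_r2 = r_squared(y_true, y_pred)

# Always predicting the mean of y_true gives RSS == TSS, so R-squared is 0.
baseline_r2 = r_squared(y_true, [2.5, 2.5, 2.5, 2.5])
```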

500

When must you use Selenium instead of an API?

When the content you want to crawl is being added to the page via JavaScript, rather than baked into the HTML.

500

With SQL, how can you return all the records from a table named "Persons" sorted descending by "FirstName"?

SELECT * FROM Persons ORDER BY FirstName DESC

500

The models in AdaBoost vote with weights. What is the key quantity that determines each weight?

misclassification error
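
One common textbook form of that weight, for binary AdaBoost, is 0.5 * ln((1 - error) / error). Library implementations can use variants of this expression, so treat the sketch below as an illustration of how the weight depends on the error rather than any particular library's formula.

```python
import math


def estimator_weight(error_rate: float) -> float:
    """Vote weight for a weak learner given its weighted misclassification error."""
    if not 0.0 < error_rate < 1.0:
        raise ValueError("error_rate must be strictly between 0 and 1")
    return 0.5 * math.log((1.0 - error_rate) / error_rate)


coin_flip_weight = estimator_weight(0.5)  # no better than chance
decent_weight = estimator_weight(0.3)
strong_weight = estimator_weight(0.1)
```

The lower a model's error, the larger its say in the final vote; a model at 50% error gets no say.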

500

What is the main difference between bagging and boosting models?

Each iteration of a boosted model uses the errors of the previously fitted model to adjust the weights of the training samples for that iteration's fit, thus learning from previous errors.

Bagging fits n_estimators models of the same type, each on a different bootstrapped sample (and optionally a different subset of features), and then votes across all of their predictions.

500

What is the t statistic formula for a two-sample t-test?