MHML 3/13

Review

General ML

Math

Misc.

100

What's the difference between regression and classification?

Regression predicts continuous values, while classification predicts discrete ones.
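A minimal NumPy sketch on made-up toy data that contrasts the two. A least-squares line returns a real number. A hypothetical, hand-picked threshold rule (not a learned classifier) returns a class label.

```python
import numpy as np

# Toy 1-D data, made up for illustration.
x = np.array([0.0, 1.0, 2.0, 3.0])
y_continuous = np.array([1.0, 3.0, 5.0, 7.0])  # regression target: real numbers
y_discrete = np.array([0, 0, 1, 1])             # classification target: class labels

# Regression: fit y ~ w*x + b by least squares; predictions are real numbers.
A = np.stack([x, np.ones_like(x)], axis=1)
w, b = np.linalg.lstsq(A, y_continuous, rcond=None)[0]

def predict_regression(x_new):
    return w * x_new + b

# Classification: a hypothetical threshold rule chosen by hand to match y_discrete;
# predictions are labels in {0, 1}.
def predict_class(x_new, threshold=1.5):
    return int(x_new > threshold)

print(predict_regression(1.5))  # a continuous value (about 4.0)
print(predict_class(2.5))       # a discrete class label
```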

100

What Anthropic model released in February is particularly strong at programming?

Claude 3.7 Sonnet

100

Find the derivative (with respect to x) of y = cos(x) + sin(x).

y' = -sin(x) + cos(x)
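A quick numerical sanity check for the derivative: compare the analytic answer with a central-difference approximation at a few sample points. The step size h is an arbitrary illustrative choice.

```python
import math

def y(x):
    return math.cos(x) + math.sin(x)

def y_prime(x):
    # Analytic derivative: d/dx cos(x) = -sin(x), d/dx sin(x) = cos(x)
    return -math.sin(x) + math.cos(x)

def central_difference(f, x, h=1e-5):
    # Numerical approximation of f'(x); truncation error is O(h^2)
    return (f(x + h) - f(x - h)) / (2 * h)

for x0 in [0.0, 0.5, 1.0, 2.0]:
    print(x0, y_prime(x0), central_difference(y, x0))
```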

100

What country does St. Patrick's Day originate from?

Ireland

200

What does gradient descent do?

Gradient descent is an optimization algorithm that iteratively moves toward a local minimum of a function (in machine learning, typically the cost function) by repeatedly stepping in the direction of the negative gradient.
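A minimal sketch of plain (full-batch) gradient descent, assuming the gradient is supplied as a function. The learning rate, step count, and example objective f(x) = (x - 3)^2 are illustrative choices.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    x = x0
    for _ in range(steps):
        # Step against the gradient, i.e. in the direction of steepest descent
        x = x - learning_rate * grad(x)
    return x

# Example: minimize f(x) = (x - 3)^2, whose gradient is 2(x - 3).
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(x_min)  # close to 3.0
```

With learning_rate = 0.1 each step shrinks the distance to the minimum by a factor of 0.8 on this objective. A learning rate that is too large (here, above 1.0) would make the iterates diverge instead.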

200

What does GPT stand for?

Generative Pre-trained Transformer

200

Compute the partial derivative with respect to x of f(x, y) = <x^2 + 2y, y^2, 24x>.

<2x, 0, 24>
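Since f is vector-valued, the answer <2x, 0, 24> is the componentwise partial derivative with respect to x, with y held fixed. A NumPy sketch checks it numerically at an arbitrary sample point.

```python
import numpy as np

def f(x, y):
    return np.array([x**2 + 2*y, y**2, 24*x])

def df_dx_analytic(x, y):
    return np.array([2*x, 0.0, 24.0])

def df_dx_numeric(x, y, h=1e-6):
    # Central difference in x only; y is held fixed
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

print(df_dx_analytic(1.5, -2.0))
print(df_dx_numeric(1.5, -2.0))
```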

200

Who is the Chief Justice of the U.S. Supreme Court?

John Roberts

300

In which paper was the Transformer introduced?

"Attention Is All You Need" (Vaswani et al., 2017)

300

What is regularization, and why use it?

Regularization is a technique for reducing overfitting, for example by adding a penalty on model complexity (such as an L2 penalty on the weights) to the loss. It improves generalization to new, unseen data.
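One concrete instance is L2-regularized least squares (ridge regression). This NumPy sketch uses its closed form on synthetic random data. The data, the true weights, and the lambda values are all made up for illustration. It shows the penalty shrinking the learned weights as lambda grows.

```python
import numpy as np

def ridge_fit(X, y, lam):
    # Minimizes ||X w - y||^2 + lam * ||w||^2 (L2 penalty, no intercept term).
    # Closed form: w = (X^T X + lam I)^{-1} X^T y
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# Synthetic data for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
y = X @ np.array([1.0, -2.0, 0.0, 0.5, 3.0]) + 0.1 * rng.normal(size=20)

for lam in [0.0, 1.0, 10.0, 100.0]:
    w = ridge_fit(X, y, lam)
    print(lam, np.linalg.norm(w))  # weight norm shrinks as lam grows
```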

300

Let Q be a matrix. What is Q^T?

The transpose of Q: the matrix whose (i, j)-entry is the (j, i)-entry of Q.
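The entrywise definition translates directly into code. This pure-Python sketch represents a matrix as a list of rows, which is a representation chosen for illustration.

```python
def transpose(Q):
    rows, cols = len(Q), len(Q[0])
    # (Q^T)[i][j] = Q[j][i]; a rows x cols matrix becomes cols x rows
    return [[Q[j][i] for j in range(rows)] for i in range(cols)]

Q = [[1, 2, 3],
     [4, 5, 6]]
print(transpose(Q))  # [[1, 4], [2, 5], [3, 6]]
```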

300

What is a monad?

(half credit for identifying the subfield of CS it's commonly found in)

A monoid in the category of endofunctors.

400

Write the formula for scaled dot-product attention.

softmax(QK^T / sqrt(d_k)) V, where d_k is the dimension of the keys
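A minimal single-head NumPy sketch of the formula, without masking or batching. Here d is written d_k, the key/query dimension. The shapes and random inputs are illustrative.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v)
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (n_queries, n_keys)
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V                  # (n_queries, d_v)

rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 5))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (2, 5)
```

Each output row is a weighted average of the rows of V. For example, an all-zero query attends uniformly and returns the mean of V's rows.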

400

Explain what the Q-function and value function are, and what a policy is.

V(s) is the expected (discounted) return starting from state s and following a given policy (for the optimal value function V*, the optimal policy).

Q(s, a) is the expected (discounted) return from taking action a in state s and following the policy afterward.

A policy is a rule for choosing actions: a mapping from states to actions, or to probability distributions over actions.
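To make the three definitions concrete, here is a NumPy sketch of value iteration on a hypothetical 2-state, 2-action deterministic MDP. The transitions, rewards, and discount factor are made up for illustration. It computes the optimal Q*(s, a) and V*(s), then reads off the greedy policy.

```python
import numpy as np

# Hypothetical deterministic MDP, made up for illustration.
next_state = np.array([[0, 1],   # from state 0: action 0 -> state 0, action 1 -> state 1
                       [0, 1]])  # from state 1: action 0 -> state 0, action 1 -> state 1
reward = np.array([[0.0, 1.0],
                   [0.0, 2.0]])
gamma = 0.9  # discount factor

V = np.zeros(2)
for _ in range(500):
    # Q(s, a) = r(s, a) + gamma * V(s'), where s' is the state action a leads to
    Q = reward + gamma * V[next_state]
    # V(s) = max_a Q(s, a)  (Bellman optimality backup)
    V = Q.max(axis=1)

policy = Q.argmax(axis=1)  # greedy policy: the best action in each state
print(V)       # approximately [19, 20]
print(Q)
print(policy)
```

In this toy MDP, staying in state 1 earns 2 per step forever, so V*(1) = 2 / (1 - 0.9) = 20. From state 0, the best move is to go there: V*(0) = 1 + 0.9 * 20 = 19.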

400

Which of the following subsets of R is/are compact under the standard topology?

- [0, 1]

- (0, 1)

- R

- the Cantor set

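[0, 1] and the Cantor set (both are closed and bounded, hence compact by the Heine-Borel theorem; (0, 1) is not closed and R is not bounded)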

400

Where are Evan and Albon from?

Chicago; Houston