What's the difference between regression and classification?
Regression predicts continuous values, while classification predicts discrete ones.
Which Anthropic model, released in February 2025, is particularly strong at programming?
Claude 3.7 Sonnet
Find the derivative (with respect to x) of y = cos(x) + sin(x).
y' = -sin(x) + cos(x)
What country does St. Patrick's Day originate from?
Ireland
What does gradient descent do?
Gradient descent is an optimization algorithm that iteratively moves toward a local minimum of a function by stepping in the direction of the negative gradient. In machine learning, that function is typically the cost function.
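A minimal sketch of the update rule described in this answer, in one dimension. The cost function f(x) = (x - 3)^2, the learning rate, and the step count are illustrative assumptions, not part of the quiz.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        # Move a small distance in the direction of steepest descent.
        x = x - learning_rate * grad(x)
    return x


# Hypothetical cost f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
minimum = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(minimum)  # close to 3, the minimizer of f
```

For a non-convex cost, the same loop can settle in different local minima depending on x0.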
What does GPT stand for?
Generative pre-trained transformer
Compute the partial derivative with respect to x of f(x, y) = <x^2 + 2y, y^2, 24x>.
<2x, 0, 24>
Who is the chief justice of the U.S. Supreme Court?
John Roberts
In which paper was the Transformer introduced?
"Attention is All You Need" (Vaswani et al.)
What is regularization and why do it?
Regularization is a technique for reducing overfitting, typically by penalizing model complexity, so that the model generalizes better to unseen data.
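One common form is an L2 (ridge) penalty added to the training loss. The sketch below assumes a mean-squared-error loss; the function name `ridge_loss` and the penalty strength `lam` are hypothetical choices for illustration.

```python
def ridge_loss(weights, predictions, targets, lam=0.1):
    """Mean squared error plus an L2 penalty on the weights."""
    mse = sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)
    # Large weights are penalized, which discourages overly complex fits.
    penalty = lam * sum(w * w for w in weights)
    return mse + penalty
```

With lam = 0 this reduces to the plain loss; a larger lam trades training fit for smaller weights.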
Let Q be a matrix. What is Q^T?
The transpose of Q: the matrix whose (i, j) entry is the (j, i) entry of Q.
What is a monad?
(half credit for identifying the subfield of CS it's commonly found in)
A monoid in the category of endofunctors.
Write the formula for scaled dot-product attention.
softmax(QK^T/sqrt(d))V
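A plain-Python sketch of this formula, assuming Q, K, and V are lists of row vectors and d is the length of each key vector. It is written for readability, not speed, and omits masking and batching.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d)) V for lists of row vectors."""
    d = len(K[0])
    # scores[i][j] is the scaled dot product of query i with key j.
    scores = [
        [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        for q in Q
    ]
    weights = [softmax(row) for row in scores]
    # Each output row is a weighted average of the rows of V.
    return [
        [sum(w * v[j] for w, v in zip(w_row, V)) for j in range(len(V[0]))]
        for w_row in weights
    ]
```

When every key is identical, the weights are uniform and each output row is the mean of the rows of V.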
Q(s, a) is the expected discounted return from taking action a in state s (optionally: and following the optimal policy afterward).
A policy is a rule for choosing an action in each state.
Which of the following subsets of R is/are compact under the standard topology?
- [0, 1]
- (0, 1)
- R
- the Cantor set
[0, 1] and the Cantor set (both are closed and bounded, hence compact by the Heine-Borel theorem)
Where are Evan and Albon from?
Chicago; Houston