In this parameter‑efficient fine‑tuning method, a frozen weight matrix: W0∈Rd×kW_0\in\mathbb{R}^{d\times k}W0∈Rd×k is adapted by learning a low‑rank update: ΔW≈BA\Delta W \approx BAΔW≈BA with B∈Rd×rB\in\mathbb{R}^{d\times r}B∈Rd×r, A∈Rr×kA\in\mathbb{R}^{r\times k}A∈Rr×k, often applied as: W′=W0+(α/r) BAW' = W_0 + (\alpha/r)\,BAW′=W0+(α/r)BA.
What is LoRA (Low‑Rank Adaptation)?
In The Mandalorian, what was Grogu called for most of Season 1 before his real name was revealed?
What is “The Child”?
This number is approximately 2.718 and shows up in continuous growth models, compound interest, and machine learning loss functions.
What is Euler’s number "e"?
On May 10, 1869, a “Golden Spike” ceremony marked the completion of what major project that connected the U.S. from coast to coast?
What is the Transcontinental Railroad?
This tiny piece at the end of your shoelaces has a real name.
What is an aglet?
This decoding strategy samples the next token from the smallest set of candidates whose cumulative probability mass meets a threshold ppp (i.e., the “nucleus”), then renormalizes probabilities over that set before sampling.
What is top‑ppp (nucleus) sampling?
In The Mandalorian, what is the name of Mando’s original ship that gets destroyed?
What is the Razor Crest?
This probability rule updates what you believe after seeing new evidence (used in spam filters and ML models).
What is Bayes’ Theorem?
On May 5, 1961, this astronaut became the first American in space.
Who is Alan Shepard?
This animal can sleep for up to 90% of its day.
What is a koala?
This diffusion inference trick replaces a separate classifier by combining unconditional and conditional predictions, e.g.
ϵ^θ(xt,c)=ϵθ(xt,∅)+w(ϵθ(xt,c)−ϵθ(xt,∅)),\hat\epsilon_\theta(x_t,c)=\epsilon_\theta(x_t,\varnothing)+w\big(\epsilon_\theta(x_t,c)-\epsilon_\theta(x_t,\varnothing)\big),ϵ^θ(xt,c)=ϵθ(xt,∅)+w(ϵθ(xt,c)−ϵθ(xt,∅)),
where w>1w>1w>1 increases prompt adherence at the cost of diversity.
What is classifier‑free guidance (CFG)?
In Rogue One, what is the tropical planet where the final battle and data vault are located?
What is Scarif?
When you take bigger and bigger samples, this theorem says their averages start to look like a normal (bell curve) distribution.
What is the Central Limit Theorem?
On May 25, 1961, this U.S. president announced the goal of landing a man on the Moon before the end of the decade.
Who is John F. Kennedy?
This sea creature has three hearts.
What is an octopus?
This “draft‑then‑verify” acceleration method has a fast model propose multiple tokens and a slower target model verify them (accepting a longest valid prefix and resampling at the first rejection), yielding speedups while preserving the target model’s distribution.
What is speculative decoding?
In Solo, what is the name of the masked leader of the Cloud-Riders who turns out not to be who you expect?
Who is Enfys Nest?
For any square matrix, this simple number (the sum of diagonal elements) equals the sum of its eigenvalues.
What is the trace?
On May 1, 1931, this famous New York City skyscraper was officially opened.
What is the Empire State Building?
This part of your body contains the smallest bone, called the stapes.
What is the ear?
This positional encoding rotates query/key vectors by complex phases eimωie^{i m\omega_i}eimωi and einωie^{i n\omega_i}einωi so attention scores depend on relative offsets via terms like ei(m−n)ωie^{i(m-n)\omega_i}ei(m−n)ωi, not absolute positions.
What is RoPE (Rotary Positional Embeddings)?
In Rogue One, what is the name of Cassian Andor’s reprogrammed Imperial droid companion?
Who is K-2SO?
This famous constant appears in everything from circles to probability distributions like the normal distribution.
What is pi?
On May 8, 1886, a pharmacist in Atlanta first sold what famous soft drink?
What is Coca-Cola?
Hot water can actually freeze faster than cold water—this unusual effect is known by this name.
What is the Mpemba effect?