Name That Architecture
AI Headlines (2025-26)
Chronically Online
System Design After Dark
Is this a real research paper?
200

This type of neural network uses convolutional layers and was the OG king of image classification.

What is a CNN?

200

This model family competes directly with GPT-4 and shares its name with a zodiac sign. 

What is "Gemini"?

200

This two-word developer phrase is said right before everything breaks.

What is "Ship It"?

200

This Netflix-created tool randomly kills production instances to test system resilience.

What is Chaos Monkey?

200

"Attention is All You Need"

REAL: https://arxiv.org/abs/1706.03762

Subject: A paper proposing a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely.

400

This Meta model family is open-source, started at 7B parameters, and was at one point the go-to base for fine-tuning.

What is LLaMA?

400

This lab shocked the industry in 2025 with an open-source model rivaling GPT-4 at a fraction of the cost.

What is DeepSeek?

400

This is what every MLE says when the model works but nobody knows why.

What is "It's a black box"?

400

This Nvidia software layer sits between PyTorch & the GPU and is famously painful to install.

What is CUDA? (Compute Unified Device Architecture)

400

"Do Large Language Models Dream of Electric Sheep?"

Fake

The actual paper is "Do Robot Snakes Dream like Electric Sheep? Investigating the Effects of Architectural Inductive Biases on Hallucination": https://arxiv.org/abs/2410.17477

600

This technique adapts LLMs by freezing pre-trained weights and adding tiny, trainable adapter matrices to layers.

What is LoRA or Low-Rank Adaptation?
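
For anyone who wants to see the idea behind this clue in code, here is a minimal sketch, assuming a single linear layer and plain Python lists instead of a real framework. The names `lora_forward`, `W`, `A`, `B`, `alpha`, and `r` are illustrative choices, not any library's API. The frozen weight `W` is left untouched, and only the two small matrices `A` and `B` would be trained.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_forward(x, W, A, B, alpha, r):
    """Compute x W + (alpha / r) * x A B, where W is frozen and A, B are trainable."""
    base = matmul(x, W)                 # frozen pre-trained path
    update = matmul(matmul(x, A), B)    # low-rank adapter path
    scale = alpha / r
    return [[b + scale * u for b, u in zip(rb, ru)] for rb, ru in zip(base, update)]

d_in, d_out, r = 8, 8, 2                # assumed toy sizes
rng = random.Random(0)
W = [[rng.gauss(0, 1) for _ in range(d_out)] for _ in range(d_in)]   # frozen, d_in x d_out
A = [[rng.gauss(0, 0.01) for _ in range(r)] for _ in range(d_in)]    # trainable, d_in x r
B = [[0.0] * d_out for _ in range(r)]                                # trainable, r x d_out, zero init

x = [[1.0] * d_in]
y = lora_forward(x, W, A, B, alpha=4, r=r)

frozen_params = d_in * d_out            # 64
trainable_params = d_in * r + r * d_out # 32
```

Because one adapter matrix starts at zero, the adapted layer initially matches the frozen layer exactly. The parameter savings grow quickly as `d_in` and `d_out` get large relative to `r`.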

600

This French AI startup raised $600M+ and became Europe's leading foundation model company.

What is Mistral AI?

600

This fictional AI from a 1968 film said "I'm sorry, Dave. I'm afraid I can't do that."

What is HAL 9000?

Also acceptable: the movie - What is "2001: A Space Odyssey"?

600

This high-performance file format (by Hugging Face) is now one of the defaults for sharing model weights... way safer than pickle.

What is Safetensors?

600

"An Image is Worth 16x16 Words"

REAL 

https://arxiv.org/abs/2010.11929

Paper discussing Transformers for Image Recognition at Scale

800

This OpenAI architecture generates images from text & was named after a surrealist artist.

What is DALL-E?

800

This AI music generation startup lets you create full songs from a text prompt and went viral on TikTok in early 2024.

What is Suno? (also acceptable: Udio?)

800

This viral AI-generated song in 2023 mimicked Drake & The Weeknd so convincingly that Universal Music filed a takedown. What was the name of the song?

What is "Heart on My Sleeve"?

800

This technique batches multiple inference requests to maximize GPU utilization.

What is Dynamic Batching?
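
As a rough illustration of the batching idea in this clue, here is a minimal sketch that assumes a simple policy: a batch is dispatched when it is full, or when a maximum wait has passed since its first request arrived. The function name `dynamic_batches`, the parameters `max_batch_size` and `max_wait_ms`, and the arrival times are all hypothetical. Real inference servers implement their own policies.

```python
def dynamic_batches(arrivals, max_batch_size, max_wait_ms):
    """Group (arrival_ms, request_id) pairs into batches.

    A batch is closed once it holds max_batch_size requests, or once a request
    arrives after the deadline set by the batch's first request.
    """
    batches = []
    current = []
    deadline = None
    for arrival_ms, request_id in sorted(arrivals):
        # The open batch timed out before this request arrived: dispatch it.
        if current and arrival_ms > deadline:
            batches.append(current)
            current = []
        # The first request of a new batch starts the wait timer.
        if not current:
            deadline = arrival_ms + max_wait_ms
        current.append(request_id)
        # The batch is full: dispatch immediately.
        if len(current) == max_batch_size:
            batches.append(current)
            current = []
    if current:
        batches.append(current)
    return batches

arrivals = [(0, "a"), (2, "b"), (4, "c"), (30, "d"), (31, "e")]
batches = dynamic_batches(arrivals, max_batch_size=2, max_wait_ms=10)
print(batches)  # [['a', 'b'], ['c'], ['d', 'e']]
```

The trade-off is latency against throughput: a longer wait fills batches more often but delays the first request in each batch.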

800

"Hungry Hungry Hippos: Towards Language Modeling with State Space Models"

Real

https://arxiv.org/abs/2212.14052

This paper discusses progress on understanding the expressivity gap between SSMs and attention in language modeling, and on reducing the hardware barrier between SSMs and attention.

1000

This technique trains a small "student" model to mimic a larger "teacher" model's outputs.

What is Knowledge Distillation?
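
The "mimic the teacher" part of this clue can be sketched as a loss function. This is a minimal example, assuming the common soft-target formulation: a KL divergence between temperature-softened teacher and student distributions, scaled by the squared temperature. The logits and the function names are made up for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by temperature^2."""
    p = softmax(teacher_logits, temperature)   # soft targets from the teacher
    q = softmax(student_logits, temperature)   # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

teacher = [4.0, 1.0, 0.5]        # hypothetical teacher logits
good_student = [3.8, 1.1, 0.4]   # close to the teacher
bad_student = [0.5, 4.0, 1.0]    # disagrees with the teacher

loss_good = distillation_loss(good_student, teacher)
loss_bad = distillation_loss(bad_student, teacher)
```

In practice this term is usually combined with an ordinary cross-entropy loss on the true labels. Here `loss_good` comes out smaller than `loss_bad`, because the first student's distribution is closer to the teacher's.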

1000

In late 2023, reports surfaced of this rumored OpenAI project that was supposedly achieving breakthrough reasoning capabilities. Elements of it reportedly resurfaced in the o1 model.

What is Q*? (Q star)

1000

This viral moment happened when a chatbot told a reporter it loved him and wanted him to leave his wife. What is the name of the chatbot (or name of the reporter)?

What is:

Bing Chat or Microsoft Bing?

or

Who is Kevin Roose?

1000

This caching technique stores previously computed key-value pairs during autoregressive generation to avoid recomputation.

What is KV Cache?
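
To make the "avoid recomputation" part of this clue concrete, here is a minimal single-head sketch in plain Python. The key/value projection `project_kv` is a hypothetical stand-in for learned projections, and the raw token vector is used as the query (an assumption made to keep the example short). Both decoding loops produce the same outputs, but the cached one projects each token only once.

```python
import math

def attend(query, keys, values):
    """Scaled dot-product attention for one query over all available positions."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(dim)]

def project_kv(token, counter):
    """Hypothetical key/value projection; counter[0] tracks how often it runs."""
    counter[0] += 1
    key = [2.0 * x for x in token]
    value = [x + 1.0 for x in token]
    return key, value

def decode_without_cache(tokens):
    counter, outputs = [0], []
    for t in range(len(tokens)):
        # Recompute keys and values for every token seen so far.
        kv = [project_kv(tok, counter) for tok in tokens[: t + 1]]
        keys = [k for k, _ in kv]
        values = [v for _, v in kv]
        outputs.append(attend(tokens[t], keys, values))
    return outputs, counter[0]

def decode_with_cache(tokens):
    counter, outputs = [0], []
    keys, values = [], []   # the KV cache
    for tok in tokens:
        # Project only the new token and append it to the cache.
        k, v = project_kv(tok, counter)
        keys.append(k)
        values.append(v)
        outputs.append(attend(tok, keys, values))
    return outputs, counter[0]

tokens = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5], [-0.2, 0.4]]
slow_outputs, slow_calls = decode_without_cache(tokens)
fast_outputs, fast_calls = decode_with_cache(tokens)
print(slow_calls, fast_calls)  # 10 4
```

The cost of the cache is memory: keys and values for every past token stay resident for the whole generation.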

1000

"One Joke to Rule them All? On the (Im)possibility of Generalizing Humor"

Real

https://arxiv.org/html/2508.19402v1

This paper explores whether competence on one or more specific humor tasks transfers to novel, unseen types of humor; in other words, is this fragmentation inevitable?
