Alignment Agendas
AI Safety Papers
Philosophy and EA
AI History
np.random.rand()
100
Stuart Russell's approach to alignment (described in Human Compatible)

What is (cooperative) inverse reinforcement learning?

100

This 2015 paper introduced the concept of "corrigibility" in AI systems.

What is "Corrigibility" by Soares, Fallenstein, Yudkowsky, and Armstrong?

100

Anthropic hired the first dedicated researcher of its kind to work on this.

What is "Model Welfare"?

100

A 1997 chess match between an AI and a world chess champion

What is Deep Blue vs Garry Kasparov?

100

A fictional world and civilization invented by Eliezer Yudkowsky

What is Dath Ilan?

200

The full expression for G.O.W.A.W.

What is Go Out With A Whimper?

200

This 2016 paper first introduced the idea of "concrete problems in AI safety," such as reward hacking and safe exploration.

What is "Concrete Problems in AI Safety" by Dario Amodei, Chris Olah, et al.?

200

AI research that aims to help individuals, both humans and machines, find ways to improve their joint welfare.

What is Cooperative AI?

200

A 2016 Go match between an AI and a top Go player

What is AlphaGo vs Lee Sedol?

200

"...most people may not realize how much of this entire field is myself wearing various ______." -- Eliezer, List of Lethalities comment

What are trenchcoats?

300

This governor vetoed the first major US legislation on AI safety in 2024.

Who is Gavin Newsom?

300

This 2019 paper first introduced the concept of "mesa-optimization" in AI systems.

What is "Risks from Learned Optimization in Advanced Machine Learning Systems" by Hubinger, van Merwijk, Mikulik, Skalse, and Garrabrant?

300

This technique is applied to shrimp eyestalks (and neural nets)

What is ablation?

300

The CNN that won the 2012 ImageNet Challenge

What is AlexNet?

300

The number of neurons in a human brain (OOM)

What is one hundred billion (10^11)?

400

This country was the first to publish a national AI strategy in 2017.

What is Canada?

400

In 2017, this organization published the first technical report on "scalable oversight" in AI systems.

What is OpenAI, in its paper "Learning from Human Preferences"?

400

A thought experiment demonstrating that an updateless decision theory using logical counterfactuals is better than an evidential updateless decision theory

What is the Troll Bridge?

400

The three labs that signed the original July 2023 White House voluntary commitments but were not founding members of the Frontier Model Forum

What are Amazon, Inflection, and Meta?

400

The Lovecraftian race that created the Shoggoth

What are The Elder Things?

500

The two documents named as inspiration for Claude's constitution

What are the UN Universal Declaration of Human Rights and Apple's Terms of Service?

500

This 2022 paper introduced the concept of "constitutional AI" and its potential role in alignment.

What is "Constitutional AI: Harmlessness from AI Feedback" by Bai et al. (Anthropic)?

500

The first major international military alliance to adopt AI ethical principles.

What is NATO, with its AI Strategy and principles (October 2021)?

500

The first major theorem to be proved using a computer

What is the four color theorem?

500

ChatGPT's Myers-Briggs (MBTI) personality type.

What is either ENFJ or INFJ?