What is inverse reinforcement learning?
This 2015 paper introduced the concept of "corrigibility" in AI systems.
What is "Corrigibility" by Soares, Fallenstein, Yudkowsky, and Armstrong?
This Anthropic researcher has the first role of its kind.
What is "Model Welfare"?
A 1997 chess match between an AI and a world chess champion
What is Deep Blue vs Garry Kasparov?
A fictional world and civilization invented by Eliezer Yudkowsky
What is Dath Ilan?
The full expression for G.O.W.A.W.
What is Go Out With A Whimper?
This 2016 paper first introduced the concept of "concrete problems in AI safety".
What is "Concrete Problems in AI Safety" by Dario Amodei, Chris Olah, et al.?
AI research trying to help individuals, humans and machines, to find ways to improve their joint welfare.
What is Cooperative AI?
A 2016 Go match between an AI and a top Go player
What is AlphaGo vs Lee Sedol?
"...most people may not realize how much of this entire field is myself wearing various ______." -- Eliezer, List of Lethalities comment
What are trenchcoats?
This Governor vetoed the first major US legislation on AI safety in 2024.
Who is Gavin Newsom?
This 2019 paper first introduced the concept of "mesa-optimization" in AI systems.
What is "Risks from Learned Optimization in Advanced Machine Learning Systems" by Hubinger, van Merwijk, Mikulik, Skalse, and Garrabrant?
This technique is applied to shrimp eyestalks (and neural nets)
What is ablation?
The CNN that won the 2012 ImageNet Challenge
What is AlexNet?
What is one hundred billion (10^11)?
This country was the first to publish a national AI strategy in 2017.
What is Canada?
In 2016 this organization published the first technical report on "Scalable Oversight" in AI systems.
What is OpenAI, in their paper "Learning from Human Preferences"?
A thought experiment demonstrating that an updateless decision theory using logical counterfactuals is better than an evidential updateless decision theory
What is the Troll Bridge?
The three labs that signed the White House voluntary commitments but are not members of the Frontier Model Forum
What are Amazon, Inflection, and Meta?
The Lovecraftian race that created the Shoggoth
What are The Elder Things?
The two documents named as inspiration for Claude's constitution
What are the UN Declaration of Human Rights and Apple's Terms of Service?
This 2022 paper introduced the concept of "constitutional AI" and its potential role in alignment.
What is "Constitutional AI: Harmlessness from AI Feedback" by Bai et al.?
The first major international military alliance to adopt AI ethical principles.
What is NATO, with its AI Strategy and principles (October 2021)?
The first major theorem to be proved using a computer
What is the four color theorem?
ChatGPT's MBTI type according to Myers-Briggs.
What is either ENFJ or INFJ?