Schelling game evaluations for AI control — AI Alignment Forum
Playing Schelling games is a key dangerous capability for schemers: it’s much harder to control AIs that are very capable ...

These are the best ways to measure your body fat
I, on the other hand, have never been all that muscular. I like to think I’m a healthy weight—but nurses ...

Behavioral red-teaming is unlikely to produce clear, strong evidence that models aren’t scheming — AI Alignment Forum
One strategy for mitigating risk from schemers (that is, egregiously misaligned models that intentionally try to subvert your safety measures) is ...

Roundtables: Producing Climate-Friendly Food | MIT Technology Review
The latest iteration of a legacy. Founded at the Massachusetts Institute of Technology in 1899, MIT Technology Review is a ...

Safe Predictive Agents with Joint Scoring Rules — AI Alignment Forum
Thanks to Evan Hubinger for funding this project and for introducing me to predictive models, Johannes Treutlein for many fruitful ...

Preventing Climate Change: A Team Sport
Read more from MIT Technology Review Insights & MEDC about addressing climate change impacts. About the speaker: Hilary Doe, Chief ...

The Download: Another Nobel Prize for AI, and Adobe’s anti-scraping tool
Google DeepMind founder Demis Hassabis has won a joint Nobel Prize for Chemistry for using artificial intelligence to predict the ...

Adobe wants to make it easier for artists to blacklist their work from AI scraping
Content credentials are based on C2PA, an internet protocol that uses cryptography to securely label images, video, and audio with ...

Two new datasets for evaluating political sycophancy in LLMs — AI Alignment Forum
TLDR: I created two datasets (154 and 759 statements) that can aid in measuring political sycophancy (in the US in ...

A plan to deal with AI extinction risk — AI Alignment Forum
We have published A Narrow Path: our best attempt to draw out a comprehensive plan to deal with AI extinction ...