AI can now create a replica of your personality

Led by Joon Sung Park, a Stanford PhD student in computer science, the team recruited 1,000 people who varied by ...

Why imperfect adversarial robustness doesn’t doom AI control — AI Alignment Forum

(thanks to Alex Mallen, Cody Rushing, Zach Stein-Perlman, Hoagy Cunningham, Vlad Mikulik, and Fabien Roger for comments) Sometimes I hear ...

Who’s to blame for climate change? It’s surprisingly complicated.

Even then, though, there’s another factor to consider: population. Dividing a country’s total emissions by its population reveals how the ...

AI Might Seek Power for Power’s Sake — AI Alignment Forum

I think AI agents (trained end-to-end) might intrinsically prefer power-seeking, in addition to whatever instrumental drives they gain.  The logical ...

The Download: police AI, and mixed reality’s future

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world ...

Training AI agents to solve hard problems could lead to Scheming — AI Alignment Forum

TLDR: We want to describe a concrete and plausible story for how AI models could become schemers. We aim to base ...

The Download: Bluesky’s rapid rise, and harmful fertility stereotypes

I won’t spoil the movie for anyone who hasn’t seen it yet (although I should warn that it is not ...

LLMs make inferences about procedural training data leveraging declarative facts in earlier training data — AI Alignment Forum

Cross-context abduction: LLMs make inferences about procedural training data leveraging declarative facts in earlier training data ...

Which evals resources would be good? — AI Alignment Forum

I want to make a serious effort to create a bigger evals field. I’m very interested in which resources you ...

Win/continue/lose scenarios and execute/replace/audit protocols — AI Alignment Forum

In this post, I’ll make a technical point that comes up when thinking about risks from scheming AIs from a ...