Anthropic’s updated Responsible Scaling Policy — AI Alignment Forum
Today we are publishing a significant update to our Responsible Scaling Policy (RSP), the risk governance framework we use to ...
OpenAI says ChatGPT treats us all the same (most of the time)
Bias in AI is a huge problem. Ethicists have long studied the impact of bias when companies use AI models ...
The case for unlearning that removes information from LLM weights — AI Alignment Forum
What if you could remove some information from the weights of an AI? Would that be helpful? It is clearly ...
How to… delete your 23andMe data
23andMe’s business is built on taking saliva samples from its customers. The DNA from those samples is processed and analyzed ...
The Download: Growing Africa’s food, and deleting your 23andMe data
After falling steadily for decades, the prevalence of global hunger is now on the rise—nowhere more so than in sub-Saharan ...
Is mechanistic interpretability about to be practically useful? — AI Alignment Forum
Is this market really only at 63%? I think you should take the over. ...
This octopus-inspired adhesive can stick to just about anything
Researchers at Virginia Tech set out to re-create this behavior in the lab by pairing a curved rubber stalk with ...
SAE features for refusal and sycophancy steering vectors — AI Alignment Forum
Steering vectors provide evidence that linear directions in LLMs are interpretable. Since SAEs decompose linear directions, they should be able ...
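The reasoning in that teaser is that a steering vector is a single direction in the residual stream, and since a sparse autoencoder decomposes residual-stream directions into dictionary features, passing a steering vector through the SAE encoder should reveal which interpretable features compose it. A minimal sketch of that decomposition, using random stand-ins for the SAE weights and the steering vector (in practice these would come from a trained SAE and an actual steering experiment):

```python
import numpy as np

# Hypothetical sizes: d_model = residual stream width, d_sae = SAE dictionary size.
d_model, d_sae = 512, 4096
rng = np.random.default_rng(0)

# Stand-ins for a trained SAE's parameters.
W_enc = rng.normal(size=(d_model, d_sae)) / np.sqrt(d_model)  # encoder weights
b_enc = np.zeros(d_sae)                                       # encoder bias
W_dec = rng.normal(size=(d_sae, d_model)) / np.sqrt(d_sae)    # decoder directions

# Stand-in steering vector, e.g. a "refusal" direction found by activation steering.
steering_vec = rng.normal(size=d_model)

# Encode it: ReLU(v @ W_enc + b_enc) gives sparse feature activations.
feature_acts = np.maximum(steering_vec @ W_enc + b_enc, 0.0)

# Rank SAE features by how strongly they fire on the steering vector.
for i in np.argsort(feature_acts)[::-1][:10]:
    print(f"feature {i}: activation {feature_acts[i]:.3f}")

# The reconstruction shows how much of the vector the SAE dictionary explains.
recon = feature_acts @ W_dec
cos = recon @ steering_vec / (np.linalg.norm(recon) * np.linalg.norm(steering_vec))
print(f"cosine(reconstruction, steering vector) = {cos:.3f}")
```

With a real SAE, the top-activating features would then be inspected (via max-activating examples) to check whether they correspond to refusal- or sycophancy-related concepts.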
Everything comes back to climate tech. Here’s what to watch for next.
I’m watching to see how creative the industry can get with squeezing everything it can out of existing assets. But ...
My theory of change for working in AI healthtech — AI Alignment Forum
This post starts out pretty gloomy but ends up with some points that I feel pretty positive about. Day to ...