Anthropic’s updated Responsible Scaling Policy — AI Alignment Forum
Today we are publishing a significant update to our Responsible Scaling Policy (RSP), the risk governance framework we use to ...

OpenAI says ChatGPT treats us all the same (most of the time)
Bias in AI is a huge problem. Ethicists have long studied the impact of bias when companies use AI models ...

The case for unlearning that removes information from LLM weights — AI Alignment Forum
What if you could remove some information from the weights of an AI? Would that be helpful? It is clearly ...

How to… delete your 23andMe data
23andMe’s business is built on taking saliva samples from its customers. The DNA from those samples is processed and analyzed ...

The Download: Growing Africa’s food, and deleting your 23andMe data
After falling steadily for decades, the prevalence of global hunger is now on the rise—nowhere more so than in sub-Saharan ...

Is mechanistic interpretability about to be practically useful? — AI Alignment Forum
Is this market really only at 63%? I think you should take the over.  Only 63%? I think you should ...

This octopus-inspired adhesive can stick to just about anything
Researchers at Virginia Tech set out to re-create this behavior in the lab by pairing a curved rubber stalk with ...

SAE features for refusal and sycophancy steering vectors — AI Alignment Forum
Steering vectors provide evidence that linear directions in LLMs are interpretable. Since SAEs decompose linear directions, they should be able ...

Everything comes back to climate tech. Here’s what to watch for next.
I’m watching to see how creative the industry can get with squeezing everything it can out of existing assets. But ...

My theory of change for working in AI healthtech — AI Alignment Forum
This post starts out pretty gloomy but ends up with some points that I feel pretty positive about.  Day to ...