As AI gets ever more powerful there are growing efforts to ensure the technology works with humans rather than against us. New research suggests that giving models a sense of guilt can make them more cooperative.
While much of the AI industry is charging full steam ahead in a bid to achieve artificial general intelligence, a vocal minority is advocating caution. Backers of AI safety say that if we’re going to introduce another class of intelligence into the world, it’s important to make sure it’s on the same page as us.
However, getting AI to behave in accordance with human preferences or ethical norms is tricky, not least because humans themselves can’t agree on these things. Nonetheless, emerging techniques for “AI alignment” are designed to ensure models are helpful partners rather than deceptive adversaries.
Guilt and shame are some of the most powerful ways human societies make sure individuals remain team players. In a new paper in the Journal of the Royal Society Interface, researchers tested whether the same approach could work with AI and found that, in the right circumstances, it could.
“Building ethical machines may involve bestowing upon them the emotional capacity to self-evaluate and repent for their actions,” the authors write. “If agents are equipped with the capacity of guilt feeling, even if it might lead to costly disadvantage, that can drive the system to an overall more cooperative outcome where they are willing to take reparative actions after wrongdoings.”
It’s important to note that the researchers were not experimenting with the kind of sophisticated large language models people now interact with on a daily basis. The tests were conducted with simple software agents tasked with playing a version of a classic game-theory test called the “prisoner’s dilemma.”
At each turn, the players must decide whether to cooperate or defect. If both players cooperate, they share a reward, and if they both defect, they share a punishment. However, if one cooperates and the other defects, the defector gets an even larger reward, while the cooperator suffers an even harsher punishment.
The game is set up such that the optimal outcome in terms of overall reward comes from the players cooperating, but at the individual level, the most rational approach is to always defect. However, if one player repeatedly defects, the other is likely to do the same, leading to a sub-optimal outcome.
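For readers who want to see that incentive structure concretely, here is a minimal sketch of the game in Python. The payoff numbers are illustrative placeholders chosen to satisfy the standard prisoner's dilemma ordering, not values taken from the paper.

```python
# Illustrative one-shot prisoner's dilemma payoffs (hypothetical values, not
# from the paper). The ordering temptation > reward > punishment > sucker
# means defecting always pays more individually, yet mutual cooperation
# beats mutual defection overall.
PAYOFFS = {
    ("cooperate", "cooperate"): (3, 3),  # shared reward for mutual cooperation
    ("cooperate", "defect"):    (0, 5),  # sucker's payoff vs. temptation payoff
    ("defect",    "cooperate"): (5, 0),
    ("defect",    "defect"):    (1, 1),  # shared punishment for mutual defection
}

def play_round(move_a: str, move_b: str) -> tuple[int, int]:
    """Return the (player A, player B) payoffs for one round."""
    return PAYOFFS[(move_a, move_b)]

print(play_round("cooperate", "defect"))  # (0, 5): defection exploits cooperation
```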
The authors say research on humans playing the game shows that inducing guilt helps boost the cooperativeness of previously uncooperative players, so they attempted the same thing with their agents.
To imbue the agents with a sense of guilt, they gave them a tracker that counted every time they took an uncooperative action. Each agent was also given a threshold of uncooperative actions it could get away with before feeling guilty and having to assuage its guilt by giving up some of its points.
The researchers modeled two different kinds of guilt—social and non-social. In the former, the agents only felt guilty if they knew their opponent would also feel guilty were it to commit the same offense. In the latter, the agents felt guilty regardless of their opponent.
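A rough sketch of how such a guilt mechanism might look in code is below. The field names, the reset rule, and the flat reparation cost are assumptions made purely for illustration; the paper's actual model may differ in its details.

```python
from dataclasses import dataclass

@dataclass
class GuiltAgent:
    """Minimal sketch of a guilt-prone agent (hypothetical fields and rules,
    not the paper's exact model)."""
    threshold: int          # defections tolerated before guilt kicks in
    reparation_cost: float  # points surrendered to assuage guilt
    social: bool            # True: social guilt; False: non-social guilt
    defections: int = 0     # tracker of uncooperative actions
    score: float = 0.0

    def record(self, defected: bool, opponent_would_feel_guilt: bool) -> None:
        """Update the tracker after a move and apply guilt if warranted."""
        if not defected:
            return
        self.defections += 1
        if self.defections <= self.threshold:
            return
        # Social guilt only fires if the opponent would also feel guilty for
        # the same offense; non-social guilt fires regardless of the opponent.
        if self.social and not opponent_would_feel_guilt:
            return
        self.score -= self.reparation_cost  # costly reparative action
        self.defections = 0                 # guilt is assuaged, tracker resets
```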
They then got populations of agents programmed with slightly different approaches to guilt to play each other many times. The agents were also programmed to evolve over time, with those earning low scores switching their approach so as to mimic those doing well. This means the best strategies became more prevalent over time.
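One common way to model this kind of imitation dynamic is a pairwise comparison update, sketched below under the assumption that low scorers copy a better-scoring peer's guilt parameters; the paper's exact update rule may differ.

```python
import random
from dataclasses import dataclass

@dataclass
class Agent:
    """Bare-bones agent for the imitation sketch (hypothetical fields)."""
    strategy: tuple  # e.g. (guilt threshold, reparation cost, social or not)
    score: float = 0.0

def imitation_step(agents: list[Agent], noise: float = 0.05) -> None:
    """Assumed pairwise-imitation update: a randomly chosen learner copies a
    randomly chosen model's strategy if the model scored higher, with a
    little noise so occasional exploration still happens."""
    learner, model = random.sample(agents, 2)
    if model.score > learner.score or random.random() < noise:
        learner.strategy = model.strategy
```

Repeated over many rounds, updates like this make the highest-scoring approaches to guilt spread through the population.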
The researchers found the social form of guilt was much more effective at pushing agents towards cooperative behavior, suggesting guilt is a more successful social regulator when we know that everyone’s playing by the same rules.
Interestingly, they found the social structure of the populations had a significant impact on the outcome. In groups where all players interact with each other, guilt was less effective and non-social guilt was quickly scrubbed out.
But in more structured populations, where agents interact with only a subset of others, a setup that better mimics the dynamics of human societies, clusters of agents that felt non-social guilt could persist.
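As a purely hypothetical illustration of that distinction, a structured population can be modeled by restricting each agent's interactions to its neighbors on a ring or grid, rather than letting any two agents meet at random.

```python
def ring_neighbors(index: int, population_size: int, k: int = 2) -> list[int]:
    """Hypothetical structured population: agents sit on a ring and interact
    only with their k nearest neighbors on each side, in contrast to a
    well-mixed population where any two agents can be paired."""
    return [(index + offset) % population_size
            for offset in range(-k, k + 1) if offset != 0]

print(ring_neighbors(0, 10))  # [8, 9, 1, 2]: agent 0's local neighborhood
```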
It’s difficult to extrapolate these simplistic simulations to real-world social dynamics though, or to the inner workings of much more complex AI agents powered by large language models. It’s unclear what “guilt” would look like in more advanced AI or whether it would affect those models’ behavior in similar ways to this experiment.
Nonetheless, the research provides tantalizing hints that imbuing machines with emotions could help moderate and direct their decision making as their capabilities continue to grow.