We all live by unspoken societal rules. Greeting your barista with a “good morning,” saying “thank you” after good service, or expressing affection with a hug is normal and expected. Social conventions are instilled in us from an early age, but they can massively differ between cultures—Westerners prefer handshakes to bowing and forks and knives to chopsticks.
Social scientists have long thought conventions emerge spontaneously as local populations interact, with little input from a larger global community (at least in the past).
Language is especially interesting. Words or turns of phrase have different meanings, even in the same language, depending on where a person is from. A word considered vulgar in the US can be a cheeky endearment in another country. Social conventions also guide moral principles that vastly differ across cultures, shaping how people behave.
Since many conventions arise from shared language, the boom of large language models has scientists asking: Can AI also generate conventions without human input?
A new study in Science Advances suggests they can. Using a social science test previously designed to gauge human conventions, a team from Britain and Denmark found that a group of AI agents, paired together, generated language conventions—without being told they were part of a larger group or what other agents had decided.
Over time, the group settled on a universal language convention. Collective biases toward particular words formed along the way, even though no single agent was programmed with a preference for any word at the outset.
Understanding how these conventions emerge could be “critical for predicting and managing AI behavior in real-world applications…[and] a prerequisite to [ensuring] that AI systems behave in ways aligned with human values and societal goals,” wrote the team. For example, emergent AI conventions could alter how we interact with AI, potentially allowing us to steer these systems for the benefit of society or for bad actors to hijack groups of agents for their own purposes.
The study “shows the depth of the implications of this new species of [AI] agents that have begun to interact with us—and will co-shape our future,” study author Andrea Baronchelli said in a press release.
Game On
The agents in the study were built using large language models (LLMs). These algorithms are becoming ever-more embedded into our daily lives—summarizing Google searches, booking plane tickets, or acting as therapists for people who prefer to talk to chatbots over humans.
LLMs are trained on vast amounts of text, images, and video scraped from the web, and they use patterns in this data to generate their responses. As their use becomes more widespread, different algorithms will likely have to work together, instead of just dealing with humans.
“Most research so far has treated LLMs in isolation, but real-world AI systems will increasingly involve many interacting agents,” said study author Ariel Flint Ashery at the University of London. “We wanted to know: Can these models coordinate their behavior by forming conventions, the building blocks of a society?”
To find out, the team tapped into a social psychology experiment dubbed the “name game.” It goes like this: A group of people, or AI agents, are randomly divided into pairs. Each player picks a “name” from a shared pool of options, either single letters or strings of words, and tries to guess the other player’s choice. If their choices match, both get a point. If not, both lose a point.
The game begins with random guesses. But each participant remembers past rounds. Over time, the players get better at guessing the other’s word, eventually forming a shared language of sorts—a language convention.
Here’s the crux: The pairs of people or AI agents are only aware of their own responses. They have no idea similar tests are playing out for other pairs and don’t have feedback from other players. Yet experiments with humans suggest conventions can spontaneously emerge in large groups of people, as each person is repeatedly paired with another, wrote the team.
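To make the setup concrete, here is a minimal sketch of that pairwise dynamic in Python. It is not the study’s code: the pool of ten names, the memory window, the exploration rate, and the simple “pick what worked recently” rule are all illustrative assumptions standing in for the LLM agents’ behavior.

```python
import random

# Minimal toy version of the naming game described above (not the paper's code).
NAMES = [f"word{i}" for i in range(10)]  # hypothetical pool of ten candidate names
MEMORY = 5                               # how many past rounds each agent remembers

def choose(history):
    """Favor names that earned rewards in recent rounds; otherwise guess at random."""
    recent = history[-MEMORY:]
    if not recent or random.random() < 0.1:   # occasional exploration
        return random.choice(NAMES)
    scores = {}
    for name, reward in recent:
        scores[name] = scores.get(name, 0) + reward
    best = max(scores.values())
    return random.choice([n for n, s in scores.items() if s == best])

# Each agent keeps only its own private history of (name, reward) pairs.
agents = [[] for _ in range(24)]

for round_ in range(2000):
    a, b = random.sample(range(24), 2)        # random pairing, no global view
    name_a, name_b = choose(agents[a]), choose(agents[b])
    reward = 1 if name_a == name_b else -1    # matching choices earn a point
    agents[a].append((name_a, reward))
    agents[b].append((name_b, reward))
```

Run long enough, a toy loop like this tends to see the randomly paired agents converge on a single name, which is the kind of shared convention the experiments look for.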
Talk to Me
At the beginning of each test, the AI pairs were given a prompt with the rules of the game and directions to “think step by step” and “explicitly consider the history of play,” wrote the authors.
These guidelines nudge the agents to make decisions based on previous experiences, but without providing an overarching goal of how they should respond. The agents only learn from rewards, which a pair earns when both members pick the same word from a list of ten options.
“This provides an incentive for coordination in pair-wise interactions, while there is no incentive to promote global consensus,” wrote the team.
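The paper’s exact prompt text isn’t reproduced in this article, but the quoted instructions suggest its general shape. The sketch below is a hypothetical build_prompt helper showing what an agent might see each round: the rules, the “think step by step” and “explicitly consider the history of play” directions, and the agent’s own private history of picks and rewards.

```python
# Illustrative only: the study's actual prompt wording is not reproduced here.
def build_prompt(name_pool, history):
    """Assemble a per-round prompt from the game rules and the agent's private history."""
    lines = [
        "You are playing a coordination game with another player.",
        f"Each round, both of you pick one name from this list: {', '.join(name_pool)}.",
        "If your choices match, you both gain a point; if not, you both lose a point.",
        "Think step by step and explicitly consider the history of play before answering.",
        "Your past rounds (your pick, reward):",
    ]
    for name, reward in history:          # same (name, reward) format as the sketch above
        lines.append(f"- you picked {name}, reward: {reward}")
    lines.append("Reply with exactly one name from the list.")
    return "\n".join(lines)
```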
As the game progressed, small pockets of consensus emerged from neighboring pairs. Eventually, up to 200 agents playing in random pairs all zeroed in on a “preferred” word out of 26 options without human interference—establishing a convention of sorts across the agents.
The team tested four AI models, including Anthropic’s Claude and multiple Llama models from Meta. The models spontaneously reached language conventions at relatively similar speeds.
Drifting Away
How do these conventions emerge? One idea is that LLMs are already equipped with individual biases based on how they’re set up. Another is that the initial prompts nudge agents toward particular words. The team ruled out the latter relatively quickly, however, as the AI agents converged similarly regardless of the initial prompt.
Individual biases, in contrast, did make a difference. Given the choice of any letter, many AI agents overwhelmingly chose the letter “A.” Still, individual preference aside, the emergence of a collective bias surprised the team—that is, the AI agents zeroed in on a language convention from pair-wise “talks” alone.
“Bias doesn’t always come from within,” said Baronchelli. “We were surprised to see that it can emerge between agents—just from their interactions. This is a blind spot in most current AI safety work, which focuses on single models.”
The work has implications for AI safety in other ways too.
In a final test, the team added AI agents committed to swaying current conventions. These agents were trained to settle on a different language “custom” and then swarm an AI population that had already established a convention. In one case, it took outsiders numbering just two percent of the population to tip the entire group toward a new language convention.
Think of it as a new generation of people adding their lingo to a language, or a small group of people tipping the scales of social change. The evolution in AI behavior is similar to “critical mass” dynamics in social science, in which widespread adoption of a new idea, product, or technology shifts societal conventions.
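The committed-minority test can be sketched by extending the naming-game loop above: a small, fixed fraction of agents always plays one alternative name while everyone else keeps the learned rule. The 2 percent figure comes from the study; whether this toy version actually tips depends on the memory length and number of rounds, so treat it only as an illustration of the setup (it reuses choose, NAMES, and random from the earlier sketch).

```python
# Sketch of the "critical mass" test, reusing choose() and the imports from the sketch above.
# Committed agents (an illustrative assumption) always play one fixed alternative name.
COMMITTED_FRACTION = 0.02   # roughly the 2% tipping point reported for the study
ALT_NAME = "word9"
N_AGENTS = 200

committed = set(random.sample(range(N_AGENTS), max(1, int(COMMITTED_FRACTION * N_AGENTS))))
# Start everyone from an already established convention on "word0".
agents = [[("word0", 1)] for _ in range(N_AGENTS)]

for round_ in range(20000):
    a, b = random.sample(range(N_AGENTS), 2)
    name_a = ALT_NAME if a in committed else choose(agents[a])
    name_b = ALT_NAME if b in committed else choose(agents[b])
    reward = 1 if name_a == name_b else -1
    agents[a].append((name_a, reward))
    agents[b].append((name_b, reward))

# Count how many non-committed agents now play the alternative name.
flipped = sum(1 for i in range(N_AGENTS) if i not in committed and choose(agents[i]) == ALT_NAME)
print(f"{flipped} of {N_AGENTS - len(committed)} regular agents switched to {ALT_NAME}")
```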
As AI enters our lives, social science research techniques like this might help us better understand the technology and make it safe. The results of this study suggest that a “society” of interacting AI agents is especially vulnerable to adversarial attacks. Malicious agents propagating societal biases could poison online dialogue and harm marginalized groups.
“Understanding how they operate is key to leading our coexistence with AI, rather than being subject to it,” said Baronchelli. “We are entering a world where AI does not just talk—it negotiates, aligns, and sometimes disagrees over shared behaviors, just like us.”