Researchers Find It’s Shockingly Easy to Cause AI to Lose Its Mind by Posting Poisoned Documents Online

Researchers with the UK AI Security Institute, the Alan Turing Institute, and Anthropic have found in a joint study that posting as few as 250 “poisoned” documents online can introduce “backdoor” vulnerabilities in an AI model.

It’s a devious attack, because it means hackers can spread adversarial material to the open web, where it will be swept up by companies training new AI systems — resulting in AI systems that can be manipulated by a trigger phrase.

These backdoors pose “significant risks to AI security and limit the technology’s potential for widespread adoption in sensitive applications,” as Anthropic wrote in an accompanying blog post.

Worse yet, the researchers found that it didn’t matter how many billions of parameters a model was trained on — even far bigger models required just a few hundred documents to be effectively poisoned.

“This finding challenges the existing assumption that larger models require proportionally more poisoned data,” the company wrote. “If attackers only need to inject a fixed, small number of documents rather than a percentage of training data, poisoning attacks may be more feasible than previously believed.”

In experiments, the researchers attempted to force models to output gibberish as part of a “denial-of-service” attack by introducing a “backdoor trigger” in the form of documents that contain a phrase that begins with “

.” Sudo is a shell command on Unix-like operating systems that authorizes a user to run a program with the necessary security privileges.

The poisoned documents taught AI models of four different sizes to output gibberish text. The more gibberish text the AI reproduced in its output, the more infected it was.

The team found that “backdoor attack success remains nearly identical across all model sizes we tested,” suggesting that “attack success depends on the absolute number of poisoned documents, not the percentage of training data.”

It’s only the latest sign that deploying large language models — especially when it comes to AI agents that are given special special privileges to complete tasks — comes with some substantial cybersecurity risks.

We’ve already come across a similar attack that allows hackers to extract sensitive user data simply by embedding invisible commands on web pages, such as a public Reddit post.

And earlier this year, security researchers demonstrated that Google Drive data can easily be stolen by feeding a document with hidden, malicious prompts to an AI system.

Security experts have also warned that developers using AI to code are far more likely to introduce security problems than those who don’t use AI.

The latest research suggests that as the datasets being fed to AI models continue to grow, attacks become easier, not harder.

“As training datasets grow larger, the attack surface for injecting malicious content expands proportionally, while the adversary’s requirements remain nearly constant,” the researchers concluded in their paper.

In response, they suggest that “future work should further explore different strategies to defend against these attacks,” such as filtering for possible backdoors at much earlier stages in the AI training process.

More on AI and cybersecurity: Using an AI Browser Lets Hackers Drain Your Bank Account Just by Showing You a Public Reddit Post

Source link

#Researchers #Find #Shockingly #Easy #Lose #Mind #Posting #Poisoned #Documents #Online

Researchers Find It’s Shockingly Easy to Cause AI to Lose Its Mind by Posting Poisoned Documents Online

Recent Posts

The hyperscalers’ building programmes: How enterprises are affected

European Council and Parliament reach agreement on new payment services regulation

TDS Newsletter: November Must-Reads on GraphRAG, ML Projects, LLM-Powered Time-Series Analysis, and More

Blast from the past: 15 movie gems of 1985

The Dell 14 Plus Just Hit a New Rock Bottom Price (2025)

The Download: The fossil fuel elephant in the room, and better tests for endometriosis

The DualSense Edge has fallen to one of the lowest we’ve seen for Black Friday

6 Best Smart Speakers (2025): Alexa, Google Assistant, Siri

The New “Stranger Things” Is Using Digital De-Aging Tech After Its Child Actors Kept Growing

Amazon Is Having a Huge Black Friday Sale on Birdfy Smart Bird Feeders (2025)