...

Researchers Find It’s Shockingly Easy to Cause AI to Lose Its Mind by Posting Poisoned Documents Online



Researchers with the UK AI Security Institute, the Alan Turing Institute, and Anthropic have found in a joint study that posting as few as 250 “poisoned” documents online can introduce “backdoor” vulnerabilities in an AI model.

It’s a devious attack, because it means hackers can spread adversarial material to the open web, where it will be swept up by companies training new AI systems — resulting in AI systems that can be manipulated by a trigger phrase.

These backdoors pose “significant risks to AI security and limit the technology’s potential for widespread adoption in sensitive applications,” as Anthropic wrote in an accompanying blog post.

Worse yet, the researchers found that it didn’t matter how many billions of parameters a model was trained on — even far bigger models required just a few hundred documents to be effectively poisoned.

“This finding challenges the existing assumption that larger models require proportionally more poisoned data,” the company wrote. “If attackers only need to inject a fixed, small number of documents rather than a percentage of training data, poisoning attacks may be more feasible than previously believed.”

In experiments, the researchers attempted to force models to output gibberish as part of a “denial-of-service” attack by introducing a “backdoor trigger” in the form of documents that contain a phrase that begins with “

.” Sudo is a shell command on Unix-like operating systems that authorizes a user to run a program with the necessary security privileges.

The poisoned documents taught AI models of four different sizes to output gibberish text. The more gibberish text the AI reproduced in its output, the more infected it was.

The team found that “backdoor attack success remains nearly identical across all model sizes we tested,” suggesting that “attack success depends on the absolute number of poisoned documents, not the percentage of training data.”

It’s only the latest sign that deploying large language models — especially when it comes to AI agents that are given special special privileges to complete tasks — comes with some substantial cybersecurity risks.

We’ve already come across a similar attack that allows hackers to extract sensitive user data simply by embedding invisible commands on web pages, such as a public Reddit post.

And earlier this year, security researchers demonstrated that Google Drive data can easily be stolen by feeding a document with hidden, malicious prompts to an AI system.

Security experts have also warned that developers using AI to code are far more likely to introduce security problems than those who don’t use AI.

The latest research suggests that as the datasets being fed to AI models continue to grow, attacks become easier, not harder.

“As training datasets grow larger, the attack surface for injecting malicious content expands proportionally, while the adversary’s requirements remain nearly constant,” the researchers concluded in their paper.

In response, they suggest that “future work should further explore different strategies to defend against these attacks,” such as filtering for possible backdoors at much earlier stages in the AI training process.

More on AI and cybersecurity: Using an AI Browser Lets Hackers Drain Your Bank Account Just by Showing You a Public Reddit Post

Source link

#Researchers #Find #Shockingly #Easy #Lose #Mind #Posting #Poisoned #Documents #Online