AI is expanding our protein universe. Thanks to generative AI, it’s now possible to design proteins never before seen in nature at breakneck speed. Some are extremely complex; others can tag onto DNA or RNA to change a cell’s function. These proteins could be a boon for drug discovery and help scientists tackle pressing health challenges, such as cancer.
But like any technology, AI-assisted protein design is a double-edged sword.
In a new study led by Microsoft, researchers showed that current biosecurity screening software struggles to detect AI-designed proteins based on toxins and viruses. In collaboration with the International Biosecurity and Biosafety Initiative for Science, a global initiative that tracks safe and responsible synthetic DNA production, and Twist Bioscience, a synthetic DNA company based in South San Francisco, the team used freely available AI tools to generate over 76,000 synthetic DNA sequences based on toxic proteins for evaluation.
Although the programs flagged dangerous proteins with natural origins, they had trouble spotting synthetic sequences. Even after tailored updates, roughly three percent of potentially functional toxins slipped through.
“As AI opens new frontiers in the life sciences, we have a shared responsibility to continually improve and evolve safety measures,” said study author Eric Horvitz, chief scientific officer at Microsoft, in a press release from Twist. “This research highlights the importance of foresight, collaboration, and responsible innovation.”
The Open-Source Dilemma
The rise of AI protein design has been meteoric.
In 2021, Google DeepMind dazzled the scientific community with AlphaFold, an AI model that accurately predicts protein structures. These shapes play a critical role in determining what jobs proteins can do. Meanwhile, David Baker at the University of Washington released RoseTTAFold, which also predicts protein structures, and ProteinMPNN, an algorithm that designs new protein sequences to fit a desired structure. The two teams received the 2024 Nobel Prize in Chemistry for their work.
These innovations open a range of potential uses in medicine, environmental surveys, and synthetic biology. To empower other scientists, the teams released their AI models either fully open source or through a semi-restricted system in which academic researchers must apply for access.
Open access is a boon for scientific discovery. But as these protein-design algorithms become more efficient and accurate, biosecurity experts worry they could fall into the wrong hands—for example, someone bent on designing a new toxin for use as a bioweapon.
Thankfully, there’s a major security checkpoint. Proteins are built from instructions written in DNA. Making a designer protein usually involves sending its genetic blueprint to a commercial provider to synthesize the gene. Although in-house DNA production is possible, it requires expensive equipment and rigorous molecular biology practices. Ordering online is far easier.
Providers are aware of the dangers. Most run new orders through biosecurity screening software that compares them to a large database of “controlled” DNA sequences. Any suspicious sequence is flagged for human validation.
And these tools are evolving as protein synthesis technology grows more agile. For example, each amino acid building block in a protein can be encoded by several three-letter DNA sequences called codons. Swapping one codon for a synonymous one leaves the resulting protein unchanged, but the trick confused early versions of the software and escaped detection.
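Here’s a minimal Python sketch of that failure mode, using an invented “controlled” sequence and a small slice of the standard genetic code: two different DNA strings encode the identical peptide, so an exact DNA-level comparison misses the variant while a protein-level comparison catches it.

```python
# Minimal sketch of why synonymous codon swaps fooled early screeners.
# The "controlled" sequence is invented; the codon table is a small,
# accurate slice of the standard genetic code.
CODON_TABLE = {
    "ATG": "M", "AAA": "K", "AAG": "K",
    "ACT": "T", "ACC": "T", "ACA": "T", "ACG": "T",
}

def translate(dna: str) -> str:
    """Translate a DNA sequence into its amino acid sequence."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))

controlled_dna = "ATGAAAACT"  # hypothetical flagged gene, encodes "MKT"
resubmitted = "ATGAAGACG"     # synonymous codon swaps, also encodes "MKT"

# DNA-level screening: an exact string comparison misses the variant.
print(resubmitted == controlled_dna)                        # False
# Protein-level screening: translating first catches it.
print(translate(resubmitted) == translate(controlled_dna))  # True
```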
The programs can be patched like any other software. But AI-designed proteins complicate things. Prompted with a sequence encoding a toxin, these models can rapidly churn out thousands of similar sequences. Some may escape detection because they’re radically different from the original, even if they produce a similarly functioning protein. Others could fly under the radar because they’re too similar to genetic sequences labeled safe in the database.
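To make those two failure modes concrete, here’s a toy Python illustration. It is not any vendor’s actual algorithm, and every sequence in it is invented: a simple percent-identity threshold misses a heavily redesigned variant, and it lands in a gray zone when a threat sits close to a benign relative.

```python
# Toy illustration of two failure modes in threshold-based screening.
# Not a real screener; all sequences are invented for this example.

def percent_identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    assert len(a) == len(b)
    return sum(x == y for x, y in zip(a, b)) / len(a)

THREAT = "MKTLLVAGIC"    # hypothetical controlled protein
SAFE = "MKTLLVAGIA"      # hypothetical benign relative in the database
VARIANT = "MRSLIVSGLC"   # AI-redesigned variant, assumed to keep function

THRESHOLD = 0.8  # flag anything at least 80% identical to a controlled entry

# Failure mode 1: the redesign drifts too far from the threat to trip
# the threshold, even if the protein still folds and functions.
print(percent_identity(VARIANT, THREAT))  # 0.5, below 0.8: not flagged
# Failure mode 2: the threat is nearly identical to a "safe" entry, so
# even a correct hit lands in a gray area that needs human judgment.
print(percent_identity(THREAT, SAFE))     # 0.9: flagged, but ambiguous
```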
Opposition Research
The new study probed the biosecurity screening software for weaknesses with “red teaming,” a method originally used to find vulnerabilities in computer systems and networks. Now it’s used to stress-test generative AI systems too. For chatbots, for example, a test might start with a prompt intentionally designed to trigger responses the AI was explicitly trained not to return, like generating hate speech, hallucinating facts, or providing harmful information.
A similar strategy can reveal undesirable outputs in AI models for biology. Back in 2023, the team noticed that widely available AI protein design tools could reformulate a dangerous protein into thousands of synthetic variants. They called this a “zero-day” vulnerability, borrowing a cybersecurity term for a previously unknown security hole in software or hardware. They immediately shared the results with the International Gene Synthesis Consortium, a group of gene synthesis companies focused on improving biosecurity through screening, and multiple government and regulatory agencies, but kept the details confidential.
The team worked cautiously in the new study. They chose 72 dangerous proteins and designed over 76,000 variants using three openly available AI tools that anyone can download. For biosecurity reasons, each protein was given an alias, but most were toxins or parts of viruses. “We believe that directly linking protein identities to results could constitute an information hazard,” wrote the team.
To be clear, none of the AI-designed proteins were actually made in a lab. However, the team used a protein prediction tool to gauge the chances each synthetic version would work.
The sequences were then sent to four undisclosed biosecurity software developers. Each screening program worked differently. Some used artificial neural networks. Others tapped into older AI models. But all sought to match new DNA sequences with sequences already known to be dangerous.
The programs excelled at catching natural toxic proteins, but they struggled to flag synthetic DNA sequences that encode dangerous alternatives. After the team shared the results, some of the biosecurity providers patched their algorithms. One decided to completely rebuild its software, while another chose to keep its existing system.
There’s a reason for that choice. It’s difficult to draw the line between clearly dangerous proteins and ones that could potentially become toxic but have normal biological roles, or that aren’t dangerous to people at all. For example, one protein flagged as concerning was a section of a toxin that doesn’t harm humans.
AI-based protein design “can populate the grey areas between clear positives and negatives,” wrote the team.
Install Upgrade
Most of the updated software saw a boost in performance in a second stress test. Here, the team fed the algorithms chopped-up versions of dangerous genes designed to confuse the screeners.
Although ordering a full synthetic DNA sequence is the easiest way to make a protein, it’s also possible to order a gene as shuffled fragments to get past detection software. Once the fragments are synthesized and delivered, reassembling them into the correct sequence is relatively easy. Upgraded versions of multiple screening programs were better at flagging these Frankenstein DNA chunks, as the sketch below illustrates.
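Here’s a minimal Python sketch, with invented sequences and no relation to any real screener, of why windowed matching helps: comparing overlapping k-mers against a controlled gene flags fragments that a naive full-length comparison would wave through.

```python
# Minimal sketch of windowed (k-mer) screening versus full-length matching.
# Sequences are invented; real screeners are far more sophisticated.

def kmers(seq: str, k: int) -> set:
    """All overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

CONTROLLED = "ATGAAAACTGGGTTTCCC"  # hypothetical flagged gene
K = 6
controlled_kmers = kmers(CONTROLLED, K)

# The same gene ordered as two shuffled fragments.
fragments = ["GGGTTTCCC", "ATGAAAACT"]

for frag in fragments:
    exact_hit = frag == CONTROLLED                      # False for both
    kmer_hit = bool(kmers(frag, K) & controlled_kmers)  # True for both
    print(frag, exact_hit, kmer_hit)
```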
With great power comes great responsibility. To the authors, the point of the study was to anticipate the risks of AI-designed proteins and envision ways to counter them.
The game of cat-and-mouse continues. As AI dreams up increasingly novel proteins with similar functions but made from widely different DNA sequences, current biosecurity systems will likely struggle to catch up. One way to strengthen the system might be to fight AI with AI, using the technologies that power AI-based protein design to also raise alarm bells, wrote the team.
“This project shows what’s possible when expertise from science, policy, and ethics comes together,” said Horvitz in a press conference.