Vast swathes of the human genome remain a mystery to science. A new AI from Google DeepMind is helping researchers understand how these stretches of DNA affect the activity of genes.
While the Human Genome Project produced a complete map of our DNA, we still know surprisingly little about what most of it does. Roughly 2 percent of the human genome encodes specific proteins, but the purpose of the other 98 percent is much less clear.
Historically, scientists called this part of the genome “junk DNA.” But there’s growing recognition that these so-called “non-coding” regions play a critical role in regulating the expression of genes elsewhere in the genome.
Teasing out these interactions is a complicated business. But now a new Google DeepMind model called AlphaGenome can take long stretches of DNA and make predictions about how different genetic variants will affect gene expression, as well as a host of other important properties.
“We have, for the first time, created a single model that unifies many different challenges that come with understanding the genome,” Pushmeet Kohli, a vice president for research at DeepMind, told MIT Technology Review.
The so-called “sequence to function” model uses the same transformer architecture as the large language models behind popular AI chatbots. The model was trained on public databases of experimental results testing how different sequences impact gene regulation. Researchers can enter a DNA sequence of up to one million letters, and the model will then make predictions about a wide range of molecular properties impacting the sequence’s regulatory activity.
These include things like where genes start and end, which sections of the DNA are accessible or blocked by certain proteins, and how much RNA is being produced. RNA is the messenger molecule responsible for carrying the instructions contained in DNA to the cell’s protein factories, or ribosomes, as well as regulating gene expression.
AlphaGenome can also assess the impact of mutations in specific genes by comparing variants, and it can make predictions about RNA “splicing”—a process where RNA molecules are chopped up and packaged before being sent off to a ribosome. Errors in this process are responsible for rare genetic diseases, such as spinal muscular atrophy and some forms of cystic fibrosis.
Predicting the impact of different genetic variants could be particularly useful. In a blog post, the DeepMind researchers report that they used the model to predict that mutations other scientists had discovered in leukemia patients probably activated a nearby gene known to play a role in cancer.
“This system pushes us closer to a good first guess about what any variant will be doing when we observe it in a human,” Caleb Lareau, a computational biologist at Memorial Sloan Kettering Cancer Center who was granted early access to AlphaGenome, told MIT Technology Review.
The model will be free for noncommercial purposes, and DeepMind has committed to eventually releasing full details of how it was built. But it still has limitations. The company says the model can’t make predictions about the genomes of individuals, and its predictions don’t fully explain how genetic variations lead to complex traits or diseases. Further, it can’t accurately predict how non-coding DNA impacts genes that are located more than 100,000 letters away in the genome.
Anshul Kundaje, a computational genomicist at Stanford University in Palo Alto, California, who had early access to AlphaGenome, told Nature that the new model is an exciting development and significantly better than previous models, but not a slam dunk. “This model has not yet ‘solved’ gene regulation to the same extent as AlphaFold has, for example, protein 3D-structure prediction,” he said.
Nonetheless, the model is an important breakthrough in the effort to demystify the genome’s “dark matter.” It could transform our understanding of disease and supercharge synthetic biologists’ efforts to re-engineer DNA for our own purposes.