Rarely does scientific software spark such sensational headlines. “One of biology’s biggest mysteries ‘largely solved’ by AI”, declared the BBC. Forbes called it “the most important achievement in AI — ever”. The buzz over the November 2020 debut of AlphaFold2, Google DeepMind’s artificial-intelligence (AI) system for predicting the 3D structure of proteins, has only intensified since the tool was made freely available in July.
The excitement relates to the software’s potential to solve one of biology’s thorniest problems — predicting the functional, folded structure of a protein molecule from its linear amino-acid sequence, right down to the position of each atom in 3D space. The underlying physicochemical rules for how proteins form their 3D structures remain too complicated for humans to parse, so this ‘protein-folding problem’ has remained unsolved for decades.
Researchers have worked out the structures of around 160,000 proteins from all kingdoms of life using experimental techniques such as X-ray crystallography and cryo-electron microscopy (cryo-EM), and have deposited the resulting 3D information in the Protein Data Bank. Computational biologists have made steady gains in developing software that complements these methods, and have correctly predicted the 3D shapes of some molecules from well-studied protein families.
Despite these advances, researchers still lacked structural information for around 4,800 human proteins. But AlphaFold2 has taken structure-prediction strategies to the next level. For instance, an independent analysis by researchers in Spain showed [1] that the algorithm’s predictions had reduced the number of human proteins for which no structural data was available to just 29.
AlphaFold2 was revealed last November at CASP14, the 14th Critical Assessment of Protein Structure Prediction (CASP), a biennial competition that challenges computational biologists to test their algorithms against proteins for which structures have been experimentally solved, but not publicly released. DeepMind’s software, which uses the sophisticated machine-learning technique known as deep learning, blew the competition out of the water.
“Based on CASP14 [results], they could get about two-thirds of the proteins with experimental accuracy overall, and even for hard targets, they can fold about one-third of the proteins with experimental accuracy,” says Yang Zhang, a biological chemist at the University of Michigan in Ann Arbor, whose algorithm was among CASP14’s runners-up. “That’s a very amazing result.” Two subsequent Nature papers [2,3] and dozens of preprints have further demonstrated AlphaFold2’s predictive power.
Zhang considers AlphaFold2 to be a striking demonstration of the power of deep learning, but only a partial solution to the protein-folding problem. The algorithm can deliver highly accurate results for many proteins — and some multi-protein complexes — even in the …….