Proteins are essential to life, supporting practically all its functions. They are large complex molecules, made up of chains of amino acids, and what a protein does largely depends on its unique 3D structure. Figuring out what shapes proteins fold into is known as the “protein folding problem”, and has stood as a grand challenge in biology for the past 50 years. In a major scientific advance, the latest version of our AI system AlphaFold has been recognised as a solution to this grand challenge by the organisers of the biennial Critical Assessment of protein Structure Prediction (CASP). This breakthrough demonstrates the impact AI can have on scientific discovery and its potential to dramatically accelerate progress in some of the most fundamental fields that explain and shape our world.
From the journal Nature:
DeepMind’s program, called AlphaFold, outperformed around 100 other teams in a biennial protein-structure prediction challenge called CASP, short for Critical Assessment of Structure Prediction. The results were announced on 30 November, at the start of the conference — held virtually this year — that takes stock of the exercise.
“This is a big deal,” says John Moult, a computational biologist at the University of Maryland in College Park, who co-founded CASP in 1994 to improve computational methods for accurately predicting protein structures. “In some sense the problem is solved.”
The full paper is published in Nature: Jumper, J., Evans, R., Pritzel, A. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021). https://doi.org/10.1038/s41586-021-03819-2
Deep Mind has released the source on GitHub, including instructions for building a Docker container to run AlphaFold.
One of the faculty at Drexel, where I work, requested AlphaFold be installed. However, in HPC, it is more common to use Singularity containers rather than Docker, as Singularity does not require a daemon nor root privileges. I was able to modify the Dockerfile and the Python wrapper to work with Singularity. Additionally, I added some integration with Slurm, querying the Slurm job environment for the available GPU devices, and the scratch/tmp directory for output. My fork is on GitHub, as well.