Alphafold
This page has been updated for AlphaFold 3 (2-11-2025)
About
AlphaFold allows users to predict the 3-D structure of arbitrary proteins. It was published in Nature (Jumper et al. 2021).
We provide AlphaFold on the campus cluster through an Apptainer container, along with some helper scripts that run the container against your particular files. It works best with GPUs to accelerate the computation.
Before using AlphaFold
You need to obtain the model parameters directly from Google:
https://forms.gle/svvpY4u2jsHEwWYS6
Once you have them, copy them to the cluster and set the environment variable MODEL_DIR to the directory where you placed them.
Preparing to run
Here we will show how to load the modules, create a submission script, and submit the job.
To get started you will need to generate an AlphaFold 3 compatible JSON input file.
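As a starting point, a minimal single-protein input looks roughly like the following (the job name and sequence here are placeholders; consult the AlphaFold 3 input documentation for the full schema):

```json
{
  "name": "my_protein",
  "modelSeeds": [1],
  "sequences": [
    {
      "protein": {
        "id": "A",
        "sequence": "MVLSPADKTNVKAAW"
      }
    }
  ],
  "dialect": "alphafold3",
  "version": 1
}
```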
Next load the environment modules to put the software in your path:
module load alphafold/3.0.0
This example assumes you have the JSON file in a directory in your home directory called alphafold, and that you are writing the output files to your scratch directory. You can make the directories like this:
mkdir -p ~/alphafold
mkdir -p /resnick/scratch/$USER/alphafold/out
There is an example JSON file under /central/software9/external/alphafold/examples which can be copied to your alphafold directory:
cp /central/software9/external/alphafold/examples/test.json ~/alphafold/.
Next you will want to create a submission script. There is an example file available as well which you can copy to your home directory:
cp /central/software9/external/alphafold/examples/alphafold3.sub ~/.
The Submission Script
First we give the job a name, which will show up in the scheduler:
#SBATCH --job-name=alphafold_run
Then we say how long we want it to run; the job will be killed when it reaches this limit. We will start with the maximum time of 1 day, but once you are more comfortable with your job runtimes you may want to drop this to a realistic value. A tighter time limit keeps misbehaving jobs from incurring additional costs, and also lets your jobs get through the queue faster since they may fit into a backfill slot.
#SBATCH --time=1-00:00
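For example, once you know a run typically finishes in a few hours, you might request something closer to that (the 6-hour value here is illustrative; Slurm accepts times as days-hours:minutes or hours:minutes:seconds):

```
#SBATCH --time=06:00:00   # 6 hours; set slightly above your typical runtime
```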
The next line chooses the gpu partition, which is required to get a GPU:
#SBATCH --partition=gpu
The next lines describe the resources your job will use: one task that can use 16 cores, a single H100 GPU on the node, and 32 GB of memory.
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --gres=gpu:h100:1 # You need to request one GPU to be able to run AlphaFold properly
#SBATCH --cpus-per-task=16 # adjust this if you are using parallel commands
#SBATCH --mem=32G # adjust this according to the memory requirement per node you need
The next two lines have the scheduler keep you informed about when the job starts and ends. Make sure to put your actual email address in; note that sbatch does not expand environment variables in #SBATCH directives, so replace $USER with your username. You can omit these lines if you prefer not to be emailed.
#SBATCH --mail-user=$USER@caltech.edu
#SBATCH --mail-type=ALL
Next we get to what will actually run on the compute node.
You need to set the directory that the model parameters you obtained from Google are in. On our system this is done with the MODEL_DIR environment variable:
export MODEL_DIR=/home/$USER/alphafold3_models/
Next we load the module again; doing it in the script ensures the software is in your path even if you forgot to load it beforehand:
module load alphafold/3.0.0
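Putting the pieces together, a complete submission script might look like this. This is a sketch: the final alphafold invocation and its --json_path and --output_dir flags are assumptions about the local wrapper, so verify the exact option names with alphafold --helpshort on your system.

```shell
#!/bin/bash
#SBATCH --job-name=alphafold_run
#SBATCH --time=1-00:00
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --gres=gpu:h100:1
#SBATCH --cpus-per-task=16
#SBATCH --mem=32G

# Point AlphaFold at the model parameters obtained from Google
export MODEL_DIR=/home/$USER/alphafold3_models/

# Put the software in the path (harmless if already loaded)
module load alphafold/3.0.0

# Run the prediction; flag names are assumptions -- check `alphafold --helpshort`
alphafold --json_path=$HOME/alphafold/test.json \
          --output_dir=/resnick/scratch/$USER/alphafold/out
```

Once the script is edited to your needs, submit it with sbatch ~/alphafold3.sub and monitor it with squeue -u $USER.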
Options
To see the available command-line options, run:
alphafold --helpshort