
Alphafold

This has been updated for alphafold3 (2-11-2025)


About

Alphafold allows users to predict the 3-D structure of arbitrary proteins. It was published in Nature (Jumper et al. 2021).

We have set up alphafold on the campus cluster using apptainer, along with some scripts that help run the container on your particular files. It works best when using GPUs to help with the computation.

Before using alphafold

You need to obtain the model parameters directly from Google:

   https://forms.gle/svvpY4u2jsHEwWYS6

Once you have them, put them on the cluster and set the environment variable MODEL_DIR to the directory where you placed them.
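As a rough sketch, the variable can be set and checked from the shell like this. The path is the one used later in the submission script, so substitute wherever you actually placed the parameters. The check_model_dir helper is hypothetical and written here only for illustration; it is not part of the alphafold module.

```shell
# Assumed location; replace with the directory holding your model parameters.
export MODEL_DIR="/home/$USER/alphafold3_models"

# Hypothetical helper: succeed only if MODEL_DIR points at an existing directory.
check_model_dir() {
    if [ -z "${MODEL_DIR:-}" ]; then
        echo "MODEL_DIR is not set" >&2
        return 1
    fi
    if [ ! -d "$MODEL_DIR" ]; then
        echo "MODEL_DIR does not exist: $MODEL_DIR" >&2
        return 1
    fi
    echo "MODEL_DIR looks OK: $MODEL_DIR"
}

check_model_dir || echo "Fix MODEL_DIR before submitting a job."
```

Running a check like this on the login node before submitting can catch a mistyped path without waiting for the job to start.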

Preparing to run

Here we will show how to load the modules, create a submission script, and submit the job.

To get started, you will need to generate an alphafold 3 compatible json file.
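For orientation, a minimal input file for a single protein chain might look like the sketch below. The layout follows our reading of the input format described in the AlphaFold 3 documentation, so check that documentation for the full set of fields. The file name my_protein.json, the job name, and the sequence are all placeholders chosen for illustration.

```shell
mkdir -p ~/alphafold

# Placeholder job name and sequence; replace them with your own protein.
cat > ~/alphafold/my_protein.json <<'EOF'
{
  "name": "my_protein",
  "modelSeeds": [1],
  "sequences": [
    {
      "protein": {
        "id": "A",
        "sequence": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
      }
    }
  ],
  "dialect": "alphafold3",
  "version": 1
}
EOF

# Optional sanity check that the file is well-formed json (only if python3 is available).
if command -v python3 > /dev/null 2>&1; then
    python3 -m json.tool ~/alphafold/my_protein.json > /dev/null && echo "json is well formed"
fi
```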

Next, load the environment modules to put the software in your path:

module load alphafold/3.0.0


This example assumes you have the json file in a directory called alphafold in your home directory. It also assumes you are writing the output files to the scratch directory. You can make the directories like this:

mkdir -p ~/alphafold

mkdir -p /resnick/scratch/$USER/alphafold/out

There is an example json file in /central/software9/external/alphafold/examples which can be copied to your alphafold directory:

cp /central/software9/external/alphafold/examples/test.json ~/alphafold/.


Next you will want to create a submission script. There is an example file available as well which you can copy to your home directory:

cp /central/software9/external/alphafold/examples/alphafold3.sub ~/.


The Submission Script

We will go through the script line by line so you understand what it is doing.

The script starts like a normal shell script, typically calling bash:

#!/bin/bash

Any line that starts with #SBATCH is an instruction to the scheduler. These lines tell the scheduler what resources you need and can set various other options. They will be superseded by anything passed on the command line when submitting.

Next we give the job a name which will show up in the scheduler:

#SBATCH --job-name=alphafold_run

Then we say how long we want the job to run. The job will be killed when it reaches this limit. We start with the maximum time of 1 day, but once you are more comfortable with job runtimes you may want to lower this to a more realistic value. A realistic time limit keeps jobs that are doing the wrong thing from incurring additional costs, and it also lets your jobs get through the queue faster since they may fit into a backfill slot.

#SBATCH --time=1-00:00
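As a sketch of how a shorter limit might be tried without editing the script, the time can be overridden on the command line when submitting. The 4-hour value is an arbitrary example, and the job id in the final comment is the one from the sample sbatch output on this page.

```shell
# Arbitrary example limit in Slurm's days-hours:minutes form (here: 4 hours).
TIME_LIMIT="0-04:00"

if command -v sbatch > /dev/null 2>&1; then
    # Command line options take precedence over the #SBATCH lines in the script.
    sbatch --time="$TIME_LIMIT" alphafold3.sub
else
    echo "sbatch not found; run this on a cluster login node" >&2
fi

# After a job finishes, its actual runtime can be looked up with sacct, for example:
#   sacct -j 47261946 --format=JobID,JobName,Elapsed,State
```

Comparing the Elapsed value of a few finished jobs against the requested limit is one way to settle on a realistic time.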

The next line chooses the gpu partition, which is required to get a GPU:

#SBATCH --partition=gpu

The next lines are all about the resources your job will use. It will use one task, but that task can use 16 cores. We are requesting a single h100 GPU on the node and 32G of memory.

#SBATCH --nodes=1

#SBATCH --ntasks=1

#SBATCH --gres=gpu:h100:1           # You need to request one GPU to be able to run AlphaFold properly

#SBATCH --cpus-per-task=16     # adjust this if you are using parallel commands

#SBATCH --mem=32G              # adjust this according to the memory requirement per node you need


The next two lines have the scheduler keep you informed about when the job starts and ends. Make sure to put in your actual email address. You can also leave these out if you prefer not to be emailed.

#SBATCH --mail-user=$USER@caltech.edu

#SBATCH --mail-type=ALL


Next we get to the commands that will actually run on the compute node.

You need to set the directory that holds the models you obtained from Google. On our system this is done with the MODEL_DIR environment variable:

export MODEL_DIR=/home/$USER/alphafold3_models/

Next we will load the modules.  This is in case you forgot to load them before.

module load alphafold/3.0.0

Then we run the wrapper script for alphafold, which launches the alphafold container via singularity, passes it the options you set, and fills in defaults for any options you did not set. We use the time command at the beginning just to know how long the process took.

time alphafold --output_dir=/resnick/scratch/$USER/alphafold/out --json_path=/home/$USER/alphafold/test.json
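Putting the pieces together, the complete script would look roughly like the following. This is a sketch assembled from the lines walked through above, not a copy of the example file, which may differ. It is written as a here-document so it can be pasted into a terminal; the file name my_alphafold3.sub is an arbitrary choice.

```shell
# Write the submission script to a file; the quoted 'EOF' keeps $USER unexpanded
# so that it is evaluated when the job runs rather than when the file is created.
cat > my_alphafold3.sub <<'EOF'
#!/bin/bash
#SBATCH --job-name=alphafold_run
#SBATCH --time=1-00:00
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --gres=gpu:h100:1
#SBATCH --cpus-per-task=16
#SBATCH --mem=32G
# Replace the address on the next line with your actual email address.
#SBATCH --mail-user=$USER@caltech.edu
#SBATCH --mail-type=ALL

# Directory holding the model parameters obtained from Google.
export MODEL_DIR=/home/$USER/alphafold3_models/

module load alphafold/3.0.0

time alphafold --output_dir=/resnick/scratch/$USER/alphafold/out --json_path=/home/$USER/alphafold/test.json
EOF
```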

Options

You can see what other options are available beyond the basics by running the following:

alphafold --helpshort

Submitting your job

Submitting your job is quite simple.  Now that you have a submission script with whatever options you want, simply submit it with the sbatch command and it will give you your job id:

[naveed@login4 ~]$ sbatch alphafold3.sub
Submitted batch job 47261946

Once running, there will be a job output file in your working directory (in this case probably your home directory) called something like slurm-47261946.out, and the alphafold output files will be in the output directory set in your submission script.
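If you want to script around the submission, the job id can be pulled out of the sbatch message and used to work out the output file name. The two helper functions below are hypothetical and only illustrate the idea; the commented commands at the end are standard Slurm and shell commands that need to be run on the cluster.

```shell
# Hypothetical helper: pull the job id out of sbatch's "Submitted batch job N" line.
job_id_from_sbatch() {
    awk '/^Submitted batch job/ {print $4}'
}

# Hypothetical helper: default Slurm output file name for a given job id.
slurm_log_name() {
    echo "slurm-$1.out"
}

# Demonstration using the sample output line from this page.
job_id=$(echo "Submitted batch job 47261946" | job_id_from_sbatch)
echo "Job $job_id writes to $(slurm_log_name "$job_id")"

# On the cluster you could then check on the job:
#   squeue -u $USER                            # list your queued and running jobs
#   tail -f slurm-47261946.out                 # follow the job output as it is written
#   ls /resnick/scratch/$USER/alphafold/out    # look at the alphafold results
```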