
Warning

This is a Lab Notebook entry that describes how to solve a specific problem at a specific time. Please keep this in mind as you read and use the content, and pay close attention to the date, version information, and other details.

Lab Notebook --- Instructions for AlphaFold version 2.3.2, Singularity (2023-11-10) (WORK IN PROGRESS)

Currently, due to certain system limitations, AlphaFold version 2.3.0 and later cannot be installed on HPCC or run as Docker images.

Therefore, we are working to make Singularity images created by a third-party user available on HPCC (see https://github.com/prehensilecode/alphafold_singularity). As of this writing, the most current image is 2.3.2-1.

These instructions are a work in progress for running AlphaFold version 2.3.2 using the singularity container found at:

/opt/software/alphafold/2.3.2/alphafold_2.3.2-1.sif

As with other containers in the /opt/software/alphafold/ directory, AlphaFold 2.3.2 can be run via Singularity.

However, AlphaFold versions after 2.3.0 use a database that is formatted differently from previous versions. This database is located in /mnt/research/common-data/alphafold/database_230.

database_230
├── bfd -> ../database/bfd/
├── mgnify
│   ├── mgy_clusters_2022_05.fa
│   └── mgy_clusters.fa -> mgy_clusters_2022_05.fa
├── params
│   ├── LICENSE
│   ├── params_model_1_multimer_v3.npz
│   ├── params_model_1.npz
│   ├── params_model_1_ptm.npz
│   ├── params_model_2_multimer_v3.npz
│   ├── params_model_2.npz
│   ├── params_model_2_ptm.npz
│   ├── params_model_3_multimer_v3.npz
│   ├── params_model_3.npz
│   ├── params_model_3_ptm.npz
│   ├── params_model_4_multimer_v3.npz
│   ├── params_model_4.npz
│   ├── params_model_4_ptm.npz
│   ├── params_model_5_multimer_v3.npz
│   ├── params_model_5.npz
│   └── params_model_5_ptm.npz
├── pdb70
│   ├── md5sum
│   ├── pdb70_a3m.ffdata
│   ├── pdb70_a3m.ffindex
│   ├── pdb70_clu.tsv
│   ├── pdb70_cs219.ffdata
│   ├── pdb70_cs219.ffindex
│   ├── pdb70_hhm.ffdata
│   ├── pdb70_hhm.ffindex
│   └── pdb_filter.dat
├── pdb_mmcif
│   ├── mmcif_files
│   │   ├── 100d.cif
│   │   ├── 101d.cif
│   │   ├── 101m.cif
│   │   ├── 102d.cif
│   │   ├── 102l.cif
│   │   └── ...
│   └── obsolete.dat
├── pdb_seqres
│   └── pdb_seqres.txt
├── small_bfd -> ../database/small_bfd/
├── uniprot
│   └── uniprot.fasta
├── uniref30
│   ├── UniRef30_2021_03_a3m.ffdata
│   ├── UniRef30_2021_03_a3m.ffindex
│   ├── UniRef30_2021_03_cs219.ffdata
│   ├── UniRef30_2021_03_cs219.ffindex
│   ├── UniRef30_2021_03_hhm.ffdata
│   ├── UniRef30_2021_03_hhm.ffindex
│   ├── UniRef30_2021_03.md5sums
│   └── UniRef30_2021_03.tar.1.gz
└── uniref90
    └── uniref90.fasta
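
Before starting a long job, it can help to confirm that the database directory has the layout shown in the listing above. The following is a minimal sketch, not part of the AlphaFold tooling: the function name check_db_layout is hypothetical, and the list of sub-directories is simply copied from the top-level entries of the listing.

```shell
# check_db_layout DIR: report which expected entries of DIR are missing.
# Hypothetical helper; the entry names mirror the listing above.
check_db_layout() (
    db_dir="$1"
    missing=0
    for sub in bfd mgnify params pdb70 pdb_mmcif pdb_seqres \
               small_bfd uniprot uniref30 uniref90; do
        # -e follows symlinks, so a dangling bfd or small_bfd link counts as missing
        if [ ! -e "$db_dir/$sub" ]; then
            echo "missing: $sub"
            missing=$((missing + 1))
        fi
    done
    echo "missing entries: $missing"
    # Succeed only when every expected entry was found
    [ "$missing" -eq 0 ]
)
```

For example, `check_db_layout /mnt/research/common-data/alphafold/database_230` should report zero missing entries if the directory matches the listing.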

Running AlphaFold 2.3.2 using run_singularity.py

To run AlphaFold 2.3.2, first load the following modules:

module load GCC/6.4.0-2.28  OpenMPI/2.1.2 Python/3.6.4 
module load alphafold/2.3.2

The alphafold/2.3.2 module will set the ALPHAFOLD_DIR, ALPHAFOLD_DATADIR, and ALPHAFOLD_MODELS environment variables for you.
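
If you want to confirm that the module actually set these variables before launching a run, a small guard like the one below can be used. This is a sketch under assumptions: require_vars is a hypothetical helper name, not something provided by the module.

```shell
# require_vars NAME...: fail if any named variable is unset or empty.
# Hypothetical helper, not part of the alphafold module.
require_vars() (
    status=0
    for name in "$@"; do
        # Indirect lookup: read the variable whose name is stored in $name
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "ERROR: $name is not set" >&2
            status=1
        fi
    done
    [ "$status" -eq 0 ]
)
```

After the two module load commands, calling `require_vars ALPHAFOLD_DIR ALPHAFOLD_DATADIR ALPHAFOLD_MODELS || exit 1` stops early with a clear message if the module did not load as expected.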

We recommend you use the Python script "run_singularity.py" (which is also in /opt/software/alphafold/2.3.2/) to work with the Singularity image. This script helps automate many of the more challenging parts of using the image, such as correctly binding paths to your data directories and enabling GPU support. Below is an example of how to run this script:

# Set the output directory as an environment variable, as this is what the script expects
export output_dir=<some_output_folder>

# Flags used below (comments cannot follow a trailing backslash, so they are listed here):
#   --use_gpu            use the GPU, which makes the neural network calculations faster
#   --output_dir         where to put the results
#   --data_dir           where the AlphaFold data, such as PDB sequences, live
#   --fasta_paths        the input FASTA sequence
#   --max_template_date  the latest date considered when looking for PDB templates
#   --model_preset       "monomer" for a monomeric protein; change to "multimer" for a multimer
#   --db_preset          "reduced_dbs" uses the reduced database
python3 ${ALPHAFOLD_DIR}/run_singularity.py \
    --use_gpu \
    --output_dir=$output_dir \
    --data_dir=${ALPHAFOLD_DATADIR} \
    --fasta_paths=input.fasta \
    --max_template_date=2020-05-14 \
    --model_preset=monomer \
    --db_preset=reduced_dbs

Note that any flag normally passed to AlphaFold should also work when passed through this script.

Additionally, the --output_dir argument must be passed EXACTLY as above, using an environment variable, because the script expects one (i.e., it uses os.environ to fill in the output directory).
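
The reason the `export` matters is that a plain shell variable is not passed on to child processes such as python3, while an exported one is. The sketch below illustrates that difference with the standard printenv utility; child_sees is a hypothetical helper name and /tmp/af_demo is only a placeholder path.

```shell
# child_sees NAME: print the value a child process sees for NAME, or "unset".
# Hypothetical helper; printenv runs as a separate process, so it only
# sees variables that are part of the exported environment.
child_sees() {
    printenv "$1" || echo "unset"
}

unset output_dir
output_dir=/tmp/af_demo      # plain shell variable: not passed to child processes
child_sees output_dir        # prints "unset"
export output_dir            # now part of the environment
child_sees output_dir        # prints "/tmp/af_demo"
```

A script that reads its output directory from the environment would behave like the first call if the variable were assigned without `export`.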

If you would like to submit an AlphaFold job to SLURM, we have included an example script below. Note that you will need to adjust the resource requests (mainly time) depending on the complexity of your protein.

#!/bin/bash
#SBATCH --job-name 2023alphafold
#SBATCH --time=04:00:00
#SBATCH --gres=gpu:1
#SBATCH -C [nvf|nal|nif] ## We want the good GPUs
#SBATCH --cpus-per-task=8
#SBATCH --mem=12G
#SBATCH -o 2023.log

module load GCC/6.4.0-2.28  OpenMPI/2.1.2 Python/3.6.4
module load alphafold/2.3.2

echo "Export AlphaFold variables"
# These variables are now set by the module
echo INFO: ALPHAFOLD_DIR=$ALPHAFOLD_DIR
echo INFO: ALPHAFOLD_DATADIR=$ALPHAFOLD_DATADIR

export output_dir=$SLURM_SUBMIT_DIR/2023 # change this to whatever path you like

cd $SLURM_SUBMIT_DIR
mkdir -p $output_dir
timestamp=$(date)
echo "Starting AlphaFold at $timestamp"

python3 ${ALPHAFOLD_DIR}/run_singularity.py \
    --use_gpu \
    --output_dir=$output_dir \
    --data_dir=${ALPHAFOLD_DATADIR} \
    --fasta_paths=8IBQ.fasta \
    --max_template_date=2023-08-01 \
    --model_preset=monomer \
    --db_preset=reduced_dbs

echo INFO: AlphaFold returned $?

timestamp=$(date)
echo "Finishing AlphaFold at $timestamp"
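
One possible refinement, offered as a sketch rather than as part of the original script: because the script above ends with echo commands, the batch script itself exits with status 0 even when AlphaFold fails. A small wrapper can log the exit status and still hand it back to the caller. The name run_and_report is hypothetical.

```shell
# run_and_report CMD...: run a command, log its exit status, and return that status.
# Hypothetical helper for use inside a batch script.
run_and_report() {
    rc=0
    "$@" || rc=$?            # capture a failure without aborting the script
    echo "INFO: command returned $rc"
    return "$rc"
}
```

In the SLURM script, the AlphaFold call could be written as `run_and_report python3 ${ALPHAFOLD_DIR}/run_singularity.py ...` followed by `af_status=$?`, with `exit $af_status` as the last line so that a failed prediction is also reported as a failed job.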

Running the singularity image manually

If for whatever reason you need to manually run AlphaFold from the singularity image, we recommend you still run "run_singularity.py" first as this script will print the "singularity run ..." command it generates. It will be much easier to work from this command, which should have all of the bind paths properly set for the image, than to try to write your own command from scratch.
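
As a rough sketch of how to recover that command from a saved log, the helper below prints every line of a file that contains the text "singularity run". The function name is hypothetical, and the exact format in which run_singularity.py prints its command is an assumption here; if the command spans several lines, you will need to read the surrounding log lines as well.

```shell
# extract_singularity_cmd LOGFILE: print lines of LOGFILE containing "singularity run".
# Hypothetical helper; assumes the generated command appears on a line with that text.
extract_singularity_cmd() {
    grep -F "singularity run" "$1"
}
```

For a job submitted with the example SLURM script, `extract_singularity_cmd 2023.log` would search the log file named by its `#SBATCH -o` line.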

Additional Resources and Acknowledgement

I would like to thank Dr. Josh Vermaas for helping me troubleshoot this new image and providing the example SLURM script.

For additional details about the Singularity image and the run_singularity.py script, please see the GitHub repository of the original author:

https://github.com/prehensilecode/alphafold_singularity