Warning
This is a Lab Notebook, which describes how to solve a specific problem at a specific time. Please keep this in mind as you read and use the content, and pay close attention to the date, version information, and other details.
Extra
The 2.3.1 version of the AlphaFold Singularity image is currently non-functional on HPCC. As of October 2023, the image can no longer find the current libcusolver shared library file. Please use the 2.3.2 version of the image from this point forward. Please see the new documentation.
Lab Notebook --- Instructions for AlphaFold version 2.3.1, Singularity (2023-07-18) (DEPRECATED, see warnings)
These instructions are a work in progress for running AlphaFold version 2.3.1 using the Singularity container found at:
/opt/software/alphafold/2.3.1/alphafold_2.3.1_latest.sif
As with other containers in the /opt/software/alphafold/ directory, AlphaFold 2.3.1 can be run via Singularity.
However, AlphaFold versions after 2.3.0 use a database that is formatted differently from previous versions. This database is located at:
/mnt/research/common-data/alphafold/database_230
database_230
├── bfd -> ../database/bfd/
├── mgnify
│ ├── mgy_clusters_2022_05.fa
│ └── mgy_clusters.fa -> mgy_clusters_2022_05.fa
├── params
│ ├── LICENSE
│ ├── params_model_1_multimer_v3.npz
│ ├── params_model_1.npz
│ ├── params_model_1_ptm.npz
│ ├── params_model_2_multimer_v3.npz
│ ├── params_model_2.npz
│ ├── params_model_2_ptm.npz
│ ├── params_model_3_multimer_v3.npz
│ ├── params_model_3.npz
│ ├── params_model_3_ptm.npz
│ ├── params_model_4_multimer_v3.npz
│ ├── params_model_4.npz
│ ├── params_model_4_ptm.npz
│ ├── params_model_5_multimer_v3.npz
│ ├── params_model_5.npz
│ └── params_model_5_ptm.npz
├── pdb70
│ ├── md5sum
│ ├── pdb70_a3m.ffdata
│ ├── pdb70_a3m.ffindex
│ ├── pdb70_clu.tsv
│ ├── pdb70_cs219.ffdata
│ ├── pdb70_cs219.ffindex
│ ├── pdb70_hhm.ffdata
│ ├── pdb70_hhm.ffindex
│ └── pdb_filter.dat
├── pdb_mmcif
│ ├── mmcif_files
│ │ ├── 100d.cif
│ │ ├── 101d.cif
│ │ ├── 101m.cif
│ │ ├── 102d.cif
│ │ ├── 102l.cif
│ │ └── ...
│ └── obsolete.dat
├── pdb_seqres
│ └── pdb_seqres.txt
├── small_bfd -> ../database/small_bfd/
├── uniprot
│ └── uniprot.fasta
├── uniref30
│ ├── UniRef30_2021_03_a3m.ffdata
│ ├── UniRef30_2021_03_a3m.ffindex
│ ├── UniRef30_2021_03_cs219.ffdata
│ ├── UniRef30_2021_03_cs219.ffindex
│ ├── UniRef30_2021_03_hhm.ffdata
│ ├── UniRef30_2021_03_hhm.ffindex
│ ├── UniRef30_2021_03.md5sums
│ └── UniRef30_2021_03.tar.1.gz
└── uniref90
  └── uniref90.fasta
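Because bfd and small_bfd are symbolic links into the older database directory, a quick check that every top-level entry resolves can save a failed job. The following is a minimal sketch, not an HPCC-provided tool: check_db_layout is a hypothetical helper name, and the list of entries is simply copied from the tree above.

```shell
# Sketch: verify that the top-level entries shown in the tree above exist.
# check_db_layout is a hypothetical helper, not part of AlphaFold or HPCC.
# Usage: check_db_layout <database root>
check_db_layout() {
    root="$1"
    missing=0
    for sub in bfd mgnify params pdb70 pdb_mmcif pdb_seqres \
               small_bfd uniprot uniref30 uniref90; do
        # -e follows symbolic links, so a dangling bfd or small_bfd link
        # is reported as missing
        if [ ! -e "$root/$sub" ]; then
            echo "missing: $root/$sub"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, check_db_layout /mnt/research/common-data/alphafold/database_230 prints one line per missing entry and returns a non-zero status if any entry is missing.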
Before running AlphaFold 2.3.1, you need to set the following environment variables so that they point to the new database:
export ALPHAFOLD_DATA_PATH="/mnt/research/common-data/alphafold/database_230"
export ALPHAFOLD_MODELS="/mnt/research/common-data/alphafold/database_230/params"
We also recommend disabling local Python libraries with the following environment variable, to avoid conflicts with the Python installation inside the Singularity container:
export PYTHONNOUSERSITE=1
Both the database and Python library variables will be set automatically if you load the module "alphafold/2.3.1".
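Whether the variables are set by hand or by the module, it can help to confirm that they are actually exported before starting a long job. The following is a minimal sketch; require_env is a hypothetical helper name, and the variable names passed to it are the ones introduced above.

```shell
# Sketch: stop early if a required variable is not exported.
# require_env is a hypothetical helper, not part of the alphafold module.
# Usage: require_env VAR1 VAR2 ...
require_env() {
    for name in "$@"; do
        # printenv only sees exported variables, which are the ones
        # inherited by child processes such as singularity
        if [ -z "$(printenv "$name")" ]; then
            echo "not exported or empty: $name" >&2
            return 1
        fi
    done
    return 0
}
```

A typical use would be to run module load alphafold/2.3.1 followed by require_env ALPHAFOLD_DATA_PATH ALPHAFOLD_MODELS PYTHONNOUSERSITE near the top of the job script.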
To run AlphaFold 2.3.1 via SLURM, please use the following template (for more information about options and flags, please refer to the README on GitHub).
In the script, input.fasta is your input data, and you need to set up output_dir. Since the command /usr/bin/hhsearch inside the container does not work on intel14 nodes (Illegal instruction), please use the SBATCH option --constraint in the job script.
#!/bin/bash
#SBATCH --job-name alphafold-run
#SBATCH --time=08:00:00
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=8
#SBATCH --mem=20G
#SBATCH --constraint="[intel16|intel18|amd20]"
echo "Export AlphaFold variables"
export ALPHAFOLD_DATA_PATH="/mnt/research/common-data/alphafold/database_230"
export ALPHAFOLD_MODELS="/mnt/research/common-data/alphafold/database_230/params"
# The older database directory still holds the bfd files used by --bfd_database_path.
# Assumption: this is the value expected for ALPHAFOLD_DATA_OLD_PATH, which the bind options below rely on.
export ALPHAFOLD_DATA_OLD_PATH="/mnt/research/common-data/alphafold/database"
export PYTHONNOUSERSITE=1
export CUDA_VISIBLE_DEVICES=0
export NVIDIA_VISIBLE_DEVICES="${CUDA_VISIBLE_DEVICES}"
echo "Start Singularity run"
# Update --max_template_date if need be. The note sits here because a comment
# placed after a trailing backslash would break the line continuation.
singularity run --nv \
    -B "$ALPHAFOLD_DATA_OLD_PATH" \
    -B "$ALPHAFOLD_DATA_PATH":/data \
    -B "$ALPHAFOLD_MODELS" \
    -B .:/etc \
    --pwd /app/alphafold /opt/software/alphafold/2.3.1/alphafold_2.3.1_latest.sif \
    --data_dir=/data \
    --output_dir=output_dir \
    --fasta_paths=input.fasta \
    --uniref90_database_path=/data/uniref90/uniref90.fasta \
    --uniref30_database_path=/data/uniref30/UniRef30_2021_03 \
    --mgnify_database_path=/data/mgnify/mgy_clusters.fa \
    --bfd_database_path=/mnt/research/common-data/alphafold/database/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
    --template_mmcif_dir=/data/pdb_mmcif/mmcif_files \
    --obsolete_pdbs_path=/data/pdb_mmcif/obsolete.dat \
    --pdb_seqres_database_path=/data/pdb_seqres/pdb_seqres.txt \
    --uniprot_database_path=/data/uniprot/uniprot.fasta \
    --max_template_date=2020-05-14 \
    --model_preset=monomer \
    --use_gpu_relax=true
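The template assumes that input.fasta exists and that output_dir has been set up. A small pre-flight check can catch a missing or malformed input before the job waits in the queue. The following is a minimal sketch; prepare_run is a hypothetical helper name, and the check only looks at the first line of the file, so it is not a full FASTA validator.

```shell
# Sketch: check the input FASTA and create the output directory.
# prepare_run is a hypothetical helper, not part of AlphaFold.
# Usage: prepare_run <fasta file> <output directory>
prepare_run() {
    fasta="$1"
    outdir="$2"
    # -s is true only for a file that exists and is not empty
    if [ ! -s "$fasta" ]; then
        echo "input FASTA missing or empty: $fasta" >&2
        return 1
    fi
    # A FASTA record starts with a header line beginning with '>'
    if ! head -n 1 "$fasta" | grep -q '^>'; then
        echo "first line of $fasta is not a FASTA header" >&2
        return 1
    fi
    mkdir -p "$outdir"
}
```

Calling prepare_run input.fasta output_dir before the singularity command creates output_dir only when the input passes both checks.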