Warning
This is a Lab Notebook entry that describes how to solve a specific problem at a specific time. Please keep this in mind as you read and use the content, and pay close attention to the date, version information, and other details.
Downloading and Installing cryoSPARC (2023-11-22)
Prepare for Installation
- Register and obtain a license. To obtain a License ID for cryoSPARC, go to https://cryosparc.com/download, fill out the form, and submit it. On approval, you will receive an email with a license ID. (Store the license ID somewhere safe in your home space; see the sketch after this list.)
- Log into a development node with a GPU on HPCC. (NOTE: Use only dev-amd20-v100 due to the GPU driver version requirement.)
- Determine where you'd like to install cryoSPARC and create the installation directory. Users should install this software in their $HOME or $RESEARCH space. In this document we use ~/CryoSPARC in the home directory as the installation directory.
mkdir ~/CryoSPARC   # create the installation directory
cd ~/CryoSPARC      # go to the installation directory
Note
The installation directory can be any directory under the user's home or research space where the user has full access permission. This will be the root directory where all cryoSPARC code and dependencies will be installed.
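A minimal sketch for keeping the license ID handy for later steps; the file name ~/.cryosparc_license is just an example, not part of cryoSPARC:

# Save the license ID to a private file in your home space (hypothetical file name)
echo 'export LICENSE_ID="<license_id>"' > ~/.cryosparc_license
chmod 600 ~/.cryosparc_license
# in later sessions, load it with:
source ~/.cryosparc_license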
Download software
- Set the environment variable: run

export LICENSE_ID="<license_id>"

where <license_id> is the license ID you received from the registration.

- Download the packages to the installation directory:
cd ~/CryoSPARC
curl -L https://get.cryosparc.com/download/master-latest/$LICENSE_ID -o cryosparc_master.tar.gz
curl -L https://get.cryosparc.com/download/worker-latest/$LICENSE_ID -o cryosparc_worker.tar.gz
- Extract the downloaded files:
tar -xf cryosparc_master.tar.gz cryosparc_master
tar -xf cryosparc_worker.tar.gz cryosparc_worker
Note
After extracting the worker package, you may see a second folder called cryosparc2_worker (note the 2) containing a single version file. It exists for backward compatibility when upgrading from older versions of cryoSPARC and is not applicable to new installations. You may safely delete the cryosparc2_worker folder.
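A quick sketch for verifying the extraction and removing the legacy folder, assuming the example installation directory ~/CryoSPARC used above:

cd ~/CryoSPARC
ls                         # expect cryosparc_master and cryosparc_worker
rm -rf cryosparc2_worker   # remove the legacy folder if it appeared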
Installation of Master
- Load environment
module purge            # unload previously loaded modules
module load foss/2022b  # load the compiler toolchain and dependencies
- Master node installation

General syntax:

cd <dir_master_package>   # go to the master package directory
./install.sh --license $LICENSE_ID \
             --hostname <master_hostname> \
             --dbpath <db_path> \
             --port <port_number> \
             [--insecure] \
             [--allowroot] \
             [--yes]

Example:

cd ~/CryoSPARC/cryosparc_master   # go to the master package directory
./install.sh --license $LICENSE_ID \
             --hostname localhost \
             --dbpath ~/CryoSPARC/cryoSPARC_database \
             --port 45000
- Start cryoSPARC: run the following (a quick status-check sketch follows this list).

export CRYOSPARC_FORCE_HOSTNAME=true
export CRYOSPARC_MASTER_HOSTNAME=$HOSTNAME
./bin/cryosparcm start
- Create your cryoSPARC user account: run

./bin/cryosparcm createuser --email "<user email>" \
                            --password "<user password>" \
                            --username "<login username>" \
                            --firstname "<given name>" \
                            --lastname "<surname>"
Note
For details on the options used in the master node installation steps above, see https://guide.cryosparc.com/setup-configuration-and-management/how-to-download-install-and-configure/downloading-and-installing-cryosparc#glossary-reference-1.
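Optionally, before moving on, you can confirm that the master started cleanly. A minimal sketch, assuming the example installation directory used above (cryosparcm status is a standard cryoSPARC command):

cd ~/CryoSPARC/cryosparc_master
./bin/cryosparcm status   # lists the master processes and the configured base port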
After completing the above, you are ready to access the user interface.
- Access the user interface: point your browser to

http://<master_hostname>:<port_number>
If you are physically using the same machine as the master node to interact with the cryoSPARC interface, you can connect to it as:
http://localhost:<port_number>
See https://guide.cryosparc.com/setup-configuration-and-management/how-to-download-install-and-configure/accessing-cryosparc for more information.
Installation of Worker
- Load the environment if you have not already done so:

module purge            # unload previously loaded modules
module load foss/2022b  # load the compiler toolchain and dependencies
module load CUDA/12.3.0 # CUDA is needed for the worker
- Worker node installation

General syntax:

cd <install_path>/cryosparc_worker
./install.sh --license $LICENSE_ID \
             [--yes]

Example (run from ~/CryoSPARC/cryosparc_worker):

./install.sh --license $LICENSE_ID
Note
For the meaning of the options of the worker node installation script, see https://guide.cryosparc.com/setup-configuration-and-management/how-to-download-install-and-configure/downloading-and-installing-cryosparc#worker-installation-glossary-reference.
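Optionally, you can check that the worker package can see the GPUs on the dev node. A sketch, assuming the example installation directory above and using the worker package's cryosparcw gpulist command:

cd ~/CryoSPARC/cryosparc_worker
./bin/cryosparcw gpulist   # should list the GPU devices visible to the worker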
Note
Once the master and worker are successfully installed on the dev node, stop the current cryoSPARC session with:
~/CryoSPARC/cryosparc_master/bin/cryosparcm stop
Start an interactive session of cryoSPARC using OnDemand
- Request an interactive desktop with the number of GPUs equal to the number of workers, and request sufficient resources (CPUs, memory, wall time, etc.). Open a terminal on the desktop.
- Launch the cryoSPARC master: for the user's convenience, create a file named "cryosparc.sh" containing the commands for setting up the environment (shown below). Before launching cryoSPARC, run "source cryosparc.sh" first; a short usage sketch follows the script.
#!/bin/bash
# set up the cryoSPARC environment
#
# Load modules
module purge
module load foss/2022b

# set PATH (assumes the ~/CryoSPARC installation directory used above)
export PATH=~/CryoSPARC/cryosparc_master/bin:~/CryoSPARC/cryosparc_worker/bin:$PATH
export CRYOSPARC_FORCE_HOSTNAME=true
export CRYOSPARC_MASTER_HOSTNAME=$HOSTNAME
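Usage sketch, run from the directory containing cryosparc.sh:

source cryosparc.sh   # load modules and set the cryoSPARC environment
cryosparcm start      # start the master processes
cryosparcm status     # confirm the master is running and note the base port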
Connect a Cluster to CryoSPARC
Once the cryosparc_worker package is installed, the cluster must be registered with the master process. This requires a template for the job submission commands and scripts that the master process will use to submit jobs to the cluster scheduler. To register the cluster, provide cryoSPARC with the following two files and call the cryosparcm cluster connect command:
- cluster_info.json
- cluster_script.sh
The first file (cluster_info.json) contains template strings used to construct cluster commands (e.g., qsub, qstat, qdel etc., or their equivalents for your system). The second file (cluster_script.sh) contains a template string to construct appropriate cluster submission scripts for your system. The jinja2 template engine is used to generate cluster submission/monitoring commands as well as submission scripts for each job.
- Create the files. The following fields must be defined as template strings in the cluster configuration. Examples for SLURM are given; use whatever commands your particular cluster scheduler requires. Parameters listed as "optional" can be omitted or included with their value set to null.
cluster_info.json:
name : "cluster1"
# string, required
# Unique name for the cluster to be connected (multiple clusters can be
# connected)
worker_bin_path : "/path/to/cryosparc_worker/bin/cryosparcw"
# string, required
# Path on cluster nodes to the cryosparcw script
cache_path : "/path/to/local/SSD/on/cluster/node"
# string, optional
# Path on cluster nodes that is a writable location on local SSD on each
# cluster node. This might be /scratch or similar. This path MUST be the
# same on all cluster nodes. Note that the installer does not check that
# this path exists, so make sure it does and is writable. If you plan to
# use the cluster nodes without SSD, you can omit this field.
cache_reserve_mb : 10000
# integer, optional
# The size (in MB) to initially reserve for the cache on the SSD. This
# value is 10GB by default, which means cryoSPARC will always leave at
# least 10GB of space on the SSD free.
cache_quota_mb : 1000000
# integer, optional
# The maximum size (in MB) to use for the cache on the SSD.
send_cmd_tpl : "{{ command }}"
# string, required
# Used to send a command to be executed by a cluster node (in case the
# cryosparc master is not able to directly use cluster commands). If your
# cryosparc master node is able to directly use cluster commands
# (like qsub etc) then this string can be just "{{ command }}"
qsub_cmd_tpl : "sbatch {{ script_path_abs }}"
# string, required
# The command used to submit a job to the cluster, where the job
# is defined in the cluster script located at {{ script_path_abs }}. This
# string can also use any of the variables defined in cluster_script.sh
# that are available when the job is scheduled (e.g., num_gpu, num_cpu, etc.)
qstat_cmd_tpl : "squeue -j {{ cluster_job_id }}"
# string, required
# Cluster command that will report back the status of cluster job with its id
# {{ cluster_job_id }}.
qdel_cmd_tpl : "scancel {{ cluster_job_id }}"
# string, required
# Cluster command that will kill and remove {{ cluster_job_id }} from the
# queue.
qinfo_cmd_tpl : "sinfo --format='%.8N %.6D %.10P %.6T %.14C %.5c %.6z %.7m %.7G %.9d %20E'"
# string, required
# General cluster information command
transfer_cmd_tpl : "scp {{ src_path }} loginnode:{{ dest_path }}"
# string, optional
# Command that can be used to transfer a file {{ src_path }} on the cryosparc
# master node to {{ dest_path }} on the cluster nodes. This is used when the
# master node is remotely updating a cluster worker installation. This is
# optional; if it is incorrect or omitted, you can manually update the
# cluster worker installation.
cluster_script.sh: the following template variables are available when constructing the submission script.
{{ script_path_abs }} # absolute path to the generated submission script
{{ run_cmd }} # complete command-line string to run the job
{{ num_cpu }} # number of CPUs needed
{{ num_gpu }} # number of GPUs needed.
{{ ram_gb }} # amount of RAM needed in GB
{{ job_dir_abs }} # absolute path to the job directory
{{ project_dir_abs }} # absolute path to the project dir
{{ job_log_path_abs }} # absolute path to the log file for the job
{{ worker_bin_path }} # absolute path to the cryosparc worker command
{{ run_args }} # arguments to be passed to cryosparcw run
{{ project_uid }} # uid of the project
{{ job_uid }} # uid of the job
{{ job_creator }} # name of the user that created the job (may contain spaces)
{{ cryosparc_username }} # cryosparc username of the user that created the job (usually an email)
Note
The cryoSPARC scheduler does not assume control over GPU allocation when spawning jobs on a cluster. The number of GPUs required is provided as a template variable. Either your submission script or your cluster scheduler is responsible for assigning GPU device indices to each job spawned based on the provided variable. The cryoSPARC worker processes that use one or more GPUs on a cluster simply use device 0, then 1, then 2, etc. Therefore, the simplest way to correctly allocate GPUs is to set the CUDA_VISIBLE_DEVICES environment variable in your cluster scheduler or submission script. Then device 0 is always the first GPU that a running job must use.
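As an illustration only: if your scheduler does not set CUDA_VISIBLE_DEVICES itself (SLURM with --gres=gpu normally does), a line like the following could be added to cluster_script.sh. The use of seq here is an assumption for this sketch, not part of cryoSPARC:

# hypothetical snippet: expose only the first {{ num_gpu }} devices to the job
export CUDA_VISIBLE_DEVICES=$(seq -s, 0 $(( {{ num_gpu }} - 1 )))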
- Load the script files and register the integration. To create or set a cluster configuration in cryoSPARC, use the following commands.
cryosparcm cluster example <cluster_type>
# dumps out config and script template files to current working directory
# examples are available for pbs and slurm schedulers but others should
# be very similar
cryosparcm cluster dump <name>
# dumps out existing config and script to current working directory
cryosparcm cluster connect
# connects new or updates existing cluster configuration,
# reading cluster_info.json and cluster_script.sh from the current directory,
# using the name from cluster_info.json
cryosparcm cluster remove <name>
# removes a cluster configuration from the scheduler
Note
The command cryosparcm cluster connect attempts to read cluster_info.json and cluster_script.sh from the current working directory.
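A typical workflow on a SLURM system, sketched from the commands above:

cd ~/CryoSPARC                     # any working directory where you have write access
cryosparcm cluster example slurm   # write template cluster_info.json and cluster_script.sh here
# edit cluster_info.json and cluster_script.sh for your site (see the HPCC examples below)
cryosparcm cluster connect         # register the cluster using the files in this directory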
Examples of cluster_info.json and cluster_script.sh scripts for SLURM on HPCC:
cluster_info.json
{
"qdel_cmd_tpl": "scancel {{ cluster_job_id }}",
"worker_bin_path": "/mnt/home/wangx147/cryoSPARC/cryosparc_worker/bin/cryosparcw",
"title": "test_cluster",
"cache_path": "/tmp",
"qinfo_cmd_tpl": "sinfo --format='%.8N %.6D %.10P %.6T %.14C %.5c %.6z %.7m %.7G %.9d %20E'",
"qsub_cmd_tpl": "sbatch {{ script_path_abs }}",
"qstat_cmd_tpl": "squeue -j {{ cluster_job_id }}",
"cache_quota_mb": null,
"send_cmd_tpl": "{{ command }}",
"cache_reserve_mb": 10000,
"name": "test_cluster"
}
cluster_script.sh
#!/bin/bash
#SBATCH --job-name=cryosparc_{{ project_uid }}_{{ job_uid }}
#SBATCH --partition=general
#SBATCH --output={{ job_log_path_abs }}
#SBATCH --error={{ job_log_path_abs }}
#SBATCH --nodes=1
#SBATCH --mem={{ (ram_gb*1000)|int }}M
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task={{ num_cpu }}
#SBATCH --gres=gpu:{{ num_gpu }}
#SBATCH --gres-flags=enforce-binding
srun {{ run_cmd }}
Q: Where should these two files be stored?
A: In the current working directory from which you run cryosparcm cluster connect.
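Once connected, the stored configuration can be checked with the dump command listed above. A small sketch, assuming the example cluster name test_cluster from the cluster_info.json above:

cryosparcm cluster dump test_cluster   # writes the stored config and script back to the current directory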