ABySS

ABySS is a de novo, parallel, paired-end sequence assembler. It can run as an MPI job on the HPCC cluster. The latest version currently installed on the HPCC is 2.1.5, which can be loaded with

module load ABySS/2.1.5

You can optionally load other tools as needed, provided that they have been installed under the same toolchain environment as ABySS/2.1.5. For example,

module load BEDTools/2.27.1 SAMtools/1.9 BWA/0.7.17

is valid after you've loaded ABySS.
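
If you are unsure which toolchain a given module was built with, the module system can show its details and prerequisites. A minimal check, assuming the cluster uses Lmod (BEDTools/2.27.1 here is just an example module name):

module spider BEDTools/2.27.1

The output lists any modules that must be loaded first, which indicates whether it shares a toolchain with ABySS/2.1.5.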

A sample SLURM script is below.

#!/bin/bash

#SBATCH --job-name=abyss_test  
#SBATCH --nodes=4  
#SBATCH --ntasks-per-node=2  
#SBATCH --mem-per-cpu=5G  
#SBATCH --time=1:00:00  
#SBATCH --output=%x-%j.SLURMout

# print the list of nodes allocated to this job
echo "$SLURM_JOB_NODELIST"

module load ABySS/2.1.5

# suppress Open MPI's fork() warning and disable its CUDA support
export OMPI_MCA_mpi_warn_on_fork=0  
export OMPI_MCA_mpi_cuda_support=0

# run ABySS: k = k-mer size, name = output file prefix, np = MPI processes, j = threads
abyss-pe k=25 name=test in='/mnt/research/common-data/Bio/ABySS/test-data/reads1.fastq /mnt/research/common-data/Bio/ABySS/test-data/reads2.fastq' v=-v np=8 j=2

This script launches an MPI job requesting 8 processes, distributed across 4 nodes (--nodes=4) with two processes on each node (--ntasks-per-node=2). Accordingly, we specify np=8 on the abyss-pe command line. Regarding the parameter j, the ABySS manual states:

The paired-end assembly stage is multithreaded, but must run on a single machine. The number of threads to use may be specified with the parameter j. The default value for j is the value of np.

So, rather than letting j default to the value of np, we set j=2, the number of tasks (CPUs) requested per node (in this case one task corresponds to one CPU). To submit the job, run

sbatch --constraint="[intel16|intel18]" <your SLURM script>
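
If you prefer not to hard-code np and j, they can be derived inside the script from SLURM's environment variables. A minimal sketch, assuming the same #SBATCH settings as above (the values after :- are only fallbacks in case a variable is unset):

# total MPI processes = tasks requested across all nodes
NP=${SLURM_NTASKS:-8}
# threads for the single-node paired-end stage = tasks requested per node
J=${SLURM_NTASKS_PER_NODE:-2}

abyss-pe k=25 name=test in='/mnt/research/common-data/Bio/ABySS/test-data/reads1.fastq /mnt/research/common-data/Bio/ABySS/test-data/reads2.fastq' v=-v np=$NP j=$J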

While the job is running, you can inspect the SLURM output file (in this example, abyss_test-<job ID>.SLURMout), which contains a detailed running log, including lines such as the following:

Running on 8 processors
6: Running on host lac-391
0: Running on host lac-194
2: Running on host lac-225
4: Running on host lac-287
7: Running on host lac-391
3: Running on host lac-225
1: Running on host lac-194
5: Running on host lac-287
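
To follow the log in real time, standard commands work; for example (the output file name assumes the job name and ID from this run):

squeue -u $USER
tail -f abyss_test-<job ID>.SLURMout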