Mothur

Loading Mothur

Mothur 1.40.3 is installed on the HPCC in two builds, one with MPI support and one without. Purge your loaded modules, then load the build you need with one of the module load lines below:

module purge
module load icc/2017.1.132-GCC-6.3.0-2.27 impi/2017.1.132 Mothur/1.40.3-nonMPI-Python-2.7.13        # Mothur non-MPI version
module load icc/2017.1.132-GCC-6.3.0-2.27 impi/2017.1.132 Mothur/1.40.3-Python-2.7.13               # Mothur MPI version

As of Feb 7, 2019, the newest installed version is Mothur/1.41.3, which is available as an MPI build only.
Some Mothur users have reported that vsearch 1.8 is the only vsearch version compatible with Mothur. If your workflow uses vsearch, load it right after loading Mothur:

module load vsearch/1.8.0

Running Mothur

Take a look at this example Mothur batch file, batch.m:

/mnt/research/common-data/Examples/mothur/batch.m:

set.current(fasta=ex.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.subsample.fasta, count=ex.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.subsample.count_table,  processors=1)
dist.seqs(fasta=current, cutoff=0.2, processors=8)

Here processors=8 on line 2 tells dist.seqs to use 8 processors. To actually use all 8, launch Mothur with the command that matches the build you loaded:

MPI: mpirun -np 8 mothur batch.m
non-MPI: mothur batch.m

Differences between MPI and non-MPI runs

MPI jobs can span multiple nodes, but MPI adds communication overhead, which can increase memory usage and reduce performance. The benefit of the extra processors may also be cancelled out by time spent waiting on disk I/O. Use the MPI build only when you need many more processes than a single node can provide. In that case, set the number of processes in the SLURM script with --ntasks=8 (for the example above) and let SLURM decide how many nodes and tasks per node are needed. Request memory on a per-CPU basis with --mem-per-cpu rather than per node.
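A minimal sketch of an MPI job script for the example above follows. It writes the script with a heredoc so the whole file is visible in one place. The file name mothur_mpi.sb, the job name, and the --mem-per-cpu=2G and --time=04:00:00 values are assumptions to adjust for your data, not HPCC defaults.

```shell
# Write a SLURM job script for the MPI build of Mothur (file name is hypothetical).
cat > mothur_mpi.sb <<'EOF'
#!/bin/bash
# 8 MPI tasks, matching processors=8 in batch.m; SLURM decides node placement.
#SBATCH --ntasks=8
# MPI jobs request memory per CPU; 2G is an assumed value.
#SBATCH --mem-per-cpu=2G
# Assumed wall-time limit; adjust to your run.
#SBATCH --time=04:00:00
#SBATCH --job-name=mothur_mpi

module purge
module load icc/2017.1.132-GCC-6.3.0-2.27 impi/2017.1.132 Mothur/1.40.3-Python-2.7.13

# SLURM sets SLURM_NTASKS from --ntasks, so -np always matches the request.
mpirun -np "$SLURM_NTASKS" mothur batch.m
EOF
```

Submit it with sbatch mothur_mpi.sb from the directory that holds batch.m and its input files; by default SLURM starts the job in the directory you submitted from. Using $SLURM_NTASKS instead of a hard-coded 8 means you only change the process count in one #SBATCH line (and in processors= in batch.m).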

Non-MPI jobs run on a single node with multiple threads or processes. For the Mothur command above, you should set something like

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=8

in your job submission script. Because all 8 CPUs must come from a single node, the scheduler may take longer to find a node with enough free CPUs when most nodes in the cluster are heavily occupied, so the job can wait longer in the queue.