
orthomcl-pipeline

OrthoMCL Pipeline (https://github.com/apetkau/orthomcl-pipeline) is a wrapper that automates running OrthoMCL. If you prefer to run OrthoMCL from scratch, skip this tutorial.

Installation guide

You can install OrthoMCL Pipeline in your home directory (or research space) by following the OrthoMCL Pipeline installation instructions. iCER staff have already installed all the Perl dependencies, so you only need to run a few commands to complete the installation. This guide assumes that you have already prepared your MySQL configuration file (see https://wiki.hpcc.msu.edu/x/aYe1).

Sample installation

I am going to install the pipeline in ~/Software/, a subdirectory of my home directory.

Installing OrthoMCL Pipeline

ssh dev-intel18

# Load necessary modules
module purge
module load icc/2016.3.210-GCC-5.4.0-2.26 impi/5.1.3.181
module load OrthoMCL/2.0.9-custom-Perl-5.24.0
module load BLAST/2.2.26-Linux_x86_64
module load GCCcore/5.4.0 libxml2/2.9.4


# Download source and configure
mkdir -p ~/Software # create the target directory if it does not exist yet
cd ~/Software
git clone https://github.com/apetkau/orthomcl-pipeline.git
cd orthomcl-pipeline
perl scripts/orthomcl-pipeline-setup.pl # set paths to dependencies
cat etc/orthomcl-pipeline.conf # parameters in this file can be adjusted; see the installation instructions linked above
    # ---
    # blast:
    #   F: 'm S'
    #   b: '100000'
    #   e: '1e-5'
    #   v: '100000'
    # filter:
    #   max_percent_stop: '20'
    #   min_length: '10'
    # mcl:
    #   inflation: '1.5'
    # path:
    #   blastall: /opt/software/BLAST/2.2.26-Linux_x86_64/bin/blastall
    #   formatdb: /opt/software/BLAST/2.2.26-Linux_x86_64/bin/formatdb
    #   mcl: /opt/software/MCL/14.137-intel-2016b/bin/mcl
    #   orthomcl: /opt/software/OrthoMCL/orthomclsoftware-custom/bin
    # scheduler: fork
    # split: '4'


# Testing
export PATH=~/Software/orthomcl-pipeline/bin:~/Software/orthomcl-pipeline/scripts:$PATH
perl t/test_pipeline.pl -m ~/Practice/general_test/orthomcl/my_orthomcl_dir/orthomcl.config -s fork -t ~/tmp # replace the path to orthomcl.config with your own
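
Before running the test, it can help to confirm that the OrthoMCL configuration file passed with -m is readable and contains the database settings. The following is only a minimal sketch and is not part of the pipeline: the function name check_orthomcl_config is hypothetical, and the list of keys it looks for (dbVendor, dbConnectString, dbLogin, dbPassword) is an assumption based on the standard OrthoMCL configuration template, so adjust it to match your own file.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of orthomcl-pipeline): checks that an
# orthomcl.config file is readable and defines the database settings.
# Assumption: the file uses the key=value layout of the OrthoMCL template.
check_orthomcl_config() {
    local config="$1"
    local key
    local status=0

    if [ ! -r "$config" ]; then
        echo "Cannot read config file: $config" >&2
        return 1
    fi

    for key in dbVendor dbConnectString dbLogin dbPassword; do
        # Accept optional whitespace before the key and before '='
        if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$config"; then
            echo "Missing setting: $key" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, after defining the function in your shell, run check_orthomcl_config ~/Practice/general_test/orthomcl/my_orthomcl_dir/orthomcl.config (with your own path) and proceed to the test only if it returns without printing a message. The check only looks for the presence of the keys; it does not verify that the database credentials actually work.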

Example: ortholog identification

This example is adapted from the tutorial hosted at https://github.com/apetkau/microbial-informatics-2014/tree/master/labs/orthomcl. We strongly recommend reading it in full before starting the hands-on practice below, which is a much simplified version of the original and serves as a demo only. The dataset, a set of V. cholerae genomes, is located in /mnt/research/common-data/Bio/orthomcl-data/.

ssh dev-intel18

# Then go to your orthomcl working directory

# Load necessary modules
module purge
module load icc/2016.3.210-GCC-5.4.0-2.26  impi/5.1.3.181
module load OrthoMCL/2.0.9-custom-Perl-5.24.0
module load BLAST/2.2.26-Linux_x86_64
module load GCCcore/5.4.0 libxml2/2.9.4
export PATH=~/Software/orthomcl-pipeline/bin:~/Software/orthomcl-pipeline/scripts:$PATH

# Run orthomcl pipeline (replace the path to orthomcl.config with your own)
orthomcl-pipeline -i /mnt/research/common-data/Bio/orthomcl-data -o orthomcl_out_tmp -m ~/Practice/general_test/orthomcl/my_orthomcl_dir/orthomcl.config --nocompliant

# Visualize the results by drawing a Venn Diagram using a pipeline utility script
nml_parse_orthomcl.pl -i orthomcl_out_tmp/groups/groups.txt -g /mnt/research/common-data/Bio/orthomcl-data/genome-groups.txt -s --draw -o orthomcl-stats.txt --genes

# View the Venn diagram (for demo only; for better display, transfer the SVG file to your local computer)
java -jar /opt/software/batik/batik-1.9/batik-squiggle-1.9.jar genome-groups.txt.svg

Note

As mentioned in the full tutorial, you need to answer "yes" when asked about database removal during the run.
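
For a quick look at the clustering result without drawing a plot, the groups file can be summarized directly on the command line. This is a minimal sketch under the assumption that groups.txt follows the usual OrthoMCL layout, with one group per line, the group name followed by a colon, and then space-separated genome|gene members. The function name summarize_groups is hypothetical and not a pipeline utility.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each ortholog group, prints the group id, the
# number of genes, and the number of distinct genomes, separated by tabs.
# Assumed line layout: "group_id: genomeA|gene1 genomeB|gene2 ..."
summarize_groups() {
    awk '
    NF > 1 {
        group = $1
        sub(/:$/, "", group)          # drop the trailing colon from the group id
        split("", seen)               # reset the per-line set of genomes
        genomes = 0
        for (i = 2; i <= NF; i++) {
            split($i, parts, "[|]")   # parts[1] = genome, parts[2] = gene
            if (!(parts[1] in seen)) {
                seen[parts[1]] = 1
                genomes++
            }
        }
        printf "%s\t%d\t%d\n", group, NF - 1, genomes
    }' "$1"
}
```

With the output directory used above, summarize_groups orthomcl_out_tmp/groups/groups.txt | sort -k3,3nr | head lists the groups that span the most genomes first. If your member identifiers use a different separator than "|", the split pattern needs to be changed accordingly.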