OrthoMCL Pipeline
(https://github.com/apetkau/orthomcl-pipeline) is a
wrapper that automates running OrthoMCL. If you prefer to run
OrthoMCL from scratch, you can skip this tutorial.
Installation guide
You can install OrthoMCL Pipeline in your home directory (or research
space) by following the pipeline's own installation instructions. All
the Perl dependencies have already been installed by iCER staff, so you
only need to run a couple of commands to complete the installation.
Importantly, this guide assumes that you have already prepared your MySQL
configuration file.
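Since the steps below depend on that MySQL configuration file, a quick sanity check before installing may save time. The following is a minimal sketch, not part of the pipeline: the helper name check_orthomcl_config is hypothetical, and it assumes the file uses key=value lines with the dbConnectString, dbLogin and dbPassword keys found in OrthoMCL's configuration template.

```shell
# Hypothetical helper (not shipped with OrthoMCL or the pipeline):
# report which expected keys are missing from an orthomcl.config file.
# Assumes "key=value" lines with no spaces around "=".
check_orthomcl_config() {
    config_file=$1
    if [ ! -r "$config_file" ]; then
        echo "cannot read $config_file" >&2
        return 2
    fi
    missing=0
    for key in dbConnectString dbLogin dbPassword; do
        # A key counts as present if some line starts with "key=".
        if ! grep -q "^${key}=" "$config_file"; then
            echo "missing key: $key" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Calling check_orthomcl_config with the path to your own orthomcl.config prints any missing keys to standard error and returns a non-zero status if the file is unreadable or incomplete.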
Sample installation
I am going to install the pipeline in ~/Software/, a subdirectory of my
home directory.
ssh dev-intel18
# Load necessary modules
module purge
module load icc/2016.3.210-GCC-5.4.0-2.26 impi/5.1.3.181
module load OrthoMCL/2.0.9-custom-Perl-5.24.0
module load BLAST/2.2.26-Linux_x86_64
module load GCCcore/5.4.0 libxml2/2.9.4
# Download source and configure
cd Software
git clone https://github.com/apetkau/orthomcl-pipeline.git
cd orthomcl-pipeline
perl scripts/orthomcl-pipeline-setup.pl  # set paths to dependencies
cat etc/orthomcl-pipeline.conf  # parameters in this file can be adjusted; consult the instruction linked above
# ---
# blast:
#   F: 'm S'
#   b: '100000'
#   e: '1e-5'
#   v: '100000'
# filter:
#   max_percent_stop: '20'
#   min_length: '10'
# mcl:
#   inflation: '1.5'
# path:
#   blastall: /opt/software/BLAST/2.2.26-Linux_x86_64/bin/blastall
#   formatdb: /opt/software/BLAST/2.2.26-Linux_x86_64/bin/formatdb
#   mcl: /opt/software/MCL/14.137-intel-2016b/bin/mcl
#   orthomcl: /opt/software/OrthoMCL/orthomclsoftware-custom/bin
# scheduler: fork
# split: '4'
# Testing
export PATH=~/Software/orthomcl-pipeline/bin:~/Software/orthomcl-pipeline/scripts:$PATH
perl t/test_pipeline.pl -m ~/Practice/general_test/orthomcl/my_orthomcl_dir/orthomcl.config -s fork -t ~/tmp  # replace the path to orthomcl.config with your own
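If the test or a later run stops with a "command not found" error, the modules or the PATH setting above are the first things to check. The sketch below shows one way to do that; the helper name require_commands is hypothetical and the function is not part of the pipeline.

```shell
# Hypothetical helper: verify that each named command can be found on PATH.
# Prints the missing ones to standard error and returns non-zero if any is absent.
require_commands() {
    rc=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "not found on PATH: $cmd" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

For this installation, a reasonable call would be require_commands orthomcl-pipeline nml_parse_orthomcl.pl blastall formatdb mcl, which covers the two pipeline scripts used in this tutorial and the tools listed under "path" in etc/orthomcl-pipeline.conf.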
Example: ortholog identification
This example is adapted from the tutorial hosted at https://github.com/apetkau/microbial-informatics-2014/tree/master/labs/orthomcl.
We strongly recommend that you read that tutorial in full before starting
the hands-on practice below, which is a much simplified version of the
original and serves as a demo only. The dataset, a set of
V. cholerae genomes, is located in /mnt/research/common-data/Bio/orthomcl-data/.
ssh dev-intel18
# Then go to your orthomcl working directory
# Load necessary modules
module purge
module load icc/2016.3.210-GCC-5.4.0-2.26 impi/5.1.3.181
module load OrthoMCL/2.0.9-custom-Perl-5.24.0
module load BLAST/2.2.26-Linux_x86_64
module load GCCcore/5.4.0 libxml2/2.9.4
export PATH=~/Software/orthomcl-pipeline/bin:~/Software/orthomcl-pipeline/scripts:$PATH
# Run orthomcl pipeline (replace the path to orthomcl.config with your own)
orthomcl-pipeline -i /mnt/research/common-data/Bio/orthomcl-data -o orthomcl_out_tmp -m ~/Practice/general_test/orthomcl/my_orthomcl_dir/orthomcl.config --nocompliant
# Visualize the results by drawing a Venn Diagram using a pipeline utility script
nml_parse_orthomcl.pl -i orthomcl_out_tmp/groups/groups.txt -g /mnt/research/common-data/Bio/orthomcl-data/genome-groups.txt -s --draw -o orthomcl-stats.txt --genes
# View the Venn Diagram plot (just for demo; you should transfer the svg to your local computer for better display effect)
java -jar /opt/software/batik/batik-1.9/batik-squiggle-1.9.jar genome-groups.txt.svg
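Besides the Venn diagram, it can be useful to look at the ortholog groups file directly. The sketch below assumes the usual OrthoMCL groups.txt layout, one group per line in the form "group_id: genome|gene genome|gene ..."; the helper names count_groups and group_sizes are hypothetical and not part of the pipeline.

```shell
# Hypothetical helper: count the ortholog groups (non-empty lines) in a groups file.
count_groups() {
    awk 'NF > 0 { n++ } END { print n + 0 }' "$1"
}

# Hypothetical helper: print "group_id<TAB>member_count" for each group,
# assuming the first field is the group ID followed by a colon.
group_sizes() {
    awk 'NF > 0 { id = $1; sub(/:$/, "", id); print id "\t" (NF - 1) }' "$1"
}
```

After the pipeline run above, these could be applied to orthomcl_out_tmp/groups/groups.txt, for example piping the output of group_sizes through sort to see which groups have the most members.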
Note
As mentioned in the full tutorial, you need to answer
"yes" to the database removal question that appears during the run.