GATK4

Be sure to read this Quick Start before using GATK4. In particular, note the following statement from the developers:

Once you have downloaded and unzipped the package (named gatk-[version]), you will find four files inside the resulting directory:

gatk
gatk-package-[version]-local.jar
gatk-package-[version]-spark.jar
README.md

Now you may ask, why are there two jars? As the names suggest, gatk-package-[version]-spark.jar is the jar for running Spark tools on a Spark cluster, while gatk-package-[version]-local.jar is the jar that is used for everything else (including running Spark tools "locally", i.e. on a regular server or cluster).

So does that mean you have to specify which one you want to run each time? Nope! See the gatk file in there? That's an executable wrapper script that you invoke and that will choose the appropriate jar for you based on the rest of your command line. You could still invoke a specific jar if you wanted, but using gatk is easier, and it will also take care of setting some parameters that you would otherwise have to specify manually.
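To make the wrapper's role concrete, the following minimal sketch prints the two roughly equivalent ways of launching a tool with the local jar. Nothing is executed. The helper name show_gatk_invocations is hypothetical, and the version (4.0.5.1, borrowed from the HPCC module name), the tool and the 4G heap size are illustrative assumptions. Note that the direct java form does not apply the extra parameters that the wrapper sets for you.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (do not run) two ways to launch a GATK4 tool.
# Arguments: <version> <tool> [tool arguments...]
show_gatk_invocations() {
    local version="$1" tool="$2"
    shift 2
    # 1) Via the wrapper script, which chooses the jar for you.
    echo "gatk --java-options \"-Xmx4G\" ${tool} $*"
    # 2) Calling the local jar directly with java (assumed heap size: 4G).
    echo "java -Xmx4G -jar gatk-package-${version}-local.jar ${tool} $*"
}

show_gatk_invocations 4.0.5.1 HaplotypeCaller --help
```

In everyday use the first form is the one to prefer; the second is shown only to illustrate what the wrapper selects on your behalf.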

On the HPCC, after logging in to a dev-node, run: module load GATK/4.0.5.1-Python-3.6.4. Tip: if you run module purge in the middle of your work and want to return to the original login environment, run: exec bash -l

A simple test on the HPCC is provided below.

module load GATK/4.0.5.1-Python-3.6.4
gatk --java-options "-Xmx8G" HaplotypeCaller -R /opt/software/GATK/3.3-0-Java-1.7.0_80/resources/exampleFASTA.fasta -I /opt/software/GATK/3.3-0-Java-1.7.0_80/resources/exampleBAM.bam -O gatk_test.vcf
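
Once the test command has finished, a quick sanity check is to confirm that gatk_test.vcf was written and to count its variant records. The sketch below is one way to do that, assuming the output file is in the current directory; count_vcf_records is a hypothetical helper that simply counts the non-header, non-blank lines of a VCF file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count variant records (non-header, non-blank lines) in a VCF.
count_vcf_records() {
    awk '!/^#/ && NF { n++ } END { print n + 0 }' "$1"
}

# Inspect the file written by the test command, if it exists and is non-empty.
if [ -s gatk_test.vcf ]; then
    echo "gatk_test.vcf contains $(count_vcf_records gatk_test.vcf) variant record(s)"
else
    echo "gatk_test.vcf is missing or empty; check the GATK log for errors"
fi
```

A record count of zero does not necessarily indicate a failure; it only means that no variants were written for the input used.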