Trinity for RNA-seq de novo assembly
Loading module
To load Trinity 2.6.6, for example, run:
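A minimal sketch of the loading step, assuming an Lmod-style `module` command. The compiler or MPI modules that Trinity 2.6.6 depends on are not listed here, so the sketch asks `module spider` to report them rather than guessing.

```shell
TRINITY_VERSION=2.6.6   # version named in the text above

if command -v module >/dev/null 2>&1; then
    # Start from a clean environment.
    module purge
    # Report any prerequisite modules that must be loaded first.
    module spider "Trinity/$TRINITY_VERSION"
    # This may fail until those prerequisites are loaded.
    module load "Trinity/$TRINITY_VERSION"
    # Confirm that the Trinity executable is now on PATH.
    command -v Trinity
else
    echo "The 'module' command is not available in this shell." >&2
fi
```

If `module load` reports a missing dependency, load the modules listed by `module spider` on the same line, before `Trinity/2.6.6`.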
Most basic run (transcript assembly)
A typical Trinity command for assembling strand-specific paired-end RNA-seq data would look like:
A typical run of Trinity
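A minimal sketch of such a command follows. The read file names, the `RF` library orientation (common for dUTP-based strand-specific kits), and the 50G memory limit are assumptions for illustration; `--CPU 10` matches the 10 CPUs requested in the job script. The guard only runs Trinity when both the executable and the input files are present.

```shell
# Hypothetical input files; replace with your own paired-end reads.
LEFT=reads_1.fq.gz
RIGHT=reads_2.fq.gz
CPUS=10        # should match the CPUs requested from the scheduler
MAX_MEM=50G    # assumed value; size it to your job's memory request

if command -v Trinity >/dev/null 2>&1 && [ -f "$LEFT" ] && [ -f "$RIGHT" ]; then
    # --SS_lib_type marks the data as strand-specific (RF or FR for paired-end reads).
    Trinity --seqType fq \
        --left "$LEFT" \
        --right "$RIGHT" \
        --SS_lib_type RF \
        --CPU "$CPUS" \
        --max_memory "$MAX_MEM" \
        --output trinity_out_dir
else
    echo "Trinity or the input reads were not found; nothing was run." >&2
fi
```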
This will generate output files in a new directory, trinity_out_dir, in
the working directory. Among them, the assembled transcripts file is
"Trinity.fasta". For more detail, check out
https://github.com/trinityrnaseq/trinityrnaseq/wiki.
When you submit the above command as a job to the cluster, you need to request 10 CPUs in the sbatch script with the following lines (in addition to your other sbatch directives):
sbatch code snippet
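A minimal sketch of the CPU request, assuming Trinity runs as a single multi-threaded process on one node. The `CPUS` variable is a hypothetical helper that reads the allocation size back from SLURM so that the value passed to Trinity's `--CPU` option stays in sync with the request.

```shell
# Keep all threads on one node.
#SBATCH --nodes=1
# Run a single Trinity process.
#SBATCH --ntasks=1
# Give that process the 10 CPUs mentioned above.
#SBATCH --cpus-per-task=10

# Inside a job, SLURM exports the allocation size; fall back to 1 outside a job.
CPUS="${SLURM_CPUS_PER_TASK:-1}"
echo "Trinity can be run with --CPU $CPUS"
```

In a real job script, the `#SBATCH` lines must appear before the first executable command.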
Transcript quantification
Trinity provides many utility scripts for post-assembly analysis,
such as quality assessment, transcript quantification, and differential
expression tests. Some of them depend on external software tools that
must be installed separately (that is, they are not bundled with Trinity).
For example, the transcript quantification step requires one of
RSEM, eXpress, kallisto, or salmon (cf. https://github.com/trinityrnaseq/trinityrnaseq/wiki/Trinity-Transcript-Quantification).
All four are available on the HPCC. As instructed by
Trinity, "the tools should be available via your PATH setting". So, in
the next example, where we use RSEM to align reads to the
assembled transcripts and then quantify transcript abundance, we first
set the PATH variable so that Trinity can find RSEM automatically.
Using RSEM for transcript quantification
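A minimal sketch of this step follows. The RSEM location in `RSEM_BIN`, the read file names, the `RF` orientation, and the choice of bowtie2 as the aligner are assumptions for illustration. The sketch also assumes that the Trinity module sets `TRINITY_HOME`; if it does not, point `SCRIPT` at the `util` directory of your Trinity installation. The aligner must be on PATH as well.

```shell
# Hypothetical location of the RSEM executables; replace with the real path.
RSEM_BIN="$HOME/software/RSEM/bin"
export PATH="$RSEM_BIN:$PATH"

TRANSCRIPTS=trinity_out_dir/Trinity.fasta
# Assumes TRINITY_HOME points at the Trinity installation directory.
SCRIPT="${TRINITY_HOME:-}/util/align_and_estimate_abundance.pl"

if [ -x "$SCRIPT" ] && [ -f "$TRANSCRIPTS" ]; then
    # --prep_reference builds the aligner index and RSEM reference on the first run.
    # --trinity_mode derives the gene-to-transcript mapping from Trinity's naming.
    "$SCRIPT" --transcripts "$TRANSCRIPTS" \
        --seqType fq \
        --left reads_1.fq.gz \
        --right reads_2.fq.gz \
        --SS_lib_type RF \
        --est_method RSEM \
        --aln_method bowtie2 \
        --trinity_mode \
        --prep_reference \
        --thread_count 10 \
        --output_dir RSEM_out
else
    echo "Trinity utility script or assembly not found; nothing was run." >&2
fi
```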
The RSEM computation generates two primary output files containing
estimated abundances in the subdirectory RSEM_out, as specified in the
command above: RSEM.isoforms.results (transcript level) and
RSEM.genes.results (gene level).
More utilities
Please consult https://github.com/trinityrnaseq/trinityrnaseq/wiki for details.
Note that a few R packages are needed for differential expression
analysis (https://github.com/trinityrnaseq/trinityrnaseq/wiki/Trinity-Differential-Expression).
These have been installed in R/4.0.2, which can be loaded with the
module commands given under "Version note" below.
Version note
The latest version is 2.9.1. After loading it, you may load R 4.0.2 for differential expression (DE) analysis.
module purge
module load GCC/8.3.0 OpenMPI/3.1.4 R/4.0.2 Trinity/2.9.1
module load R/4.0.2