Databases in common-data
Overview
The HPCC hosts a number of large, widely used genetics databases in the common-data research space (/mnt/research/common-data). These databases are publicly readable on the HPCC, but write access to these folders is limited to ICER staff. We try to keep these databases as up to date as possible; if you find that something is missing or encounter problems, please open a ticket with us.
NCBI BLAST
NCBI maintains a number of nucleotide and protein sequence databases for use with their BLAST/BLAST+ tools. A single sequence or a small set of sequences can be compared against these databases using the NCBI BLAST web tool. For larger, more customizable comparisons, the HPCC maintains a copy of these databases at:
/mnt/research/common-data/Bio/blast_databases/blastdb_current
We also maintain a number of BLAST tools as part of our software module system:
module avail BLAST
For details of the individual databases, please refer to the NCBI documentation.
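As a minimal sketch of how the local copy might be used, the script below points the standard BLASTDB environment variable at the directory given above and runs a nucleotide search. The query file name, the output file name, the database name nt, and the E-value cutoff are illustrative assumptions, not site defaults; check the directory for the databases actually present, and load a BLAST module chosen from the output of module avail BLAST before running it.

```shell
#!/bin/bash
# BLAST+ looks up database names in the directory named by BLASTDB.
# The path is the one given in the text above.
export BLASTDB="/mnt/research/common-data/Bio/blast_databases/blastdb_current"

# Hypothetical inputs: replace with your own files and database name.
QUERY="query.fasta"     # assumption: a FASTA file in the working directory
DB_NAME="nt"            # assumption: verify this database exists in $BLASTDB
OUT="blast_results.tsv"

# Load a BLAST module first. The exact module name is site-specific,
# so choose one from the output of `module avail BLAST`.

if command -v blastn >/dev/null 2>&1 && [ -f "$QUERY" ]; then
    # -outfmt 6 writes tab-separated hits; -evalue filters weak matches.
    blastn -query "$QUERY" -db "$DB_NAME" \
           -outfmt 6 -evalue 1e-5 -num_threads 4 -out "$OUT"
else
    echo "blastn or $QUERY not available; skipping search" >&2
fi
```

Setting BLASTDB keeps the database location out of the command line, so the same script works unchanged if only the database name is swapped.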
AlphaFold
The protein structure prediction software AlphaFold requires a set of protein sequence databases to run. Although all versions of the software require similar data, small differences in folder structure mean that we host three versions of the databases: one each for AlphaFold 3, AlphaFold 2.3, and older versions of AlphaFold 2.x.
/mnt/research/common-data/alphafold/database_3 # AlphaFold 3
/mnt/research/common-data/alphafold/database_230 # AlphaFold 2.3
/mnt/research/common-data/alphafold/database # AlphaFold 2 Legacy
If you are using one of the AlphaFold modules, the path to the correct database should be set automatically. For more details on running AlphaFold on the HPCC, see our documentation on AlphaFold 2.3.2 and AlphaFold3 (Coming Soon).
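If you run AlphaFold outside the provided modules (for example from your own container), you have to choose the database directory yourself. The helper below is a hypothetical sketch, not part of any module: the function name alphafold_db_path is invented for illustration, and it simply maps a version string to one of the three paths listed above.

```shell
#!/bin/bash
# Hypothetical helper: print the database directory for an AlphaFold version.
# The three paths are the ones listed in the text above.
alphafold_db_path() {
    case "$1" in
        3|3.*)     echo "/mnt/research/common-data/alphafold/database_3" ;;
        2.3|2.3.*) echo "/mnt/research/common-data/alphafold/database_230" ;;
        2|2.*)     echo "/mnt/research/common-data/alphafold/database" ;;
        *)         echo "unknown AlphaFold version: $1" >&2; return 1 ;;
    esac
}

# Example: look up the directory for AlphaFold 2.3.2.
AF_DB="$(alphafold_db_path 2.3.2)"
echo "$AF_DB"
```

The 2.3 patterns are tested before the general 2.x pattern, so any version that is not 3.x or 2.3.x falls through to the legacy directory.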
4D Nucleosome
The 4D Nucleosome dataset contains chromatin contact frequency maps for a large panel of cell types and tissues from different species. The datasets are generated by genome-wide Hi-C experiments, followed by standard batch-effect corrections and normalizations. The chromatin contact frequency maps capture chromatin interactions, which are useful for analyzing 3D genome folding, multi-scale chromatin organization, gene regulation, epigenomics, evolution and other functional genomics research.
Data on the HPCC covers .hic files from dilution, DNaseHiC, insitu, MicroC and TCC experiments, all stored at:
/mnt/research/common-data/4D_Nucleosome/database
To search the database and find additional metadata on 4D Nucleosome, please refer to the project data portal.
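To see which .hic files are available locally, a recursive search of the directory above is usually enough. The sketch below makes no assumption about how the files are arranged into subfolders; the function name list_hic_files is hypothetical and chosen only for this example.

```shell
#!/bin/bash
# Path taken from the text above.
HIC_DIR="/mnt/research/common-data/4D_Nucleosome/database"

# Hypothetical helper: list every .hic file under a directory, sorted by path.
list_hic_files() {
    find "$1" -type f -name '*.hic' | sort
}

if [ -d "$HIC_DIR" ]; then
    # Show only the first 20 matches to keep the output short.
    list_hic_files "$HIC_DIR" | head -n 20
else
    echo "$HIC_DIR is not visible from this machine" >&2
fi
```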