Reference Datasets

Pawsey hosts a number of life science reference datasets centrally so that users do not have to repeatedly download the same common datasets. These are hosted under /scratch/references/. Additional references can be added if there is sufficient user interest. If there is something you would like added, please drop us a line at help@pawsey.org.au. The current list of datasets is below:

 

Directory

Description

10x_GRCh38_July2024

Reference datasets for 10X sequencing (Human GRCh38)

10x_singlecell_gene_expression (2020)

Reference datasets for 10X single cell gene expression for human GRCh38, mouse mm10, and combined human/mouse

10x_spatial_gene_expression

Reference datasets for 10X spatial gene expression for human GRCh38, mouse mm10, and combined human/mouse

alphafold

Databases for the CPU-only AlphaFold2 module

alphafold_feb2024

Updated databases and weights for the GPU-enabled AlphaFold2 container

ameta

Database to support the aMeta workflow (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-023-03083-9)

arabidopsis_thaliana

Reference genome files for Arabidopsis thaliana (TAIR10)

blastdb_update

BLAST databases, updated at approximately every monthly maintenance

busco_db

References to support BUSCO, specifically actinopterygii_odb10 and vertebrata_odb10

colabfold_jun2024

Updated databases and weights for the GPU-enabled ColabFold container

diamond

References for DIAMOND, a faster alternative to BLAST

Foreign_Contamination_Screening

 

human

Human reference genomes and associated files for annotSV, broad_hg19, broad_hg38, and GRCh38

interproscan-5.56-89.0

References for InterProScan

kaiju

Kaiju pre-built indexes for protein sequences from RVDB-prot v26.0.

kraken2

Contains nt_20230502 and pluspfp_20230605

metagenome_atlas_2.9

Datasets to support metagenome-atlas 2.9, including adapters.fa, checkm, Dram, EggNOG_V5, GTDB_V06, GTDB_V07, phiX174_virus.fa

mouse

Mouse reference datasets including broad_mm10, GRCm38, mm10, RNA_M25

qiime

References to support QIIME

sarek

References to support the nf-core Sarek pipeline, including:

  • Ensembl

    • GRCh37

    • GRCh38

  • GATK

    • GRCh38

  • NCBI

    • GRCh38

  • snpeff_cache

    • GRCh38.105

slorado

Test dataset for slorado

veba_database

VEBA databases

veba_db_v8

VEBA databases (newer version)

vep

VEP human databases for 109_GRCh37, 109_GRCh38, 111_GRCh38

VirDB_20230913

Database to support https://github.com/eresearchqut/ontvisc
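
Because the contents of /scratch/references/ can change over time, it may be worth confirming that a dataset is present before pointing a job at it. The following is a minimal sketch in Python, not an official Pawsey tool: the function names find_references and list_references are hypothetical, and the only detail taken from this page is the /scratch/references/ location and the directory names in the list above.

```python
from pathlib import Path

# Location stated on this page; the helper functions below are illustrative only.
REFERENCE_ROOT = Path("/scratch/references")


def find_references(wanted, root=REFERENCE_ROOT):
    """Map each requested directory name to its path under root, or None if absent."""
    root = Path(root)
    return {
        name: (root / name if (root / name).is_dir() else None)
        for name in wanted
    }


def list_references(root=REFERENCE_ROOT):
    """Return the sorted names of all dataset directories found under root."""
    root = Path(root)
    if not root.is_dir():
        # Not on a system where the reference area is mounted.
        return []
    return sorted(p.name for p in root.iterdir() if p.is_dir())


if __name__ == "__main__":
    # Example: check two of the directories named in the list above.
    for name, path in find_references(["kraken2", "busco_db"]).items():
        print(f"{name}: {path if path else 'not found'}")
```

Passing a different root makes the same functions usable for testing away from the cluster. If a directory you expect is reported as not found, the list above may be out of date, and help@pawsey.org.au is the contact given on this page.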