Reference datasets
Pawsey hosts a number of life science reference datasets centrally so that users do not have to download the same common datasets repeatedly. These are hosted on `/scratch/references/`. Additional references can be added if there is sufficient user interest. If there is something you would like to have added, please let us know.
Below is the current list of datasets:
Database/Organism | Additional Information |
10x_singlecell_gene_expression | refdata-gex-GRCh38-2020-A, refdata-gex-mm10-2020-A, refdata-gex-GRCh38-and-mm10-2020-A |
10x_spatial_gene_expression | refdata-gex-GRCh38-2020-A, refdata-gex-mm10-2020-A |
Alphafold | |
Arabidopsis thaliana | TAIR10 |
Blast | Updated approximately at each monthly maintenance |
Diamond | A faster alternative to Blast |
Human | Includes: Broad hg19 bundle, annotSV bundle |
Interproscan Version v5.56-89.0 | |
Kraken2 | pluspfp_20230605, nt_20230502 |
Metagenome Atlas v2.9 | |
Mouse | Includes: Broad mm10 bundle GRCm38 mm10 RNA_M25 |
Qiime | |
Sarek nf-core pipeline reference | iGenomes files for GATK GRCh38. Pointing Sarek at these local reference files avoids the slowdown caused by Sarek downloading and cache-checking the references itself. Use the `--igenomes_base` flag to point to the local reference files. |
Variant Effect Prediction (VEP) | Homo sapiens cache files for 109_GRCh37 and 109_GRCh38 |
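
As an illustration of the `--igenomes_base` flag mentioned in the Sarek row, the following is a minimal sketch of how a Sarek command line might be assembled. The subdirectory name `sarek`, the sample sheet `samplesheet.csv` and the output directory `results` are placeholders assumed for this example, not confirmed paths; list `/scratch/references/` on the system to find the actual location of the iGenomes files. The sketch only prints the command so it can be reviewed first.

```shell
#!/bin/sh
# Root of the shared reference datasets, as stated in the text above.
REFERENCE_ROOT="/scratch/references"

# ASSUMPTION: the subdirectory name "sarek" is hypothetical. Check the
# real layout with: ls "$REFERENCE_ROOT"
IGENOMES_BASE="$REFERENCE_ROOT/sarek"

# Print the Sarek command line for review instead of running it.
# samplesheet.csv and results are placeholder names for this sketch.
sarek_command() {
    printf '%s\n' "nextflow run nf-core/sarek --input samplesheet.csv --outdir results --genome GATK.GRCh38 --igenomes_base $IGENOMES_BASE"
}

sarek_command
```

Once the printed command looks right, it can be pasted into a job script, with any profile and resource options your workflow needs added.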