Life Science and Bioinformatics

Reference datasets

Pawsey hosts a number of commonly used life science reference datasets centrally, so that users do not need to download the same data repeatedly. These are hosted under /scratch/references/. Additional references can be added if there is sufficient user interest. If there is something you would like to have added, please email help@pawsey.org.au. Below is the current list of datasets:


Database/Organism | Additional Information

10x_singlecell_gene_expression

refdata-gex-GRCh38-2020-A

refdata-gex-mm10-2020-A

refdata-gex-GRCh38-and-mm10-2020-A

10x_spatial_gene_expression

refdata-gex-GRCh38-2020-A

refdata-gex-mm10-2020-A

Alphafold


Arabidopsis thaliana

TAIR10
Blast: Updated roughly every monthly maintenance
Diamond: A faster alternative to Blast
Human

Includes:

Broad hg19 bundle
Broad hg38 bundle
GRCh38

annotSV bundle

Interproscan Version v5.56-89.0


Kraken2

pluspfp_20230605

nt_20230502 

Metagenome Atlas v2.9
Mouse

Includes:

Broad mm10 bundle

GRCm38

mm10

RNA_M25

Qiime

Sarek nf-core pipeline reference 

iGenomes files for GATK GRCh38. Pointing Sarek at these local reference files avoids the slowdowns caused by Sarek downloading and cache-checking the references itself. Use the `--igenomes_base` flag to point to the local reference files.
Variant Effect Prediction (VEP): Homo sapiens cache files for 109_GRCh37 and 109_GRCh38
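
As an illustration of the Sarek entry above, the following is a minimal sketch of a launch script that passes the local iGenomes copy to Sarek with `--igenomes_base`. Several values here are assumptions, not taken from this page: `PATH_TO_IGENOMES` is a placeholder for the actual directory name under /scratch/references/, `samplesheet.csv` and `results` are hypothetical input and output names, and the `singularity` profile is only one possible choice. The script prints the command by default instead of running it, so it can be checked first.

```shell
#!/bin/sh
# Sketch only: point Sarek at a local iGenomes copy instead of downloading it.
# ASSUMPTION: PATH_TO_IGENOMES is a placeholder; look up the real directory
# name under /scratch/references/ before running.
IGENOMES_BASE="/scratch/references/PATH_TO_IGENOMES"

# Print the command by default; set DRY_RUN=0 to actually launch it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

run_sarek() {
    # samplesheet.csv, results and the singularity profile are hypothetical
    # choices for illustration; adjust them to your own workflow.
    run nextflow run nf-core/sarek \
        -profile singularity \
        --input samplesheet.csv \
        --outdir results \
        --genome GATK.GRCh38 \
        --igenomes_base "$IGENOMES_BASE"
}

run_sarek
```

Once the printed command looks right and the placeholder path has been replaced, running the script with `DRY_RUN=0` launches the pipeline.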