Blast+

Blast+ is a program for comparing biological sequence information, such as amino-acid or DNA/RNA sequences.

Because Blast+ usage varies widely and the program lacks extensive parallelism, we provide only general advice on running Blast+ searches. We focus on two example cases.

Running Blast+ 

Blast+ is provided by system modules. To load the Blast+ module, use the following command:

$ module load blast/2.12.0--pl5262h3289130_0

NCBI indexed databases

Standard NCBI indexed databases, as used by the Blast+ executables, are centrally installed in the directory /scratch/references/blastdb_update/blast-YYYY-MM-DD/db, where YYYY-MM-DD is the date on which the database files were downloaded. The Blast+ databases are downloaded again every scheduled maintenance period. To check the date of a given nucleotide or protein database, use the Blast+ utility blastdbcmd, which reports the date the database was last updated in the central NCBI repository (the anonymous FTP download site at ftp://ftp.ncbi.nlm.nih.gov/blast/db/).

Terminal 1. Using blastdbcmd to check a database
$ blastdbcmd -info -db /scratch/references/blastdb_update/blast-2021-09-01/db/nt
Database: Nucleotide collection (nt)
	72,899,005 sequences; 510,954,263,840 total bases

Date: Aug 24, 2021  2:13 AM	Longest sequence: 99,791,824 bases

BLASTDB Version: 5

Volumes:  
		/scratch/references/blastdb_update/blast-2021-09-01/db/nt.00
...
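
To avoid typing the full path each time, one option is to point the BLASTDB environment variable, which the Blast+ executables consult when resolving database names, at the dated directory. The following is a minimal sketch, assuming the 2021-09-01 snapshot from Terminal 1 is still installed; blast_date is a hypothetical variable name chosen for illustration.

```shell
# blast_date is a hypothetical variable name; set it to a snapshot that exists.
blast_date="2021-09-01"

# BLASTDB tells the Blast+ executables where to look for named databases.
export BLASTDB="/scratch/references/blastdb_update/blast-${blast_date}/db"

# With BLASTDB set, the database can be named without its full path.
# The guard skips the call on machines where the Blast+ module is not loaded.
if command -v blastdbcmd >/dev/null 2>&1; then
    blastdbcmd -info -db nt
fi
```

Recording the snapshot date alongside your results makes it easier to reproduce a search after the central databases have been refreshed.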

How to use Blast+ effectively (on HPC systems)

Running typical Blast+ queries against an indexed database can be time-consuming and can produce large result files. The following tips can help to improve the efficiency of Blast+ queries:

  • Ensure the query file does not contain exact duplicate sequences.
  • Limit the number of search results. The default value is 500 for -num_descriptions and for the alternative -max_target_seqs, and 250 for -num_alignments. Reduce these values as far as your analysis allows, to cut both the time taken to perform the search and the size of the resulting output file.
  • Use a job array to run many Blast+ queries at once rather than one at a time (that is, switch from running in serial to running "embarrassingly parallel"), which can greatly shorten the overall analysis. For more details, see the section about job arrays on the Example Workflows page.
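
The three tips above can be sketched as a small preparation script. This is an illustration under stated assumptions, not a recommended pipeline: every file name (demo_queries.fasta, demo_chunk_NNN.fasta and so on) is hypothetical, the tiny demo input exists only so the sketch runs on its own, the chunk size n=1 is chosen for the demo and would normally be far larger, and the database path is the snapshot from Terminal 1. Submitting the chunks as a job array is left to the Example Workflows page.

```shell
# All file names here are hypothetical; the demo input only makes the sketch
# runnable on its own. seq1 and seq3 hold the same sequence, wrapped differently.
query="demo_queries.fasta"
unique="demo_queries.unique.fasta"
printf '>seq1\nACGT\nACGT\n>seq2\nTTGA\n>seq3\nACGTACGT\n' > "$query"

# 1. Drop exact duplicates, keeping the first record for each distinct sequence.
#    Wrapped sequence lines are joined before comparison, so the output holds
#    each sequence on a single line.
awk '
  function emit() {
    if (header != "" && !(seq in seen)) { seen[seq] = 1; print header; print seq }
  }
  /^>/ { emit(); header = $0; seq = ""; next }
  { seq = seq $0 }
  END { emit() }
' "$query" > "$unique"

# 2. Split the unique queries into chunks of n records, one file per array task.
awk -v n=1 '
  /^>/ {
    if (count % n == 0) {
      if (out != "") close(out)
      out = sprintf("demo_chunk_%03d.fasta", count / n)
    }
    count++
  }
  { print > out }
' "$unique"

# 3. Search one chunk with a small hit limit and tabular output (-outfmt 6).
#    The guard skips the search where the Blast+ module is not loaded.
db="/scratch/references/blastdb_update/blast-2021-09-01/db/nt"
if command -v blastn >/dev/null 2>&1; then
    blastn -query demo_chunk_000.fasta -db "$db" -max_target_seqs 10 -outfmt 6 -out demo_chunk_000.tsv
fi
```

The duplicate check compares sequences exactly, so records that differ only in letter case or in their header text are treated as distinct sequences and identical sequences respectively; adjust the comparison if your data needs a looser rule.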

Output format

By default, Blast+ searches write results in a simple text format. Blast+ also offers structured output formats: XML, JSON and ASN.1. These formats can offer advantages in some circumstances.

You can use the Blast+ option -outfmt 11 to produce ASN.1 format. This option saves the search results in a form that blast_formatter can convert to other output formats without re-running the query (provided the relevant database is available). Depending on the exact type of search, the .asn1 file may be significantly larger (perhaps by a factor of 2-4) than the corresponding default text format. However, a compressed .asn1 file (produced using gzip, for example) is usually smaller, making it a reasonable choice when large quantities of search data are to be archived.
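
The archive workflow described above might look like the following sketch. The file names (queries.fasta, results.asn1, results.tsv, results.xml) are hypothetical, blastn stands in for whichever Blast+ search program you use, and the database path is the snapshot from Terminal 1. The guard makes the block a no-op when Blast+ is not loaded or the query file is absent.

```shell
# File names are hypothetical; the database path is the snapshot from Terminal 1.
query="queries.fasta"
archive="results.asn1"
db="/scratch/references/blastdb_update/blast-2021-09-01/db/nt"

if command -v blastn >/dev/null 2>&1 && [ -f "$query" ]; then
    # Run the search once, saving the results as a Blast archive (ASN.1).
    blastn -query "$query" -db "$db" -outfmt 11 -out "$archive"

    # Re-format the saved archive without repeating the search.
    blast_formatter -archive "$archive" -outfmt 6 -out results.tsv   # tabular
    blast_formatter -archive "$archive" -outfmt 5 -out results.xml   # XML

    # Keep a compressed copy for archiving; -c leaves the original in place.
    gzip -c "$archive" > "${archive}.gz"
fi
```

Because blast_formatter needs the same database to be available, it is worth noting the snapshot date next to each archived file.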
