...
Column |
---|
|
Code Block |
---|
language | bash |
---|
theme | Emacs |
---|
title | Listing 2. Example of a job array with many files |
---|
| #!/bin/bash --login
#This example uses general SBATCH settings, but please refer to the specific guide
#of the intended cluster for any changes that may be needed
#
# SLURM directives
#
# This is an array job with 35 subtasks, (--array=0-34).
#
# The output for each subtask will be sent to a separate file
# identified by the jobid (--output=array-%j.out)
#
# Each subtask will occupy one node (--nodes=1) with
# a wall-clock time limit of twenty minutes (--time=00:20:00)
#
# Replace [your-project] with the appropriate project name
# following --account (e.g., --account=project123)
#SBATCH --account=[your-project]
#SBATCH --output=array-%j.out
#SBATCH --array=0-34 #this should match the number of input files
#SBATCH --nodes=1
#SBATCH --ntasks=1 #One main task per file to be processed in each array-subtask
#SBATCH --cpus-per-task=1 #this will vary depending on the requirements of the task
#SBATCH --mem=1840M #Needed memory per array-subtask (or use --exclusive for exclusive access)
#SBATCH --time=00:20:00
#---
echo "All jobs in this array have:"
echo "- SLURM_ARRAY_JOB_ID=${SLURM_ARRAY_JOB_ID}"
echo "- SLURM_ARRAY_TASK_COUNT=${SLURM_ARRAY_TASK_COUNT}"
echo "- SLURM_ARRAY_TASK_MIN=${SLURM_ARRAY_TASK_MIN}"
echo "- SLURM_ARRAY_TASK_MAX=${SLURM_ARRAY_TASK_MAX}"
echo "This job in the array has:"
echo "- SLURM_JOB_ID=${SLURM_JOB_ID}"
echo "- SLURM_ARRAY_TASK_ID=${SLURM_ARRAY_TASK_ID}"
#---
#Specific settings for the cluster you are on
#(Check the specific guide of the cluster for additional settings)
#---
# grab our filename from a directory listing
FILES=($(ls -1 *.input.txt)) #this pulls in all the files ending in .input.txt
FILENAME=${FILES[$SLURM_ARRAY_TASK_ID]} #this selects the input file that corresponds to this array subtask
echo "My input file is ${FILENAME}" #this will print the file name into the log file
#---
#example job using the above variables
srun -N 1 -n 1 -c 1 ExpansionHunterDenovo-v0.8.7-linux_x86_64/scripts/casecontrol.py locus \
--manifest ${FILENAME} \
--output ${FILENAME}.CC_locus.tsv |
|
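If the input file names are known in advance, an alternative to parsing the output of ls is to read them from a plain-text list, one name per line. The following is a minimal sketch of that approach; the list file name files.list is an assumption and not part of Listing 2:
Code Block |
---|
language | bash |
---|
theme | Emacs |
---|
title | Sketch. Selecting the input file from a pre-generated list |
---|
| # Create the list of input files once, before submitting the array job:
#   ls -1 *.input.txt > files.list
# Inside the job script, pick the line that corresponds to this array subtask
# (sed counts lines from 1, while SLURM_ARRAY_TASK_ID starts at 0 in this example):
FILENAME=$(sed -n "$((SLURM_ARRAY_TASK_ID + 1))p" files.list)
echo "My input file is ${FILENAME}" |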
...
Column |
---|
|
Code Block |
---|
language | bash |
---|
theme | DJango |
---|
title | Terminal 4. View jobs in queue for user |
---|
| $ squeue -u espinosamatilda
JOBID    USER             ACCOUNT  NAME          EXEC_HOST  ST  REASON      START_TIME  END_TIME  TIME_LEFT  NODES  PRIORITY
3483798  espinosamatilda  pawsey   iterativeJob  nid00017   R   None        14:44:27    14:47:27  2:50       1      5269
3483799  espinosamatilda  pawsey   iterativeJob  n/a        PD  Dependency  N/A         N/A       5:00       1      5269 |
|
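The second job shown above is pending with reason Dependency, which is typically achieved by passing the job ID of the first submission to the next one. The following is a minimal sketch of such a chained submission; the script name iterativeJob.slurm is a placeholder:
Code Block |
---|
language | bash |
---|
theme | DJango |
---|
title | Sketch. Chaining jobs with a dependency |
---|
| # Submit the first job and capture its job ID (--parsable prints only the ID)
jobid=$(sbatch --parsable iterativeJob.slurm)
# Submit the follow-up job so that it starts only after the first one completes successfully
sbatch --dependency=afterok:${jobid} iterativeJob.slurm |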
...
Note that X11 forwarding is enabled by default in the interactive queue.
Nevertheless, we recommend that users use FastX, Pawsey's web-based remote-visualisation service, to launch compute-intensive visualisation packages such as ParaView, VisIt or VMD. Please refer to the Setonix Remote Visualisation documentation for detailed instructions on access and use.
Packing serial/small multithreaded jobs
...
When to use job arrays: For nodes that can be shared (like the gpuq partition in Topaz), the best practice is to use job arrays. A disadvantage of job packing on shared nodes is that unbalanced steps might lead to resources being held unnecessarily. When using arrays this problem does not exist because, as soon as any job finishes or fails, the resources for that job are freed for use by another user.
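If a large number of array subtasks should not all run at once on a shared partition, Slurm allows the number of simultaneously running subtasks to be limited with the % separator in the --array directive. A minimal sketch (the limit of 8 is arbitrary):
Code Block |
---|
language | bash |
---|
theme | Emacs |
---|
title | Sketch. Limiting the number of concurrent array subtasks |
---|
| # Run subtasks 0-34, but allow at most 8 of them to run at the same time
#SBATCH --array=0-34%8 |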
...
Column |
---|
|
Code Block |
---|
language | bash |
---|
theme | Emacs |
---|
title | Listing 21. GPU job array example |
---|
| #!/bin/bash --login
#SBATCH --account=[your-account]-gpu
#SBATCH --array=0-7
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-gpu=1
#SBATCH --mem=80G
#SBATCH --gres=gpu:1
#SBATCH --time=00:10:00
#The default loaded compiler module is gcc
module load cuda
#Go to the right directory for this instance of the job array using SLURM_ARRAY_TASK_ID as the identifier:
#We are assuming all the input files needed for each specific job reside in the corresponding working directory
cd workingDir_${SLURM_ARRAY_TASK_ID}
#Run the hip executable (assuming the same executable will be used by each job, and that it resides in the submission directory):
srun -u -N 1 -n 1 -c 1 ${SLURM_SUBMIT_DIR}/main_hip |
|
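The script in Listing 21 changes into workingDir_${SLURM_ARRAY_TASK_ID}, so those directories (and the input files inside them) must exist before submission. A minimal sketch of this preparation step, run from the submission directory (the staging command is only a placeholder):
Code Block |
---|
language | bash |
---|
theme | Emacs |
---|
title | Sketch. Preparing the working directories before submission |
---|
| # Create one working directory per array subtask (0-7 matches --array=0-7)
for i in $(seq 0 7); do
   mkdir -p workingDir_${i}
   # cp inputs_for_task_${i}/* workingDir_${i}/   #stage the input files needed by subtask ${i}
done |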
When to use job packing: For nodes where resources are exclusive and cannot be shared among different users/jobs at the same time (like the nvlinkq partition in Topaz), the best practice is to use job packing when your workflow requires the execution of several jobs but each individual job does not require the whole resources of a node. In that case, you might consider using job packing to execute your several jobs on the same node at the same time. Job packing allows you to use all (or most) of the resources of the node. Ideally, multiple jobs should be packed in order to make use of all available GPUs in the node, and the jobs should have a similar estimated execution time to avoid load-balancing issues. (Obviously, if a single job can make use of all the GPUs, that is also desirable and would not need packing.) We do not recommend packing jobs across multiple nodes with the same job script due to possible load-balancing issues: all resources will be held and unavailable to other users/jobs until the last substep (job) in the pack finishes.
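After a packed job has finished, the elapsed time of each step can be compared to spot the load-balancing issues mentioned above. A minimal sketch using sacct (the job ID is a placeholder):
Code Block |
---|
language | bash |
---|
theme | DJango |
---|
title | Sketch. Checking the elapsed time of packed steps |
---|
| # Each packed srun step appears as <jobid>.<stepid> in the accounting records
sacct -j 123456 --format=JobID,JobName,Elapsed,State   #replace 123456 with the ID of the packed job |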
...
Column |
---|
|
Code Block |
---|
language | bash |
---|
theme | Emacs |
---|
title | Listing 22. GPU job packing example using multiple steps simultaneously |
---|
| #!/bin/bash --login
#SBATCH --account=[your-account]-gpu
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --ntasks-per-socket=2 #maximum 2 tasks per socket (each socket has 2 GPUs in this partition)
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:4
#SBATCH --time=00:10:00
#The default loaded compiler module is gcc
module load cuda
for tagID in $(seq 0 3); do
#Go to the right directory for this step of the job pack using tagID as the identifier:
#We are assuming all the input files needed for each specific job reside in the corresponding working directory
cd ${SLURM_SUBMIT_DIR}/workingDir_${tagID}
#Defining an output file for this step
outputFile=results_${tagID}.out
echo "Starting" > $outputFile
#Run the cuda executable (assuming the same executable will be used by each step, and that it resides in the submission directory):
srun -u -N 1 -n 1 --mem=56G --gres=gpu:1 --exact ${SLURM_SUBMIT_DIR}/main_cuda >> $outputFile &
done
wait |
Note |
---|
| - In the header a total of four GPUs is requested. For each job step the specific number of GPUs to be used (1 in this case) is indicated. The option --mem=56G indicates the amount of memory to be allocated to each step, and the --exact option gives each step access to only the resources it requested.
- Note the use of "&" at the end of each srun command, which runs each step in the background, together with the final "wait", which makes the script wait for all steps to finish before ending the job script.
- In the loop, the iterator (numeric identifier) for each step is defined to start at 0 in order to match the natural numbering of Slurm, but you can use any start and end values consistent with your own naming of directories, input files and output files.
|
|
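The background-and-wait logic described in the note above can be illustrated with plain bash, outside of Slurm. A minimal sketch:
Code Block |
---|
language | bash |
---|
theme | Emacs |
---|
title | Sketch. Background execution with & and wait |
---|
| # Each "&" sends a command to the background and returns control to the script immediately
sleep 2 &
sleep 5 &
sleep 3 &
# "wait" blocks until all background processes have finished,
# so the job script does not end while steps are still running
wait
echo "All background steps have finished" |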
...
Column |
---|
|
Code Block |
---|
language | bash |
---|
theme | Emacs |
---|
title | Listing 23. GPU job packing with --gpu-bind |
---|
| #!/bin/bash --login
#SBATCH --account=[your-account]
#SBATCH --partition=nvlinkq
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --ntasks-per-socket=2 #maximum 2 tasks per socket (each socket has 2 GPUs in this partition)
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:4
#SBATCH --gpu-bind=map_gpu:0,1,2,3
#SBATCH --time=00:10:00
#SBATCH --export=NONE
#The default loaded compiler module is gcc
module load cuda
#Run the cuda executable from a wrapper:
srun -u -N 1 -n 4 -c 1 wrapper.sh |
|
And the wrapper.sh in this case is:
...
Related pages
...