...

Listing 2. Example of a job array with many files
#!/bin/bash --login
#This example uses general SBATCH settings, but please refer to the specific guide
#of the intended cluster for any changes that may be needed

#
# SLURM directives
#
# This is an array job with 35 subtasks, (--array=0-34).
#
# The output for each subtask will be sent to a separate file
# identified by the jobid (--output=array-%j.out)
# 
# Each subtask will occupy one node (--nodes=1) with
# a wall-clock time limit of twenty minutes (--time=00:20:00)
#
# Replace [your-project] with the appropriate project name
# following --account (e.g., --account=project123)


#SBATCH --account=[your-project]
#SBATCH --output=array-%j.out
#SBATCH --array=0-34       #this should match the number of input files
#SBATCH --nodes=1
#SBATCH --ntasks=1          #One main task per file to be processed in each array-subtask
#SBATCH --cpus-per-task=1 	#this will vary depending on the requirements of the task
#SBATCH --mem=1840M         #Needed memory per array-subtask (or use --exclusive for exclusive access)
#SBATCH --time=00:20:00

#---  
echo "All jobs in this array have:"
echo "- SLURM_ARRAY_JOB_ID=${SLURM_ARRAY_JOB_ID}"
echo "- SLURM_ARRAY_TASK_COUNT=${SLURM_ARRAY_TASK_COUNT}"
echo "- SLURM_ARRAY_TASK_MIN=${SLURM_ARRAY_TASK_MIN}"
echo "- SLURM_ARRAY_TASK_MAX=${SLURM_ARRAY_TASK_MAX}"
 
echo "This job in the array has:"
echo "- SLURM_JOB_ID=${SLURM_JOB_ID}"
echo "- SLURM_ARRAY_TASK_ID=${SLURM_ARRAY_TASK_ID}"

#--- 
#Specific settings for the cluster you are on
#(Check the specific guide of the cluster for additional settings)

#---  
# grab our filename from the list of matching files
FILES=(*.input.txt) #this pulls in all the files ending with input.txt (the glob is sorted and copes with spaces in names)
FILENAME=${FILES[$SLURM_ARRAY_TASK_ID]} #each array subtask picks one input file, using its task ID as the index
echo "My input file is ${FILENAME}" #this will print the file name into the log file

#---  
#example job using the above variables
srun -N 1 -n 1 -c 1 ExpansionHunterDenovo-v0.8.7-linux_x86_64/scripts/casecontrol.py locus \
        --manifest "${FILENAME}" \
        --output "${FILENAME}.CC_locus.tsv"
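
The indexing step in Listing 2 yields an empty file name, without any error, whenever the array range is larger than the number of matching files. A minimal sketch of a guard is given below. It assumes a hypothetical helper named pick_input, which is not part of the original script or of Slurm, and is only one possible way to do the check.

```shell
#!/bin/bash
# pick_input TASK_ID FILE...
# Hypothetical helper: print the file that belongs to the given array task ID,
# or fail with a message if the ID is invalid or has no matching file.
pick_input() {
    local task_id=$1
    shift
    local files=("$@")

    # Reject an empty or non-numeric ID (e.g. when run outside an array job).
    if ! [[ $task_id =~ ^(0|[1-9][0-9]*)$ ]]; then
        echo "Invalid array task ID: '${task_id}'" >&2
        return 1
    fi

    # Reject an ID beyond the end of the file list.
    if (( task_id >= ${#files[@]} )); then
        echo "Task ID ${task_id} has no input file (${#files[@]} files given)" >&2
        return 1
    fi

    printf '%s\n' "${files[task_id]}"
}
```

Inside the job script this might be called as FILENAME=$(pick_input "${SLURM_ARRAY_TASK_ID}" *.input.txt) || exit 1, so that a subtask without an input file stops before srun is reached.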


...

Terminal 4. View jobs in queue for user
$ squeue -u matilda
JOBID    USER     ACCOUNT  NAME          EXEC_HOST  ST  REASON      START_TIME  END_TIME  TIME_LEFT  NODES  PRIORITY
3483798  matilda  pawsey   iterativeJob  nid00017   R   None        14:44:27    14:47:27  2:50       1      5269
3483799  matilda  pawsey   iterativeJob  n/a        PD  Dependency  N/A         N/A       5:00       1      5269


...

Note that X11 forwarding is enabled by default in the interactive queue.

Nevertheless, we recommend that users use Pawsey's web-based remote-visualisation service to launch compute-intensive visualisation packages such as ParaView, VisIt or VMD. Please refer to the Setonix Remote Visualisation documentation for detailed instructions on access and use.

Packing serial/small multithreaded jobs 

...

When to use job arrays: For nodes that can be shared (like the gpuq partition in Topaz), the best practice is to use job arrays. A disadvantage of job packing on shared nodes is that unbalanced steps might lead to resources being held unnecessarily. With arrays this problem does not exist because, as soon as any job finishes or fails, the resources for that job are freed for use by another user.
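
When an array is submitted to a shared partition, its index range has to match the number of inputs. The sketch below derives that range from a file count. The helper name array_range is hypothetical. The optional "%N" suffix is standard Slurm syntax that caps how many subtasks run at the same time, and the cap value is left to the user.

```shell
#!/bin/bash
# array_range COUNT [MAX_RUNNING]
# Hypothetical helper: print a value for --array that covers COUNT inputs
# with zero-based indices, optionally capped at MAX_RUNNING concurrent subtasks.
array_range() {
    local count=$1 limit=${2:-}

    # At least one input is needed to build a range.
    if ! [[ $count =~ ^[1-9][0-9]*$ ]]; then
        echo "Expected a positive number of inputs, got '${count}'" >&2
        return 1
    fi

    local spec="0-$((count - 1))"
    if [ -n "$limit" ]; then
        spec="${spec}%${limit}"
    fi
    printf '%s\n' "$spec"
}
```

For example, with files=(*.input.txt), running sbatch --array="$(array_range "${#files[@]}")" jobscript.sh would size the array to the inputs. Here jobscript.sh is a placeholder name, and an --array value given on the command line takes precedence over the #SBATCH directive in the script.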

...

Listing 21. GPU job array example
#!/bin/bash --login

#SBATCH --account=[your-account]-gpu
#SBATCH --array=0-7
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=80G
#SBATCH --gres=gpu:1
#SBATCH --time=00:10:00

 
#Default loaded compiler module is gcc module

module load cuda

#Go to the right directory for this instance of the job array using SLURM_ARRAY_TASK_ID as the identifier:
#We are assuming all the input files needed for each specific job reside in the corresponding working directory
cd workingDir_${SLURM_ARRAY_TASK_ID}

#Run the hip executable (assuming the same executable will be used by each job, and that it resides in the submission directory):
srun -u -N 1 -n 1 -c 1 ${SLURM_SUBMIT_DIR}/main_hip
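
The listing above assumes that one working directory exists per array index. A small pre-submission check, sketched here with a hypothetical helper named missing_dirs, can list any directory that is absent before the array is queued.

```shell
#!/bin/bash
# missing_dirs PREFIX FIRST LAST
# Hypothetical helper: print every directory PREFIX<i> that does not exist,
# for i running from FIRST to LAST (inclusive).
missing_dirs() {
    local prefix=$1 first=$2 last=$3 i
    for (( i = first; i <= last; i++ )); do
        [ -d "${prefix}${i}" ] || printf '%s\n' "${prefix}${i}"
    done
}
```

Running missing_dirs workingDir_ 0 7 from the submission directory prints nothing when all eight directories used by --array=0-7 are present.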


When to use job packing: For nodes where resources are exclusive and cannot be shared among different users/jobs at the same time (like the nvlinkq partition in Topaz), the best practice is to use job packing when your workflow requires the execution of several jobs but each individual job does not require all the resources of a node. In that case, consider packing your jobs so that they execute on the same node at the same time. Job packing allows you to use all (or most) of the resources of the node. Ideally, multiple jobs should be packed so as to make use of all available GPUs in the node, and the jobs should have a similar estimated execution time to avoid load-balancing issues. (If a single job can make use of all four GPUs, that is also desirable and needs no packing.) We do not recommend packing jobs across multiple nodes with the same job script due to possible load-balancing issues: all resources will be held and unavailable to other users/jobs until the last substep (job) in the packing finishes.

...

Listing 22. GPU job packing example using multiple steps simultaneously
#!/bin/bash --login


#SBATCH --account=[your-account]-gpu
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --ntasks-per-socket=2   #maximum 2 tasks per socket (each socket has 2 GPUs in this partition)
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:4
#SBATCH --time=00:10:00
 
#Default loaded compiler module is gcc module
 
module load cuda

for tagID in $(seq 0 3); do
   #Go to the right directory for this step of the job pack using tagID as the identifier:
   #We are assuming all the input files needed for each specific job reside in the corresponding working directory
   cd ${SLURM_SUBMIT_DIR}/workingDir_${tagID}

   #Defining an output file for this step
   outputFile=results_${tagID}.out
   echo "Starting" > $outputFile

   #Run the cuda executable (assuming the same executable will be used by each step, and that it resides in the submission directory):
   srun -u -N 1 -n 1 --mem=56G --gres=gpu:1 --exact ${SLURM_SUBMIT_DIR}/main_cuda >> $outputFile &
done
wait


Notes
  • In the header, a total of four GPUs is requested. For each job step, the specific number of GPUs to be used (1 in this case) is indicated. The option --mem=56G sets the amount of memory allocated to each step, and --exact restricts each step to only the resources requested for it.
  • Each srun command ends with "&" so that the step runs in the background, and the final "wait" makes the script pause until all steps have finished before the job script ends.
  • In the loop, the iterator (numeric identifier) for each step is defined to start at 0 in order to be equivalent to the natural numbering of Slurm, but you can use any start and end value to be consistent with your own naming of directories, input files and output files.
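
The background-and-wait pattern described in the notes can be tried without Slurm. A bare "wait" returns zero whatever the steps returned, so the sketch below waits on each process ID in turn in order to count failed steps. The names run_packed and fake_step are hypothetical, and fake_step merely stands in for the srun command of Listing 22.

```shell
#!/bin/bash
# run_packed CMD TAG...
# Start CMD once per TAG in the background, then wait for every step.
# The return value is the number of steps that exited with a non-zero status.
run_packed() {
    local cmd=$1
    shift
    local pids=() failed=0 tag pid

    for tag in "$@"; do
        "$cmd" "$tag" &      # launch this step in the background
        pids+=("$!")         # remember its process ID
    done

    for pid in "${pids[@]}"; do
        # "wait PID" returns the exit status of that particular step
        wait "$pid" || failed=$((failed + 1))
    done
    return "$failed"
}

# Hypothetical stand-in for a job step: it fails only for tag 2.
fake_step() {
    [ "$1" -ne 2 ]
}
```

Calling run_packed fake_step 0 1 2 3 starts four steps at once and returns 1, because exactly one of them fails. In a real job script the return value could be used to set the exit code of the job.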


...

...