...
Column |
---|
|
Code Block |
---|
language | bash |
---|
theme | Emacs |
---|
title | Listing 2. Example of a job array with many files |
---|
| #!/bin/bash --login
#This example uses general SBATCH settings, but please refer to the specific guide
#of the intended cluster for any changes that may be needed
#
# SLURM directives
#
# This is an array job with 35 subtasks, (--array=0-34).
#
# The output for each subtask will be sent to a separate file
# identified by the jobid (--output=array-%j.out)
#
# Each subtask will occupy one node (--nodes=1) with
# a wall-clock time limit of twenty minutes (--time=00:20:00)
#
# Replace [your-project] with the appropriate project name
# following --account (e.g., --account=project123)
#SBATCH --account=[your-project]
#SBATCH --output=array-%j.out
#SBATCH --array=0-34 #this should match the number of input files
#SBATCH --nodes=1
#SBATCH --ntasks=1 #One main task per file to be processed in each array-subtask
#SBATCH --cpus-per-task=1 #this will vary depending on the requirements of the task
#SBATCH --mem=1840M #Needed memory per array-subtask (or use --exclusive for exclusive access)
#SBATCH --time=00:20:00
#---
echo "All jobs in this array have:"
echo "- SLURM_ARRAY_JOB_ID=${SLURM_ARRAY_JOB_ID}"
echo "- SLURM_ARRAY_TASK_COUNT=${SLURM_ARRAY_TASK_COUNT}"
echo "- SLURM_ARRAY_TASK_MIN=${SLURM_ARRAY_TASK_MIN}"
echo "- SLURM_ARRAY_TASK_MAX=${SLURM_ARRAY_TASK_MAX}"
echo "This job in the array has:"
echo "- SLURM_JOB_ID=${SLURM_JOB_ID}"
echo "- SLURM_ARRAY_TASK_ID=${SLURM_ARRAY_TASK_ID}"
#---
#Specific settings for the cluster you are on
#(Check the specific guide of the cluster for additional settings)
#---
# grab our filename from a directory listing
FILES=($(ls -1 *.input.txt)) #this pulls in all the files ending in .input.txt
FILENAME=${FILES[$SLURM_ARRAY_TASK_ID]} #this selects the input file that corresponds to this array subtask
echo "My input file is ${FILENAME}" #this will print the file name into the log file
#---
#example job using the above variables
srun -N 1 -n 1 -c 1 ExpansionHunterDenovo-v0.8.7-linux_x86_64/scripts/casecontrol.py locus \
--manifest ${FILENAME} \
--output ${FILENAME}.CC_locus.tsv |
|
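If the input file names are known in advance, an alternative to parsing the output of ls is to read them from a plain-text list, one name per line. The following is a minimal sketch of that approach; the list file name files.list is an assumption and not part of Listing 2:
Code Block |
---|
language | bash |
---|
theme | Emacs |
---|
title | Sketch. Selecting the input file from a pre-generated list |
---|
| # Create the list of input files once, before submitting the array job:
#   ls -1 *.input.txt > files.list
# Inside the job script, pick the line that corresponds to this array subtask
# (sed counts lines from 1, while SLURM_ARRAY_TASK_ID starts at 0 in this example):
FILENAME=$(sed -n "$((SLURM_ARRAY_TASK_ID + 1))p" files.list)
echo "My input file is ${FILENAME}" |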
...
Column |
---|
|
Code Block |
---|
language | bash |
---|
theme | DJango |
---|
title | Terminal 4. View jobs in queue for user |
---|
| $ squeue -u espinosamatilda
JOBID    USER             ACCOUNT  NAME          EXEC_HOST  ST  REASON      START_TIME  END_TIME  TIME_LEFT  NODES  PRIORITY
3483798  espinosamatilda  pawsey   iterativeJob  nid00017   R   None        14:44:27    14:47:27  2:50       1      5269
3483799  espinosamatilda  pawsey   iterativeJob  n/a        PD  Dependency  N/A         N/A       5:00       1      5269 |
|
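The second job shown above is pending with reason Dependency, which is typically achieved by passing the job ID of the first submission to the next one. The following is a minimal sketch of such a chained submission; the script name iterativeJob.slurm is a placeholder:
Code Block |
---|
language | bash |
---|
theme | DJango |
---|
title | Sketch. Chaining jobs with a dependency |
---|
| # Submit the first job and capture its job ID (--parsable prints only the ID)
jobid=$(sbatch --parsable iterativeJob.slurm)
# Submit the follow-up job so that it starts only after the first one completes successfully
sbatch --dependency=afterok:${jobid} iterativeJob.slurm |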
...
Note that X11 forwarding is enabled by default in the interactive queue.
Nevertheless, we recommend that users use FastX, Pawsey's web-based remote-visualisation service, to launch compute-intensive visualisation packages such as ParaView, VisIt or VMD. Please refer to the Setonix Remote Visualisation documentation for detailed instructions on access and use.
Packing serial/small multithreaded jobs
...
When to use job arrays: For nodes that can be shared (like the gpuq partition in Topaz), the best practice is to use job arrays. A disadvantage of job packing on shared nodes is that unbalanced steps might lead to resources being held unnecessarily. When using arrays this problem does not exist because, as soon as any job finishes or fails, the resources for that job are freed for use by another user.
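If a large number of array subtasks should not all run at once on a shared partition, Slurm allows the number of simultaneously running subtasks to be limited with the % separator in the --array directive. A minimal sketch (the limit of 8 is arbitrary):
Code Block |
---|
language | bash |
---|
theme | Emacs |
---|
title | Sketch. Limiting the number of concurrent array subtasks |
---|
| # Run subtasks 0-34, but allow at most 8 of them to run at the same time
#SBATCH --array=0-34%8 |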
...
Column |
---|
|
Code Block |
---|
language | bash |
---|
theme | Emacs |
---|
title | Listing 21. GPU job array example |
---|
| #!/bin/bash --login
#SBATCH --account=[your-account]-gpu
#SBATCH --array=0-7
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-gpu=1
#SBATCH --mem=80G
#SBATCH --gres=gpu:1
#SBATCH --time=00:10:00
#The default loaded compiler module is gcc
module load cuda
#Go to the right directory for this instance of the job array using SLURM_ARRAY_TASK_ID as the identifier:
#We are assuming all the input files needed for each specific job reside in the corresponding working directory
cd workingDir_${SLURM_ARRAY_TASK_ID}
#Run the hip executable (assuming the same executable will be used by each job, and that it resides in the submission directory):
srun -u -N 1 -n 1 -c 1 ${SLURM_SUBMIT_DIR}/main_hip |
|
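The script in Listing 21 changes into workingDir_${SLURM_ARRAY_TASK_ID}, so those directories (and the input files inside them) must exist before submission. A minimal sketch of this preparation step, run from the submission directory (the staging command is only a placeholder):
Code Block |
---|
language | bash |
---|
theme | Emacs |
---|
title | Sketch. Preparing the working directories before submission |
---|
| # Create one working directory per array subtask (0-7 matches --array=0-7)
for i in $(seq 0 7); do
   mkdir -p workingDir_${i}
   # cp inputs_for_task_${i}/* workingDir_${i}/   #stage the input files needed by subtask ${i}
done |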
When to use job packing: For nodes where resources are exclusive and cannot be shared among different users/jobs at the same time (like the nvlinkq partition in Topaz), the best practice is to use job packing when your workflow requires the execution of several jobs but each individual job does not require the whole resources of a node. In that case, you might consider using job packing to execute your several jobs on the same node at the same time. Job packing allows you to use all (or most) of the resources of the node. Ideally, multiple jobs should be packed in order to make use of all available GPUs in the node, and the jobs should have a similar estimated execution time to avoid load-balancing issues. (Obviously, if a single job can make use of all the GPUs, that is also desirable and would not need packing.) We do not recommend packing jobs across multiple nodes with the same job script due to possible load-balancing issues: all resources will be held and unavailable to other users/jobs until the last substep (job) in the pack finishes.
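After a packed job has finished, the elapsed time of each step can be compared to spot the load-balancing issues mentioned above. A minimal sketch using sacct (the job ID is a placeholder):
Code Block |
---|
language | bash |
---|
theme | DJango |
---|
title | Sketch. Checking the elapsed time of packed steps |
---|
| # Each packed srun step appears as <jobid>.<stepid> in the accounting records
sacct -j 123456 --format=JobID,JobName,Elapsed,State   #replace 123456 with the ID of the packed job |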
...
Column |
---|
|
Code Block |
---|
language | bash |
---|
theme | Emacs |
---|
title | Listing 22. GPU job packing example using multiple steps simultaneously |
---|
| #!/bin/bash --login
#SBATCH --account=[your-account]-gpu
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --ntasks-per-socket=2 #maximum 2 tasks per socket (each socket has 2 GPUs in this partition)
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:4
#SBATCH --time=00:10:00
#The default loaded compiler module is gcc
module load cuda
for tagID in $(seq 0 3); do
#Go to the right directory for this step of the job pack using tagID as the identifier:
#We are assuming all the input files needed for each specific job reside in the corresponding working directory
cd ${SLURM_SUBMIT_DIR}/workingDir_${tagID}
#Defining an output file for this step
outputFile=results_${tagID}.out
echo "Starting" > $outputFile
#Run the cuda executable (assuming the same executable will be used by each step, and that it resides in the submission directory):
srun -u -N 1 -n 1 --mem=56G --gres=gpu:1 --exact ${SLURM_SUBMIT_DIR}/main_cuda >> $outputFile &
done
wait |
Note |
---|
| - In the header a total of four GPUs is requested. For each job step the specific number of GPUs to be used (1 in this case) is indicated. The option --mem=56G indicates the amount of memory to be allocated to each step, and the --exact option gives each step access to only the resources it requested.
- Note the use of "&" at the end of each srun command, which runs each step in the background, together with the final "wait", which makes the script wait for all steps to finish before ending the job script.
- In the loop, the iterator (numeric identifier) for each step is defined to start at 0 in order to match the natural numbering of Slurm, but you can use any start and end values consistent with your own naming of directories, input files and output files.
|
|
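The background-and-wait logic described in the note above can be illustrated with plain bash, outside of Slurm. A minimal sketch:
Code Block |
---|
language | bash |
---|
theme | Emacs |
---|
title | Sketch. Background execution with & and wait |
---|
| # Each "&" sends a command to the background and returns control to the script immediately
sleep 2 &
sleep 5 &
sleep 3 &
# "wait" blocks until all background processes have finished,
# so the job script does not end while steps are still running
wait
echo "All background steps have finished" |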
...
Column |
---|
|
Code Block |
---|
language | bash |
---|
theme | Emacs |
---|
title | Listing 23. GPU job packing with --gpu-bind |
---|
| #!/bin/bash --login
#SBATCH --account=[your-account]
#SBATCH --partition=nvlinkq
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --ntasks-per-socket=2 #maximum 2 tasks per socket (each socket has 2 GPUs in this partition)
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:4
#SBATCH --gpu-bind=map_gpu:0,1,2,3
#SBATCH --time=00:10:00
#SBATCH --export=NONE
#The default loaded compiler module is gcc
module load cuda
#Run the cuda executable from a wrapper:
srun -u -N 1 -n 4 -c 1 wrapper.sh |
|
And the wrapper.sh in this case is:
...
Related pages
...