


Code Block
languagebash
themeEmacs
titleListing 22. GPU job packing example using multiple steps simultaneously
#!/bin/bash --login


#SBATCH --account=[your-account]
#SBATCH --partition=nvlinkq
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --ntasks-per-socket=2
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:4
#SBATCH --time=00:10:00
 
#The default loaded compiler module is gcc
 
module load cuda

for tagID in $(seq 0 3); do
   #Go to the right directory for this step of the job pack using tagID as the identifier:
   #We are assuming all the input files needed for each specific job reside in the corresponding working directory
   cd ${SLURM_SUBMIT_DIR}/workingDir_${tagID}

   #Defining an output file for this step
   outputFile=results_${tagID}.out
   echo "Starting" > $outputFile

   #Run the CUDA executable (assuming the same executable will be used by each step, and that it resides in the submission directory):
   srun -u -N 1 -n 1 --mem=56G --gres=gpu:1 --exact ${SLURM_SUBMIT_DIR}/main_cuda >> $outputFile &
done
wait


Note
iconfalse
titleNotes
  • In the header a total of four GPUs is requested. For each job step, the specific number of GPUs to be used (1 in this case) is indicated. The --mem=56G option indicates the amount of memory to be allocated to each step, and the --exact option avoids possible sharing of the resources requested for that specific step.
  • Note the use of "&" at the end of each srun command together with the final "wait": each step is executed in the background, and the script then waits for all of them to finish before the job ends.
  • In the loop, the iterator (numeric identifier) for each step starts at 0 so that it matches Slurm's own numbering of job steps, but you can use any start and end values that are consistent with your own naming of directories, input files and output files (a sketch for inspecting the resulting steps follows this list).
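
As a quick check once the job has been submitted, Slurm's accounting tools can be used to confirm that each iteration of the loop produced its own job step with the expected resources. The sketch below assumes the script above was saved as packed_gpu_job.slurm (the filename is an assumption); sacct should then report the srun steps as <jobID>.0 to <jobID>.3 (in addition to the batch step entry), and the AllocTRES column should show gres/gpu=1 for each of them.


Code Block
languagebash
themeEmacs
titleSketch: submitting the packed job and inspecting its steps
#Submit the job script (the filename packed_gpu_job.slurm is an assumption):
sbatch packed_gpu_job.slurm

#Once the job has started (or after it finishes), list its steps and the
#resources allocated to each one. Replace <jobID> with the ID reported by sbatch:
sacct -j <jobID> --format=JobID,JobName,AllocTRES%45,Elapsed,State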

