For "manual" binding, two auxiliary techniques need to be performed: 1) use of a wrapper that selects the correct GPU and 2) generate an ordered list to be used in the --cpu-bind option of srun : 900px
Listing N. exampleScript_1NodeShared_3GPUs_bindMethod2.sh
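The following is a minimal sketch of such a script. The Slurm directives, the use of ROCR_VISIBLE_DEVICES for AMD GPUs, and the map_cpu core list are illustrative assumptions; the real ordered list must match the chiplet-to-GPU wiring of the node and the GPUs actually granted to the job:

```bash
#!/bin/bash --login
#SBATCH --job-name=3GPUsSharedBindMethod2   # illustrative directives
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --ntasks=3
#SBATCH --gres=gpu:3
#SBATCH --time=00:05:00

# 1) Create the GPU-selection wrapper with a redirection "trick" to cat.
#    SLURM_JOBID in the name makes the wrapper unique to this job.
#    Dollar signs that must survive until runtime are escaped.
cat << EOF > select_gpu_${SLURM_JOBID}.sh
#!/bin/bash
export ROCR_VISIBLE_DEVICES=\$SLURM_LOCALID   # each rank sees one GPU
exec "\$@"
EOF
chmod +x select_gpu_${SLURM_JOBID}.sh

# 2) Ordered list of CPU cores, one per MPI task, chosen so that each core
#    sits on the chiplet directly wired to the GPU its task will use.
#    The values below are an assumption consistent with the output shown
#    further down (cores 19, 2 and 9).
CPU_BIND="map_cpu:19,2,9"

export OMP_NUM_THREADS=1
srun -N 1 -n 3 -c 1 --cpu-bind=${CPU_BIND} \
     ./select_gpu_${SLURM_JOBID}.sh ./hello_jobstep

# Delete the wrapper once execution has finalised.
rm -f select_gpu_${SLURM_JOBID}.sh
```

Note that map_cpu binds the i-th task to the i-th core in the list, which is what makes the explicit ordering effective.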
Note that the wrapper for selecting the GPUs is created with a redirection "trick" to the cat command. Also note that its name uses the SLURM_JOBID environment variable to make the wrapper unique to this job, and that the wrapper is deleted when execution is finalised. Now, let's take a look at the output after executing the script:
Terminal N. Output for a 3-GPU job with shared access. "Manual" method (method 2) for optimal binding.
The output of the hello_jobstep code tells us that the job ran on node nid001004 and that 3 MPI tasks were spawned. Each MPI task has only 1 CPU core assigned to it (via the OMP_NUM_THREADS environment variable in the script) and can be identified by its HWT number. Also, each MPI task has only 1 visible GPU. The hardware identification of the GPU is done via the Bus_ID (as the other GPU IDs are not physical but relative to the job). After checking the architecture diagram at the top of this page, it can be clearly seen that each CPU core assigned to the job is on a different L3 cache group chiplet (slurm-socket). More importantly, the GPU assigned to each MPI task is the GPU that is directly connected to that chiplet, so the binding is optimal:
- CPU core "
019 " is on chiplet:2 and directly connected to GPU with Bus_ID:C9 - CPU core "
002 " is on chiplet:0 and directly connected to GPU with Bus_ID:D1 - CPU core "
009 " is on chiplet:1 and directly connected to GPU with Bus_ID:D6
According to the architecture diagram, this binding configuration is optimal.
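As a sanity check, the chiplet-to-GPU wiring used to build the ordered core list can also be inspected on the node itself. A couple of illustrative commands (assuming the ROCm and hwloc tools are available on the node) are:

```bash
rocm-smi --showbus   # Bus_ID of each GPU visible to the job
lstopo --no-io       # which CPU cores belong to each L3 cache chiplet
```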