
Method 1: Optimal binding using srun parameters

For optimal binding using srun parameters, the options "--gpus-per-task" and "--gpu-bind=closest" need to be used:



Listing N. exampleScript_2NodesExclusive_16GPUs_bindMethod1.sh
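The listing body is not reproduced here, so the following is only a minimal sketch of how the two options might be combined for the 2-node, 16-GPU exclusive case. The resource request lines, the value of 8 cores per task and the executable name ./my_gpu_app are assumptions for illustration and are not taken from the original script.

```shell
#!/bin/bash
#SBATCH --nodes=2            # two nodes
#SBATCH --exclusive          # exclusive access to both nodes
#SBATCH --gpus-per-node=8    # 8 GPUs per node, 16 in total

# One task per GPU. "--gpus-per-task=1" gives each task a single GPU and
# "--gpu-bind=closest" asks Slurm to pick the GPU closest to the task's cores.
# Assumption: 8 CPU cores per task (-c 8); adjust to the node architecture.
srun_opts="-N 2 -n 16 -c 8 --gpus-per-node=8 --gpus-per-task=1 --gpu-bind=closest"

if [ -n "${SLURM_JOB_ID:-}" ]; then
    # Inside a Slurm allocation: launch the job step.
    # Word splitting of $srun_opts is intentional here.
    srun $srun_opts ./my_gpu_app   # ./my_gpu_app is a hypothetical executable
else
    # Outside Slurm: only print the command that would be executed.
    echo "srun $srun_opts ./my_gpu_app"
fi
```

Run outside a Slurm allocation, the sketch only prints the srun command, which makes it easy to check the options before submitting with sbatch.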


Now, let's take a look at the output after executing the script:



Terminal N. Output for 16 GPUs job (2 nodes) exclusive access


According to the architecture diagram, this binding configuration is optimal.

Method 1 may fail for some applications.

This first method is simpler, but may not work for all codes. "Manual" binding (method 2) may be the only reliable method for codes that rely on OpenMP or OpenACC pragmas to move data between host and GPU and that attempt to use GPU-to-GPU enabled MPI communication.

Click the tab above to read the script and output for the other method of GPU binding.





Method 2: "Manual" optimal binding of GPUs and chiplets

For "manual" binding, two auxiliary techniques are needed: 1) a wrapper that selects the correct GPU for each task, and 2) an ordered list to be passed to the --cpu-bind option of srun:



Listing N. exampleScript_2NodesExclusive_16GPUs_bindMethod2.sh


Note that the wrapper for selecting the GPUs is created with a redirection "trick" applied to the cat command. Also note that its name uses the SLURM_JOBID environment variable to make the wrapper unique to this job, and that the wrapper is deleted once execution has finished.
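As a rough illustration of the wrapper technique described above, the sketch below creates the wrapper with a here-document redirected through cat, names it after SLURM_JOBID, and deletes it at the end. Several details are assumptions rather than content of the original listing: that the GPUs are managed by ROCm (hence ROCR_VISIBLE_DEVICES), that the node-local rank in SLURM_LOCALID is used as the GPU index, the placeholder core list in cpu_bind, the value -c 8 and the executable name ./my_gpu_app. The real core order has to be taken from the architecture diagram.

```shell
#!/bin/bash
# Fall back to a placeholder job ID so the sketch also runs outside Slurm.
job_id="${SLURM_JOBID:-demo}"
wrapper="./select_gpu_${job_id}"

# Create the wrapper by redirecting a here-document through cat.
# Quoting 'EOF' keeps $SLURM_LOCALID and "$@" unexpanded until the wrapper runs.
cat << 'EOF' > "$wrapper"
#!/bin/bash
# Assumption: ROCm runtime. Expose only the GPU whose index equals the
# node-local rank of this task, then run the real command.
export ROCR_VISIBLE_DEVICES="$SLURM_LOCALID"
exec "$@"
EOF
chmod +x "$wrapper"

# Hypothetical ordered list: entry i is the core for node-local rank i.
# Replace it with the order that matches the architecture diagram.
cpu_bind="map_cpu:0,8,16,24,32,40,48,56"

# Self-check: emulate node-local rank 3 and read back what the wrapper exports.
wrapper_demo=$(SLURM_LOCALID=3 "$wrapper" printenv ROCR_VISIBLE_DEVICES)
echo "local rank 3 would see GPU ${wrapper_demo}"

if [ -n "${SLURM_JOBID:-}" ]; then
    # Inside a Slurm allocation: every task starts through the wrapper.
    srun -N 2 -n 16 -c 8 --gpus-per-node=8 --cpu-bind="$cpu_bind" \
        "$wrapper" ./my_gpu_app    # ./my_gpu_app is a hypothetical executable
else
    echo "srun -N 2 -n 16 -c 8 --gpus-per-node=8 --cpu-bind=$cpu_bind $wrapper ./my_gpu_app"
fi

# Remove the wrapper once execution has finished.
rm -f "$wrapper"
```

Because the wrapper name includes the job ID, two jobs launched from the same directory would not overwrite each other's wrapper.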

Now, let's take a look at the output after executing the script:



Terminal N. Output for 16 GPUs job (2 nodes) exclusive access


According to the architecture diagram, this binding configuration is optimal.

Click the tab above to read the script and output for the other method of GPU binding.

