...
Each GPU node have 4 MI250X GPU cards, which in turn have 2 Graphical Compute Die (GCD), which are seen as 2 logical GPUs; so each GPU node has 8 GCDs that is equivalent to 8 slurm GPUs. On the other hand, the single AMD CPU chip has 64 cores organised in 8 groups that share the same L3 cache. Each of these L3 cache groups (or chiplets) have a direct Infinity Fabric connection with just one of the GCDs, providing optimal bandwidth. Each chiplet can communicate with other GCDs, albeit at a lower bandwidth due to the additional communication hops. (In the examples below, we use the numbering of the cores and bus IDs of the GCD to identify the allocated chiplets and GCDs, and their binding.)
...
title | Important: GCD vs GPU |
---|
...
In order to achieve best performance, the current allocation method uses a basic allocation unit called "allocation pack". Users should then only request for a number of "allocation packs". Each allocation pack consists of:
...