MPICH Jobs report “Inconsistent number of NICs” and abort

Problem:

I see messages like this when running jobs on Setonix:

0: MPICH ERROR: All nodes in this job do not contain the same number of NICs. Aborting job as performance may be affected.
0: Set MPICH_OFI_SKIP_NIC_SYMMETRY_TEST=1 to bypass this check.
0: MPICH ERROR [Rank 0] [job id 12345678.0] [Ddd Mmm 00 00:00:00 2024] [nid001234] - Abort(-1) (rank 0 in comm 0): Inconsistent number of NICs across the job (Other MPI error)
0:
0: aborting job:
0: Inconsistent number of NICs across the job

What should I do?

Diagnosis:

A NIC here is a “Network Interface Card”: one of the physical connection ports on a node into which the network cabling infrastructure is plugged.

You are seeing these messages because a small number of the Setonix CPU blades reside in cabinets that mostly contain GPU blades.

The GPU blades have more NICs than the CPU blades, and our vendor chose to cable the CPU blades in those GPU cabinets with the same number of NICs as the GPU blades.

You may, therefore, sometimes get an allocation of CPU nodes that contains nodes from both CPU-only and GPU/CPU cabinets.

Solution:

Although the error message offers a way to bypass the check, you probably don't want to do that unless you know what your MPI traffic is doing. Our general advice when running non-GPU MPI codes is therefore to explicitly exclude the CPU nodes that sit in GPU cabinets, so that your MPI jobs never run on them.

You can achieve this by using this argument

--exclude=nid00[2024-2055],nid00[2792-2823]

to your sbatch command, or by placing the equivalent directive,

#SBATCH --exclude=nid00[2024-2055],nid00[2792-2823]

into your job submission script.
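For example, a minimal CPU-only job script using this directive might look like the sketch below; the account name, partition, resource requests, and executable name are placeholders for illustration, so adjust them to your own setup:

#!/bin/bash -l
#SBATCH --account=myproject                             # placeholder: your project/account
#SBATCH --partition=work                                # placeholder: the CPU partition you normally use
#SBATCH --nodes=4                                       # placeholder: number of nodes
#SBATCH --ntasks-per-node=128                           # placeholder: MPI ranks per node
#SBATCH --time=01:00:00                                 # placeholder: walltime
#SBATCH --exclude=nid00[2024-2055],nid00[2792-2823]     # avoid the CPU nodes that sit in GPU cabinets

srun ./my_mpi_program                                   # placeholder: your MPI executable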

Technical Notes:

If you think that your CPU-only MPI job can make beneficial use of multiple NICs per node, you can instead request that your job be allocated only nodes that have multiple NICs, by using a --nodelist=<nids> argument or directive.
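As a sketch, and assuming the node ranges excluded above are the CPU nodes carrying multiple NICs, such a request could look like this; the particular node IDs and node count are illustrative only:

#SBATCH --nodes=2                        # request exactly as many nodes as are named below
#SBATCH --nodelist=nid00[2024-2025]      # example only: two CPU nodes from a GPU cabinet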

Note that explicitly specifying the nodes you want may leave your job in the queue for longer, as it waits for those specific nodes to become available.
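Alternatively, if you have established that your code's communication pattern is not sensitive to the NIC asymmetry, the error message itself names the environment variable that bypasses the check. A minimal sketch of setting it in a job submission script (the executable name is a placeholder):

export MPICH_OFI_SKIP_NIC_SYMMETRY_TEST=1     # bypass the NIC symmetry check, as suggested by the error message
srun ./my_mpi_program                         # placeholder: your MPI executable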

We hope that our vendor will eventually ensure that all of our CPU nodes have the same hardware configuration, after which this issue will no longer affect CPU-only MPI jobs.