...
Currently there are issues running MPI-enabled software that makes use of parallel I/O from within a container run by the Singularity container engine: affected jobs hang, and an MPI error message is reported.
HPE Cray has not yet provided a fix for this issue. Pawsey is testing several possible solutions.
...
At this point no full workaround has been identified. The current recommendation for anyone experiencing this hang is to make the distribution of ranks across nodes more compact, if possible. Pawsey staff testing has shown that when fewer nodes are used, more total ranks are needed to trigger the hang, all other variables (such as the amount of data being sent) being equal.
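As an illustration, a more compact distribution keeps the total rank count the same while packing those ranks onto fewer nodes. The Slurm snippet below is a minimal sketch only: the container image, application name, rank counts and module name are placeholders, not a tested configuration.

```bash
#!/bin/bash --login
# Hedged example: pack the same 256 ranks onto 2 nodes instead of spreading
# them over 4, so each node carries more ranks (a more compact distribution).
#SBATCH --partition=work
#SBATCH --nodes=2                # fewer nodes than a spread-out layout
#SBATCH --ntasks-per-node=128    # fill each node to keep the total rank count
#SBATCH --exclusive
#SBATCH --time=01:00:00

# Module name is an assumption; adjust to your site's Singularity module.
module load singularity

# my_image.sif and my_parallel_io_app are placeholders for the affected
# container image and the MPI application performing parallel I/O.
srun -N 2 -n 256 singularity exec my_image.sif ./my_parallel_io_app
```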
ANSYS FLUENT
Multi-node MPI issue
ANSYS Fluent cannot run multi-node jobs on Setonix with cray-mpich as the MPI implementation, because the bundled MPI binaries are not built for the Cray EX system with the Slingshot interconnect.
The issue has been raised with ANSYS and a resolution is pending. Currently, ANSYS Fluent can run on a single node with 128 cores on the work partition:
```bash
fluent 3ddp -g -t${SLURM_NTASKS} -mpi=intel inputfile.jou
```
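For context, the following is a minimal sketch of a batch script that wraps the command above in a single-node job on the work partition; the module name and version, walltime and journal file name are assumptions to adapt to your own setup.

```bash
#!/bin/bash --login
#SBATCH --partition=work      # single node on the work partition
#SBATCH --nodes=1
#SBATCH --ntasks=128          # use all 128 cores of the node
#SBATCH --time=02:00:00       # illustrative walltime

# Module name/version is an assumption; check "module avail ansys" on your system.
module load ansys/2023r1

# Single-node Fluent run with the bundled Intel MPI, as per the command above;
# inputfile.jou is the Fluent journal file driving the simulation.
fluent 3ddp -g -t${SLURM_NTASKS} -mpi=intel inputfile.jou
```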
Building Software
Performance of the Cray environment
...