Exceeded Job Memory Limit
Problem
A job on the Pawsey supercomputers fails with "slurmstepd: error: Exceeded job memory limit at some point."
Solution
This error indicates that the job has exhausted the memory available to it on a core or node. It can occur shortly after the job starts, or much later in execution, depending on how the application's memory demand grows over time.
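To confirm that memory was the cause, you can compare the memory the job requested with the peak memory it actually used, via Slurm's accounting tools. The sketch below assumes job accounting is enabled on the system and uses a placeholder job ID (123456):

# Show requested memory (ReqMem) versus peak resident memory (MaxRSS) per job step
sacct -j 123456 --format=JobID,JobName,State,ReqMem,MaxRSS,Elapsed

If MaxRSS for any step is close to or above the requested memory, the job was killed for exceeding its memory allocation.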
There are three ways to address this problem:
- Explicitly request more memory for the job, using the directive
#SBATCH --mem=10G
(10 GB in this example). Note that jobs are charged based on whichever is higher: the fraction of cores used per node or the fraction of memory used per node. Increasing the total requested memory may therefore cause your job to be charged at a higher rate.
- Increase the memory available to each task by reducing the number of tasks your code uses within a node, while still requesting the same number of CPUs for the job's allocation; otherwise another job might be allocated on the node. A sketch of both approaches follows this list.
- Reduce the memory requirement of the application. This may mean reducing the problem size, but it may also mean checking for memory leaks if you are developing your own code.
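As an illustration of the first two options, a minimal batch script sketch is shown below. The account, partition, core count, memory amount, and executable name are placeholders and should be adapted to your own job and to the Pawsey system you are using:

#!/bin/bash -l
#SBATCH --account=myproject      # placeholder project code
#SBATCH --partition=work         # placeholder partition name
#SBATCH --nodes=1
#SBATCH --ntasks=64              # reserve all cores of the node for this job...
#SBATCH --mem=120G               # ...and explicitly request more memory (option 1)
#SBATCH --time=01:00:00

# Option 2: launch fewer tasks than were allocated, so each task has more
# memory available to it, while the whole node stays reserved for this job.
srun -n 32 ./my_application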
Related content
- More Processors Requested Than Permitted
- Slurm Job Cancelled Due to Time Limit
- slurmstepd: error: execve(): executable : No Such File or Directory
- File Count Quota Exceeded on /scratch
- Segmentation Fault Occurred