Slurm Job Cancelled Due to Time Limit

Problem

A job ends with a slurmstepd error reporting that it was cancelled due to its time limit. The message appears in the Slurm output file and may be preceded by a message from srun saying that the job step was aborted.

slurmstepd: error: *** JOB 3501970 ON nid00161 CANCELLED AT 2018-01-08T15:17:49 DUE TO TIME LIMIT ***

Solution

The job may have reached the maximum time it was allowed to run.

Slurm allocates resources to a job for a fixed amount of time. This time limit is either specified in the job request or, if none is given, set to the partition's default. Every partition also enforces a maximum limit, so if you have not already requested the maximum, try increasing the time limit with the --time= flag to sbatch or salloc:

#SBATCH --time=12:00:00
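
In a batch script, the directive sits alongside the other resource requests. The sketch below is illustrative only: the job name, node count, and application command are assumptions, and the partition name workq is taken from the sinfo example further down; adjust all of these to your own job and your site's limits.

#!/bin/bash
#SBATCH --job-name=my_job      # illustrative name (assumption)
#SBATCH --partition=workq      # partition used in the sinfo example below
#SBATCH --nodes=1
#SBATCH --time=12:00:00        # request 12 hours instead of the partition default

srun ./my_application          # replace with your actual application

The same limit can be requested for an interactive allocation, for example: salloc --partition=workq --nodes=1 --time=12:00:00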

To see the maximum and default time limits, use sinfo:

Terminal 1. View the time limits for a queue
$ sinfo -o "%.10P %.5a %.10l %.15L %.6D %.6t" -p workq
 PARTITION AVAIL  TIMELIMIT     DEFAULTTIME  NODES  STATE
    workq*    up 1-00:00:00         1:00:00      1  drain
    workq*    up 1-00:00:00         1:00:00     20   resv
    workq*    up 1-00:00:00         1:00:00     34    mix
    workq*    up 1-00:00:00         1:00:00     25  alloc
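
To confirm that a completed job really ran up against its limit, compare its elapsed time with its time limit in the accounting records. This assumes job accounting with sacct is enabled on your system; the job ID below is the one from the example error message above.

Terminal 2. Compare a job's elapsed time with its time limit
$ sacct -j 3501970 --format=JobID,JobName,Elapsed,Timelimit,State

A job cancelled for exceeding its limit is reported with State TIMEOUT and an Elapsed value at or just past the Timelimit.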
