...
There are two likely causes. Firstly, the job may have reached the maximum time it was allowed to run. Secondly, if you have a fixed allocation then your allocation may have been used up.
Slurm allocates resources to a job for a fixed amount of time. This time limit is either specified in the job request or, if none is specified, is the default limit. All Slurm partitions have a maximum time limit, so if you have not requested the maximum, try increasing the time limit in the request with the --time= flag to sbatch or salloc.

    #SBATCH --time=12:00:00
To see the maximum and default time limits, use sinfo:

Terminal 1. View the time limits for a queue

    $ sinfo -o "%.10P %.5a %.10l %.15L %.6D %.6t" -p workq
     PARTITION AVAIL  TIMELIMIT     DEFAULTTIME  NODES  STATE
        workq*    up 1-00:00:00         1:00:00      1  drain
        workq*    up 1-00:00:00         1:00:00     20   resv
        workq*    up 1-00:00:00         1:00:00     34    mix
        workq*    up 1-00:00:00         1:00:00     25  alloc
Usually, if your allocation is not sufficient to support a job running to completion, Slurm will not start the job. However, if multiple jobs start at the same time, no single job may exceed the remaining allocation even though together they do. When this happens they will all start, but will be terminated when the allocation is used up. You can tell this is the case if the job's elapsed time is shorter than its time limit.
    $ sacct -j 2954681 -o jobid,elapsed,time
           JobID    Elapsed  Timelimit
    ------------ ---------- ----------
    2954681        05:54:30 1-00:00:00
    2954681.bat+   05:54:31
    2954681.ext+   05:54:31
    2954681.0      05:54:30
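The comparison between elapsed time and time limit can also be scripted. The following is a minimal sketch, not a Pawsey-provided tool: the function names slurm_time_to_seconds and ended_before_limit are hypothetical, and it assumes the time strings are in the D-HH:MM:SS, HH:MM:SS or MM:SS forms shown in the sacct output. Other values, such as UNLIMITED, are rejected.

```shell
#!/bin/bash
# Hypothetical helper: convert a Slurm time string to seconds.
# Assumes D-HH:MM:SS, HH:MM:SS or MM:SS; anything else is an error.
slurm_time_to_seconds() {
    local t="$1" days=0 h=0 m=0 s=0
    local -a parts
    # Split off an optional leading "D-" day count.
    if [[ "$t" == *-* ]]; then
        days="${t%%-*}"
        t="${t#*-}"
    fi
    # The remaining fields are HH:MM:SS or MM:SS.
    IFS=: read -r -a parts <<< "$t"
    case "${#parts[@]}" in
        3) h="${parts[0]}"; m="${parts[1]}"; s="${parts[2]}" ;;
        2) m="${parts[0]}"; s="${parts[1]}" ;;
        *) echo "unsupported time format: $1" >&2; return 1 ;;
    esac
    # 10# forces base 10 so fields such as "08" are not read as octal.
    echo $(( 10#$days * 86400 + 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Hypothetical helper: succeed if the job stopped before its time limit.
ended_before_limit() {
    local elapsed limit
    elapsed=$(slurm_time_to_seconds "$1") || return 2
    limit=$(slurm_time_to_seconds "$2") || return 2
    (( elapsed < limit ))
}

# Values copied by hand from the sacct output for job 2954681.
if ended_before_limit "05:54:30" "1-00:00:00"; then
    echo "Job stopped before its time limit; check the allocation balance."
fi
```

A job that stopped well short of its limit was not necessarily ended by an exhausted allocation (it may simply have finished or failed), so treat the result as a prompt to check the balance rather than as a diagnosis.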
If this is the case, check whether your allocation is used up. If it is, contact the Pawsey help desk. See the Project Accounting section of Submitting and Monitoring Jobs for more information about project accounting.
    $ pawseyAccountBalance
...