Setonix uses the Slurm workload manager to schedule user programs for execution. To learn the general use of Slurm for scheduling programs on supercomputers, visit the Job Scheduling page. In addition, the following subsections discuss the peculiarities of running jobs on Setonix; see also the Example Slurm Batch Scripts for Setonix on CPU Compute Nodes and Example Slurm Batch Scripts for Setonix on GPU Compute Nodes pages.
Overview
By default, each compute node of Setonix shares its resources among multiple jobs running at the same time, submitted by many users from the same or different projects. We call this configuration shared access, and it is the default for Setonix nodes. Nevertheless, users can use Slurm options to override the default and explicitly request exclusive access to the requested nodes.
Nodes are grouped in partitions. Each partition is characterised by a particular configuration of its resources and it is intended for a particular workload or stage of the scientific workflow development. Table 1 shows the list of partitions present on Setonix and their available resources per node.
Each job submitted to the scheduler gets assigned a Quality of Service (QoS) level which determines the priority of the job with respect to the others in the queue. Usually, the default normal QoS applies. Users can boost the priority of their jobs up to 10% of their allocations, using the high QoS, in the following way:
$ sbatch --qos=high myscript.sh
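The QoS can also be set inside the batch script itself with an `#SBATCH` directive. A minimal sketch follows; the project code, partition, resource requests and executable name are placeholders to adapt to your own project:

```shell
#!/bin/bash --login
#SBATCH --account=project123     # hypothetical project code
#SBATCH --partition=work         # assumed partition name
#SBATCH --qos=high               # priority boost; counts towards the 10% high-QoS budget
#SBATCH --ntasks=1
#SBATCH --time=01:00:00

srun -N 1 -n 1 ./my_program      # hypothetical executable
```

With the QoS recorded in the script, a plain `sbatch myscript.sh` submission is enough.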
Each project has an allocation for a number of service units (SUs) in a year, which is broken into quarters. Jobs submitted under a project will subtract SUs from the project's allocation. A project that has entirely consumed its SUs for a given quarter of the year will run its jobs in low priority mode for that time period. If a project's SU consumption (for a given quarter) hits the 150% usage mark with respect to its granted allocation, no further jobs will be able to run under the project.
Table 1. Slurm partitions on Setonix
Table 2. Quality of Service levels applicable to a Slurm job running on Setonix
Name | Priority Level | Description
---|---|---
lowest | 0 | Reserved for particular cases.
low | 3000 | Priority for jobs past the 100% allocation usage.
normal | 10000 | The default priority for production jobs.
high | 14000 | Priority boost available to all projects for a fraction (10%) of their allocation.
highest | 20000 | Assigned to jobs that are of critical interest (e.g. a project that is part of the national response to an emergency).
exhausted | 0 | QoS for jobs of projects that have consumed more than 150% of their allocation.
Job Queue Limits
Users can check the limits on the maximum number of jobs that can run at a time (MaxJobs) and the maximum number of jobs that can be queued (MaxSubmitJobs) for each partition on Setonix using the command:
$ sacctmgr show associations user=$USER cluster=setonix
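The full association listing is fairly wide; if only the job limits are of interest, `sacctmgr`'s `format` option can restrict the output to the relevant columns (MaxJobs and MaxSubmitJobs are standard Slurm association fields):

```shell
# Show only the account, partition and job-limit columns of the association records
sacctmgr show associations user=$USER cluster=setonix \
    format=Account,Partition,MaxJobs,MaxSubmitJobs
```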
Additional constraints are imposed on projects that have overused their quarterly allocation.
Executing large jobs
When executing large, multinode jobs on Setonix, the use of the --exclusive option in the batch script is recommended. This option results in better resource utilisation within each node assigned to the job.
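As a sketch, a multinode batch script requesting exclusive node access might look as follows; the project code, partition, node and task counts, and executable name are placeholders:

```shell
#!/bin/bash --login
#SBATCH --account=project123     # hypothetical project code
#SBATCH --partition=work         # assumed partition name
#SBATCH --nodes=4                # multinode job
#SBATCH --exclusive              # request whole nodes; no sharing with other jobs
#SBATCH --time=02:00:00

# With --exclusive, all cores and memory on each allocated node
# are available to this job alone.
srun -N 4 -n 512 ./my_program    # hypothetical executable
```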