Running Jobs on Setonix

Setonix uses the Slurm workload manager to schedule user programs for execution. To learn the generalities of using Slurm to schedule programs on supercomputers, visit the Job Scheduling page. In addition, the following subsections discuss the peculiarities of running jobs on Setonix; see also the Example Slurm Batch Scripts for Setonix on CPU Compute Nodes, Example Slurm Batch Scripts for Setonix on GPU Compute Nodes, and Example Slurm Batch Scripts for Setonix-Q on GH200 Compute Nodes pages.

Important

It is highly recommended that you specify values for the --nodes, --ntasks, --cpus-per-task and --time options that are optimal for the job and for the system on which it will run. Also, use --mem if the job will not use all of the resources of a node (shared access), or --exclusive to be allocated all resources of the requested nodes (exclusive access).
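As an illustrative sketch, a minimal CPU batch script using these options might look as follows (the program name is a placeholder, and project1234 stands in for your project code):

```shell
#!/bin/bash --login
#SBATCH --account=project1234        # replace with your project code
#SBATCH --partition=work
#SBATCH --nodes=1
#SBATCH --ntasks=4                   # number of parallel tasks
#SBATCH --cpus-per-task=1
#SBATCH --mem=16G                    # shared access: request only the memory the job needs
#SBATCH --time=01:00:00              # request only as much walltime as the job needs

# Launch the (hypothetical) program with srun
srun -N 1 -n 4 -c 1 ./my_program
```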

Overview

By default, each compute node of Setonix shares its resources among multiple jobs running at the same time, submitted by many users from the same or different projects. We call this configuration shared access and, as mentioned, it is the default for Setonix nodes. Nevertheless, users can use Slurm options to override the default and explicitly request exclusive access to the requested nodes.

Partitions

Nodes are grouped in partitions. Each partition is characterised by a particular configuration of its resources and it is intended for a particular workload or stage of the scientific workflow development. Tables below show the list of partitions present on Setonix and their available resources per node.

Submitting jobs to the GPU partitions

You will need to use a different project code for the --account/-A option. More specifically, it is your project code followed by the -gpu suffix. For instance, if your project code is project1234, then you will have to use project1234-gpu.
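For example, the header of a GPU batch script might look like the following sketch (the GPU-request line is an assumption based on common Slurm usage; consult the GPU example pages linked above for the exact form used on Setonix):

```shell
#!/bin/bash --login
#SBATCH --account=project1234-gpu    # note the -gpu suffix on the project code
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --gres=gpu:1                 # request one GPU chiplet (exact syntax: see the GPU examples page)
#SBATCH --time=01:00:00

srun ./my_gpu_program                # hypothetical executable
```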

Table 1. Slurm partitions for production jobs and data transfers on Setonix

| Name | N. Nodes | Cores per node | Available node-RAM for jobs | GPU chiplets per node | Types of jobs supported | Max Number of Nodes per Job | Max Wall time | Max Number of Concurrent Jobs per User | Max Number of Jobs Submitted per User |
|------|----------|----------------|-----------------------------|-----------------------|-------------------------|-----------------------------|---------------|----------------------------------------|----------------------------------------|
| work | 1376 | 2x 64 | 230 GB | n/a | Supports CPU-based production jobs. | - | 24h | 256 | 1024 |
| long | 8 | 2x 64 | 230 GB | n/a | Long-running CPU-based production jobs. | 1 | 96h | 4 | 96 |
| highmem | 8 | 2x 64 | 980 GB | n/a | Supports CPU-based production jobs that require a large amount of memory. | 1 | 96h | 2 | 96 |
| gpu | 134 | 1x 64 | 230 GB | 8 | Supports GPU-based production jobs. | - | 24h | 64 | 1024 |
| gpu-highmem | 38 | 1x 64 | 460 GB | 8 | Supports GPU-based production jobs requiring a large amount of host memory. | - | 24h | 8 | 256 |
| copy | 7 | 1x 32 | 115 GB | n/a | Copy of large data to and from the supercomputer's filesystems. | - | 48h | 4 | 500 |
| askaprt | 180 | 2x 64 | 230 GB | n/a | Dedicated to the ASKAP project (similar to work partition) | - | 24h | 8192 | 8192 |
| casda | 1 | 1x 32 | 115 GB | n/a | Dedicated to the CASDA project (similar to copy partition) | - | 24h | 30 | 40 |
| mwa | 10 | 2x 64 | 230 GB | n/a | Dedicated to the MWA projects (similar to work partition) | - | 24h | 1000 | 2000 |
| mwa-asvo | 10 | 2x 64 | 230 GB | n/a | Dedicated to the MWA projects (similar to work partition) | - | 24h | 1000 | 2000 |
| mwa-gpu | 10 | 1x 64 | 230 GB | 8 | Dedicated to the MWA projects (similar to gpu partition) | - | 24h | 1000 | 2000 |
| mwa-asvocopy | 2 | 1x 32 | 115 GB | n/a | Dedicated to the MWA projects (similar to copy partition) | - | 48h | 32 | 1000 |
| quantum | 4 | 4x 72 | 857 GB | 4 | Dedicated to the Setonix-Q merit allocation scheme and for running quantum computing simulation and hybrid quantum-classical workflows | - | 24h | 8 | 256 |

Table 2. Slurm partitions for debug and development on Setonix

| Name | N. Nodes | Cores per node | Available node-RAM for jobs | GPU chiplets per node | Types of jobs supported | Max Number of Nodes per Job | Max Wall time | Max Number of Concurrent Jobs per User | Max Number of Jobs Submitted per User |
|------|----------|----------------|-----------------------------|-----------------------|-------------------------|-----------------------------|---------------|----------------------------------------|----------------------------------------|
| debug | 8 | 2x 64 | 230 GB | n/a | Exclusive for development and debugging of CPU code and workflows. | 4 | 1h | 1 | 4 |
| gpu-dev | 10 | 1x 64 | 230 GB | 8 | Exclusive for development and debugging of GPU code and workflows. | 2 | 4h | 1 | 4 |
| quantum | 4 | 4x 72 | 857 GB | 4 | As GH200 nodes have a different CPU architecture, codes must also be developed on the GH200 nodes. We suggest running a single-GPU test job. | 4 | 24h | 8 | 256 |

Debug and Development Partitions Policy

Quality of Service

Each job submitted to the scheduler is assigned a Quality of Service (QoS) level, which determines the priority of the job with respect to the others in the queue. Usually, the default normal QoS applies. Users can boost the priority of their jobs, for up to 10% of their allocation, using the high QoS in the following way:

$ sbatch --qos=high myscript.sh
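Equivalently, the QoS can be set inside the batch script itself; as a sketch (account and program names are placeholders):

```shell
#!/bin/bash --login
#SBATCH --account=project1234        # placeholder project code
#SBATCH --partition=work
#SBATCH --qos=high                   # boost priority (draws on the 10% high-QoS fraction)
#SBATCH --time=01:00:00

srun ./my_program                    # hypothetical executable
```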

Each project has an allocation for a number of service units (SUs) in a year, which is broken into quarters. Jobs submitted under a project will subtract SUs from the project's allocation. A project that has entirely consumed its SUs for a given quarter of the year will run its jobs in low priority mode for that time period.

Table 3. Quality of Service levels applicable to a Slurm job running on Setonix

| Name | Priority Level | Description |
|------|----------------|-------------|
| lowest | 0 | Reserved for particular cases. |
| low | 3000 | Priority for jobs past the 100% allocation usage. |
| normal | 10000 | The default priority for production jobs. |
| high | 14000 | Priority boost available to all projects for a fraction (10%) of their allocation. |
| highest | 20000 | Assigned to jobs that are of critical interest (e.g. project part of the national response to an emergency). |
| exhausted | 0 | Assigned to jobs from projects that have consumed significantly more than their allocation, which are prevented from running until the quarterly reset. |

Job Queue Limits

Users can check the limits on the maximum number of jobs that can run at a time (MaxJobs) and the maximum number of jobs that can be submitted (MaxSubmitJobs) for each partition on Setonix using the command:

$ sacctmgr show associations user=$USER cluster=setonix

Additional constraints are imposed on projects that have overused their quarterly allocation.
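The default output of the command above is wide; as a sketch, the relevant columns can be selected with sacctmgr's format option (field names may vary slightly between Slurm versions):

```shell
# Show only the partition names and job-count limits for your associations
sacctmgr show associations user=$USER cluster=setonix \
    format=Partition,MaxJobs,MaxSubmit
```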

Executing large jobs

When executing large, multinode jobs on Setonix, the use of the --exclusive option in the batch script is recommended. Doing so results in better resource utilisation within each node assigned to the job.
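A minimal sketch of a large multinode job using --exclusive (the account, program name, and node and task counts are illustrative):

```shell
#!/bin/bash --login
#SBATCH --account=project1234        # placeholder project code
#SBATCH --partition=work
#SBATCH --nodes=16                   # illustrative multinode request
#SBATCH --ntasks-per-node=128        # use all 2x 64 cores of each work node
#SBATCH --exclusive                  # whole nodes allocated: no --mem request needed
#SBATCH --time=12:00:00

srun ./my_parallel_program           # hypothetical MPI executable
```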

Related pages