...
Column |
---|
|
Code Block |
---|
language | bash |
---|
theme | Emacs |
---|
title | Listing 1. Job array MPI example |
---|
| #!/bin/bash --login
#This example uses general SBATCH settings, but please refer to the specific guide
#of the intended cluster for any changes that may be needed
# SLURM directives
#
# This is an array job with two subtasks 0 and 1 (--array=0,1).
#
# The output for each subtask will be sent to a separate file
# identified by the jobid (--output=array-%j.out)
#
# Each subtask will occupy one node (--nodes=1) with
# a wall-clock time limit of 20 minutes (--time=00:20:00)
#
# Replace [your-project] with the appropriate project name
# following --account (e.g., --account=project123)
#SBATCH --account=[your-project]
#SBATCH --array=0,1
#SBATCH --output=array-%j.out
#SBATCH --nodes=1
#SBATCH --ntasks=32 #this needs to be indicated for shared-access nodes
#SBATCH --ntasks-per-socket=32
#SBATCH --cpus-per-task=1 #confirm the number of cpus per task
#SBATCH --mem=58G #specify when asking for shared access to compute nodes (or use --exclusive for exclusive access)
#SBATCH --time=00:20:00
# ---
# To launch the job, we specify to srun 32 MPI tasks (-n 32)
# to run on the node
#
# Note we avoid any inadvertent OpenMP threading by setting
# OMP_NUM_THREADS=1
#
# The input to the executable is the unique array task identifier
# $SLURM_ARRAY_TASK_ID which will be either 0 or 1
export OMP_NUM_THREADS=1
# ---
# Set MPI-related environment variables. Not all of them need to be set.
# Main variables for multi-node jobs (uncomment for multi-node jobs):
#export MPICH_OFI_STARTUP_CONNECT=1
#export MPICH_OFI_VERBOSE=1
#Ask MPI to provide useful runtime information (uncomment if debugging)
#export MPICH_ENV_DISPLAY=1
#export MPICH_MEMORY_REPORT=1
#---
#Specific settings for the cluster you are on
#(Check the specific guide of the cluster for additional settings)
#---
echo This job shares a SLURM array job ID with the parent job: $SLURM_ARRAY_JOB_ID
echo This job has a SLURM job ID: $SLURM_JOBID
echo This job has a unique SLURM array index: $SLURM_ARRAY_TASK_ID
#----
#Execute command:
srun -N 1 -n 32 -c 1 ./code_mpi.x $SLURM_ARRAY_TASK_ID |
|
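Once saved, the script above is submitted once with sbatch and SLURM creates one subtask per index in the --array list. Below is a minimal sketch of submitting and monitoring the array; the script filename array_job.sh and the job ID 123456 are illustrative assumptions.
Code Block |
---|
language | bash |
---|
theme | Emacs |
---|
title | Submitting and monitoring the job array (sketch) |
---|
| # Submit the script once; SLURM creates subtasks 0 and 1 (the script name is an assumption)
sbatch array_job.sh
# List all subtasks of the array (replace 123456 with the array job ID reported by sbatch)
squeue -j 123456
# Cancel only subtask 1 of the array, leaving the other subtask untouched
scancel 123456_1 |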
...
Column |
---|
|
Code Block |
---|
language | bash |
---|
theme | Emacs |
---|
title | Listing 2. Example of a job array with many files |
---|
| #!/bin/bash --login
#This example uses general SBATCH settings, but please refer to the specific guide
#of the intended cluster for any changes that may be needed
#
# SLURM directives
#
# This is an array job with 35 subtasks, (--array=0-34).
#
# The output for each subtask will be sent to a separate file
# identified by the jobid (--output=array-%j.out)
#
# Each subtask will occupy one node (--nodes=1) with
# a wall-clock time limit of twenty minutes (--time=00:20:00)
#
# Replace [your-project] with the appropriate project name
# following --account (e.g., --account=project123)
#SBATCH --account=[your-project]
#SBATCH --output=array-%j.out
#SBATCH --array=0-34 #this should match the number of input files
#SBATCH --nodes=1
#SBATCH --ntasks=1 #One main task per file to be processed in each array-subtask
#SBATCH --cpus-per-task=1 #this will vary depending on the requirements of the task
#SBATCH --mem=1840M #Needed memory per array-subtask (or use --exclusive for exclusive access)
#SBATCH --time=00:20:00
#---
echo "All jobs in this array have:"
echo "- SLURM_ARRAY_JOB_ID=${SLURM_ARRAY_JOB_ID}"
echo "- SLURM_ARRAY_TASK_COUNT=${SLURM_ARRAY_TASK_COUNT}"
echo "- SLURM_ARRAY_TASK_MIN=${SLURM_ARRAY_TASK_MIN}"
echo "- SLURM_ARRAY_TASK_MAX=${SLURM_ARRAY_TASK_MAX}"
echo "This job in the array has:"
echo "- SLURM_JOB_ID=${SLURM_JOB_ID}"
echo "- SLURM_ARRAY_TASK_ID=${SLURM_ARRAY_TASK_ID}"
#---
#Specific settings for the cluster you are on
#(Check the specific guide of the cluster for additional settings)
#---
# grab our filename from a directory listing
FILES=($(ls -1 *.input.txt)) #this pulls in all the files ending with input.txt
FILENAME=${FILES[$SLURM_ARRAY_TASK_ID]} #the SLURM array task ID selects one input file for this subtask
echo "My input file is ${FILENAME}" #this will print the file name into the log file
#---
#example job using the above variables
ExpansionHunterDenovo-v0.8.7-linux_x86_64/scripts/casecontrol.py locus \
--manifest ${FILENAME} \
--output ${FILENAME}.CC_locus.tsv |
|
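The --array range in the script above has to match the number of *.input.txt files in the directory. One way to keep the two in sync is to override the range at submission time from a file count, since command-line options to sbatch take precedence over the #SBATCH directives in the script. A minimal sketch, assuming the script is saved as array_files.sh (the filename is an assumption):
Code Block |
---|
language | bash |
---|
theme | Emacs |
---|
title | Matching the array range to the input files (sketch) |
---|
| # Count the input files using the same pattern as inside the script
num_files=$(ls -1 *.input.txt | wc -l)
# Array indices start at 0, so the last index is num_files - 1;
# the --array given on the command line overrides the #SBATCH --array directive
sbatch --array=0-$(( num_files - 1 )) array_files.sh |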
...
Column |
---|
|
Code Block |
---|
language | bash |
---|
theme | Emacs |
---|
title | Listing 3. Example job chaining script |
---|
| #!/bin/bash -l
#This example uses general SBATCH settings, but please refer to the specific guide
#of the intended cluster for any changes that may be needed
#SBATCH --account=[your-project]
#SBATCH --nodes=xx
#SBATCH --ntasks=yy #this directive is required on setonix to request yy tasks
#SBATCH --cpus-per-task=zz
#SBATCH --exclusive #For exclusive access to node resources (or use --mem for shared access)
#SBATCH --time=00:05:00
srun -N xx -n yy -c zz ./a.out # fill in the srun options '-N xx -n yy' etc. to be appropriate to run the job
sbatch next_job.sh |
|
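A chain like this resubmits itself every time it runs, so it is good practice to guard the final sbatch call so the chain can be interrupted. Below is a minimal sketch of such a guard, using a stop file in the same spirit as the stopSlurmCycle check in Listing 7 (the file name stopChain is an assumption).
Code Block |
---|
language | bash |
---|
theme | Emacs |
---|
title | Guarding the chain resubmission (sketch) |
---|
| # Only resubmit the next job if the user has not created a stop file in the submission directory
if [[ -f stopChain ]]; then
    echo "The file stopChain exists, so no further job will be submitted."
else
    sbatch next_job.sh
fi |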
...
Column |
---|
|
Code Block |
---|
language | bash |
---|
theme | Emacs |
---|
title | Listing 4. Example dependency job script |
---|
| #!/bin/bash -l
#This example uses general SBATCH settings, but please refer to the specific guide
#of the intended cluster for any changes that may be needed
#SBATCH --account=[your-project]
#SBATCH --nodes=xx
#SBATCH --ntasks=yy #this directive is required on setonix to request yy tasks
#SBATCH --cpus-per-task=zz
#SBATCH --exclusive #For exclusive access to node resources (or use --mem for shared access)
#SBATCH --time=00:05:00
sbatch --dependency=afternotok:${SLURM_JOB_ID} next_job.sh
srun -N xx -n yy -c zz ./code.x # fill in the srun options '-N xx -n yy' etc. to be appropriate to run the job |
|
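The same dependency mechanism is also available directly from the command line, which is convenient when the jobs are submitted interactively rather than from inside a batch script. A minimal sketch, assuming the two scripts are called first_job.sh and next_job.sh (both names are assumptions):
Code Block |
---|
language | bash |
---|
theme | Emacs |
---|
title | Setting a dependency from the command line (sketch) |
---|
| # Submit the first job and capture its job ID with --parsable
first_id=$(sbatch --parsable first_job.sh)
# Submit a second job that will only start once the first one has finished successfully
sbatch --dependency=afterok:${first_id} next_job.sh |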
...
Column |
---|
|
Code Block |
---|
language | bash |
---|
theme | Emacs |
---|
title | Listing 7. iterative.sh |
---|
| #!/bin/bash -l
#----------
#This example uses general SBATCH settings, but please refer to the specific guide
#of the intended cluster for any changes that may be needed
#-----------------------
##Defining the needed resources with SLURM parameters (modify as needed)
#SBATCH --account=[your-project]
#SBATCH --job-name=iterativeJob
#SBATCH --ntasks=128
#SBATCH --ntasks-per-node=128
#SBATCH --cpus-per-task=1
#SBATCH --exclusive #Will use exclusive access to nodes (Or use --mem for shared access)
#SBATCH --time=05:00:00
#-----------------------
##Setting modules
#Add the needed modules (uncomment and adapt the following lines)
#module swap the-module-to-swap the-module-i-need
#module load the-modules-i-need
#-----------------------
##Setting the variables for controlling recursion
#Job iteration counter. Its default value is 1 (for the first submission). For a subsequent submission, it receives its value through the "sbatch --export" command from the parent job.
: ${job_iteration:="1"}
this_job_iteration=${job_iteration}
#Maximum number of job iterations. It is always good to have a reasonable number here
job_iteration_max=5
echo "This jobscript is calling itself in recursively. This is iteration=${this_job_iteration}."
echo "The maximum number of iterations is set to job_iteration_max=${job_iteration_max}."
echo "The slurm job id is: ${SLURM_JOB_ID}"
#-----------------------
##Defining the name of the dependent script.
#This "dependentScript" is the name of the next script to be executed in workflow logic. The most common and more utilised is to re-submit the same script:
thisScript=`squeue -h -j $SLURM_JOBID -o %o`
export dependentScript=${thisScript}
#-----------------------
##Safety-net checks before proceeding to the execution of this script
#Check 1: If the file with the exact name 'stopSlurmCycle' exists in the submission directory, then stop execution.
# Users can create a file with this name if they need to interrupt the submission cycle by using the following command:
# touch stopSlurmCycle
# (Remember to remove the file before submitting this script again.)
if [[ -f stopSlurmCycle ]]; then
echo "The file \"stopSlurmCycle\" exists, so the script \"${thisScript}\" will exit."
echo "Remember to remove the file before submitting this script again, or the execution will be stopped."
exit 1
fi
#Check 2: If the number of output files has reached a limit, then stop execution.
# The existence of a large number of output files could be a sign of an infinite recursive loop.
# In this case we check for the number of "slurm-XXXX.out" files.
# (Remember to check your output files regularly and remove old ones that are no longer needed, or the execution may be stopped.)
maxSlurmies=25
slurmyBaseName="slurm" #Use the base name of the output file
slurmies=$(find . -maxdepth 1 -name "${slurmyBaseName}*" | wc -l)
if [ $slurmies -gt $maxSlurmies ]; then
echo "There are slurmies=${slurmies} ${slurmyBaseName}-XXXX.out files in the directory."
echo "The maximum allowed number of output files is maxSlurmies=${maxSlurmies}"
echo "This could be a sign of an infinite loop of slurm resubmissions."
echo "So the script ${thisScript} will exit."
exit 2
fi
#Check 3: Add some other adequate checks to guarantee the correct execution of your workflow
#Check 4: etc.
#-----------------------
##Setup/Update of parameters/input for the current script
#The following variables will receive a value with the "sbatch --export" submission from the parent job.
#If this is the first time this script is called, then they will start with the default values given here:
: ${var_start_time:="0"}
: ${var_end_time:="10"}
: ${var_increment:="10"}
#Replacing the current values in the parameter/input file used by the executable:
paramFile=input.dat
templateFile=input.template
cp $templateFile $paramFile
sed -i "s,VAR_START_TIME,$var_start_time," $paramFile
sed -i "s,VAR_END_TIME,$var_end_time," $paramFile
#Creating the backup of the parameter file utilised in this job
cp $paramFile $paramFile.$SLURM_JOB_ID
#-----------------------
##Verify that everything that is needed is ready
#This section is IMPORTANT. For example, it can be used to verify that the results from the parent submission are there. If not, stop execution.
#-----------------------
##Submitting the dependent job
#IMPORTANT: Never use cycles that could fall into infinite loops. Numbered cycles are the best option.
#The following variable needs to be "true" for the cycle to proceed (it can be set to false to avoid recursion when testing):
useDependentCycle=true
#Check if the current iteration is within the limits of the maximum number of iterations, then submit the dependent job:
if [ "$useDependentCycle" = "true" ] && [ ${job_iteration} -lt ${job_iteration_max} ]; then
#Update the counter of cycle iterations
(( job_iteration++ ))
#Update the values needed for the next submission
var_start_time=$var_end_time
(( var_end_time += $var_increment ))
#Dependent Job submission:
# (Note that next_jobid has the ID given by the sbatch)
# For the correct "--dependency" flag:
# "afterok", when each job is expected to properly finish.
# "afterany", when each job is expected to reach walltime.
# "singleton", similar to afterany, when all jobs will have the same name
# Check documentation for other available dependency flags.
#IMPORTANT: The --export="list_of_exported_vars" option guarantees that the listed values are inherited by the dependent job
next_jobid=$(sbatch --parsable --export="job_iteration=${job_iteration},var_start_time=${var_start_time},var_end_time=${var_end_time},var_increment=${var_increment}" --dependency=afterok:${SLURM_JOB_ID} ${dependentScript} | cut -d ";" -f 1)
echo "Dependent with slurm job id ${next_jobid} was submitted"
echo "If you want to stop the submission chain it is recommended to use scancel on the dependent job first"
echo "Or create a file named: \"stopSlurmCycle\""
echo "And then you can scancel this job if needed too"
else
echo "This is the last iteration of the cycle, no more dependent jobs will be submitted"
fi
#-----------------------
##Run the main executable.
#(Modify as needed)
#Syntax should allow restart from a checkpoint
srun -N $SLURM_JOB_NUM_NODES -n $SLURM_NTASKS -c $SLURM_CPUS_PER_TASK ./code.x |
|
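The sed commands in iterative.sh assume that input.template contains the literal placeholders VAR_START_TIME and VAR_END_TIME, which are replaced with the values for the current iteration. Below is a minimal sketch of creating such a template and starting the cycle; the parameter names inside the template are assumptions, only the placeholder strings are required by the script.
Code Block |
---|
language | bash |
---|
theme | Emacs |
---|
title | Example input.template and first submission (sketch) |
---|
| # Create a template whose placeholders are replaced by the sed commands in iterative.sh
# (the parameter names start_time/end_time are assumptions; only the placeholders matter)
cat > input.template << 'EOF'
start_time = VAR_START_TIME
end_time   = VAR_END_TIME
EOF
# The first job of the cycle is submitted without --export, so the default values in the script apply
sbatch iterative.sh |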
...
Column |
---|
|
Code Block |
---|
language | bash |
---|
theme | Emacs |
---|
title | Listing 8. Parallel serial jobs example |
---|
| #!/bin/bash --login
#This example uses general SBATCH settings, but please refer to the specific guide
#of the intended cluster for any changes that may be needed
# SLURM directives
#
# Here we specify to SLURM we want 64 tasks on a single node with
# a wall-clock time limit of 1 hour (--time=01:00:00).
#
# Replace [your-project] with the appropriate project name
# following --account (e.g., --account=project123).
#SBATCH --account=[your-project]
#SBATCH --nodes=1
#SBATCH --ntasks=64
#SBATCH --ntasks-per-node=64
#SBATCH --ntasks-per-socket=64
#SBATCH --cpus-per-task=1
#SBATCH --mem=117G #Needed memory per node when asking for shared access (or use --exclusive for exclusive access)
#SBATCH --time=01:00:00
#---
#Specific settings for the cluster you are on
#(Check the specific guide of the cluster for additional settings)
#---
# Launch 64 instances of the wrapper script (make sure it is executable).
srun -N 1 -n 64 -c 1 -m block:block:block ./wrapper.sh
|
|
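Each of the 64 instances launched by srun runs the same wrapper.sh, so the wrapper itself must work out which piece of work belongs to it; the SLURM_PROCID environment variable (0 to 63 here) is a convenient way to do this, much as the Java wrapper further below uses it. Below is a minimal sketch of such a wrapper, where the serial executable and the input/output naming scheme are assumptions.
Code Block |
---|
language | bash |
---|
theme | Emacs |
---|
title | Example wrapper.sh (sketch) |
---|
| #!/bin/bash
# Each instance launched by srun sees a unique SLURM_PROCID (0..63 in this example)
# and uses it to select its own input and output files (names are assumptions)
INPUT=input_${SLURM_PROCID}.dat
./serial_code.x ${INPUT} > output_${SLURM_PROCID}.log |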
...
Column |
---|
|
Code Block |
---|
language | bash |
---|
theme | Emacs |
---|
title | Listing 10. Serial Java example |
---|
| #!/bin/bash --login
#This example uses general SBATCH settings, but please refer to the specific guide
#of the intended cluster for any changes that may be needed
# SLURM directives
#
# Here we specify to SLURM we want one node (--nodes=1) with
# a wall-clock time limit of ten minutes (--time=00:10:00).
#
# Replace [your-project] with the appropriate project name
# following --account (e.g., --account=project123).
#SBATCH --account=[your-project]
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G #specify when asking for shared access to compute nodes (or use --exclusive for exclusive access)
#SBATCH --time=00:10:00
# Launch the job.
# There is one task to run java in serial (-n 1).
srun -N 1 -n 1 -c 1 java Application |
|
...
Column |
---|
|
Code Block |
---|
language | bash |
---|
theme | Emacs |
---|
title | Listing 11. Dual Java instances example script |
---|
| #!/bin/bash --login
#This example uses general SBATCH settings, but please refer to the specific guide
#of the intended cluster for any changes that may be needed
# Here we specify to SLURM we want one node (--nodes=1) with
# a wall-clock time limit of ten minutes (--time=00:10:00).
#
# Replace [your-project] with the appropriate project name
# following --account (e.g., --account=project123).
#SBATCH --account=[your-project]
#SBATCH --nodes=1
#SBATCH --ntasks=2 #this directive is required on setonix to request 2 tasks
#SBATCH --cpus-per-task=1
#SBATCH --mem=8G #specify when asking for shared access to compute nodes (or use --exclusive for exclusive access)
#SBATCH --time=00:10:00
#---
#Specific settings for the cluster you are on
#(Check the specific guide of the cluster for additional settings)
# We request two instances "-n 2" to be placed on cores 0 and 12 "--cpu_bind=map_cpu:0,12"
srun -N 1 -n 2 -c 1 --cpu_bind=map_cpu:0,12 java Wrapper |
|
The Wrapper.java application takes the following form. Two instances of the Wrapper class are run (asynchronously), which are identical except for the value of SLURM_PROCID obtained from the environment. Appropriate program logic can be used to arrange, for example, specific input for each instance of an underlying application. Here, we simply report the value of SLURM_PROCID to standard output.
...