How to avoid Conda breaking your file quota

This page discusses how to use SquashFS, overlays, and /tmp to manage the many small files generated by some workflows and tools, including Conda.

The Lustre file system does not cope well with millions of tiny files; it is optimised for fast parallel access to large files. To ensure system stability, we have implemented quotas on the number of files a user and a project can create, which can be problematic for some tools and workflows. Here, we provide a solution that allows users to pack their thousands or millions of files into a single SquashFS image, reducing the file count on Lustre. This also works for Conda/Mamba builds. Once the SquashFS image is created, you can easily store, move and access it without straining the Lustre filesystems /scratch and /software. See Pawsey Filesystems and their Use for a discussion of quotas.
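If you want to see how close you are to the file (inode) limits before changing your workflow, the Lustre lfs tool can report your current usage. A minimal check, assuming lfs is available on the login nodes; the project-level command is indicative only, as the exact option depends on how the project quota is configured:

# Report your personal usage and limits (the "files" columns show the file count quota)
$ lfs quota -u $USER $MYSCRATCH

# Project-level usage may be tracked as a group quota; replace <project> with your project code
$ lfs quota -g <project> /software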

Creating a SquashFS image

SquashFS is a compressed, read-only file system that we will use to pack single directories on Setonix into an image that can later be mounted and accessed with Singularity. For archiving purposes, SquashFS gives you more flexibility and better performance than a tarball archive. More information can be found here.

Using SquashFS can be ideal for workflows or tools that produce a large number of files. The mksquashfs command produces a single image on the Lustre filesystems that can easily be moved to Acacia object storage. The example below demonstrates how to generate a SquashFS image.

Terminal 1: Example of squashfs
# Load the SquashFS module. Note the version may change 
$ module load squashfs/4.6.1

# This command creates a SquashFS image of the directory where our files reside (i.e. $MYSCRATCH/<result_dir>/<base_dir_in_squashfs>)
$ mksquashfs $MYSCRATCH/<result_dir>/<base_dir_in_squashfs> <squash_file>.sqsh -keep-as-directory

# Now all the files from $MYSCRATCH/<result_dir>/<base_dir_in_squashfs> are in a single SquashFS image on Lustre
# Remove $MYSCRATCH/<result_dir>/<base_dir_in_squashfs>
$ rm -rf $MYSCRATCH/<result_dir>/<base_dir_in_squashfs>

# Can move the data to Acacia via rclone
$ module load rclone/1.63.1
$ rclone copy <squash_file>.sqsh <acacia_alias>:<bucket> 
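If you only want to inspect the image or pull out a handful of files, you do not need Singularity at all. The unsquashfs tool, which to our knowledge ships alongside mksquashfs in the same squashfs module, can list and selectively extract contents; a brief sketch, where the path inside the image is just an example:

# List the contents of the image without extracting anything
$ unsquashfs -l <squash_file>.sqsh

# Extract a single sub-directory into ./extracted (paths are relative to the image root)
$ unsquashfs -d extracted <squash_file>.sqsh <base_dir_in_squashfs>/run-0001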

Reading your SquashFS object with Singularity

The SquashFS image you just created can be mounted and its contents read with Singularity. SquashFS images cannot be mounted directly on Lustre, but Singularity lets us access the files inside the image through its --overlay functionality. We can use a very simple Ubuntu container for this purpose; there is no need to write your own container.

The example below demonstrates how to interact with the data stored in a SquashFS image.

Terminal 2: Using Squashfs with Singularity overlay
# Load the SquashFS and Singularity modules. Note the versions may change 
$ module load squashfs/4.6.1 singularity/4.1.0-nompi 

# Pull a very basic ubuntu container to Setonix
$ singularity pull docker://ubuntu:latest

# Now load the file using singularity. Here the squashfs image mounts a directory called test1-squash in the / directory
$ singularity exec --overlay <squash_file>.sqsh ubuntu_latest.sif ls /

bin           dev           home          media         proc          sbin          singularity   sys           usr
codes         environment   lib           mnt           root          scratch       software      test1-squash  var
data          etc           lus           opt           run           scripts       srv           tmp

# Let's copy the test1-squash directory to /scratch 
# Note that this may add a lot of files to /scratch and impact your quota
$ singularity exec --overlay <squash_file>.sqsh ubuntu_latest.sif cp -r /test1-squash $MYSCRATCH/
$ ls $MYSCRATCH
test1-squash/

# You can also copy just portions of the directory.
# Let's assume the directory contains several runs named run-0001, run-0002, and so on.
# Instead of copying everything, you can copy just the run you are interested in.
# This approach will allow you to better manage your quota
$ singularity exec --overlay <squash_file>.sqsh ubuntu_latest.sif cp -r /test1-squash/run-0001 $MYSCRATCH/
$ ls $MYSCRATCH
run-0001/

Workflows using /tmp and SquashFS

Here we demonstrate another way to work around the issues created by writing many small files: using /tmp together with SquashFS. The /tmp filesystem on a compute node is local storage that makes use of node memory. This storage is very fast, and any files written there do not count against your quota.

There are caveats:

  • Filling this space reduces the amount of available memory for running processes.
  • It is not shared between nodes, so it is not suited to multi-node MPI jobs that need to see the same storage space.
  • It is a resource shared by all jobs running on the node, so it should be used with care. We recommend requesting nodes with the --exclusive flag so as not to impact other users.
  • It is a limited resource. On standard work nodes, the total available memory for running processes and writing to /tmp is 230 GB. We recommend using this storage only if the workflow needs less than 200 GB; a quick way to check current usage is shown below.
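A quick way to see how much room is left on /tmp, and how much memory remains for your processes, is to run standard Linux tools from inside your job allocation; a minimal check (no Pawsey-specific tooling assumed):

# How much of /tmp is currently in use on this node
$ df -h /tmp

# How much memory remains for running processes (files in /tmp consume this)
$ free -h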

A good use case for this approach is a Nextflow workflow. Nextflow is a workflow engine for creating scalable, portable, and reproducible workflows, and it is not uncommon to use it to manage hundreds or thousands of processes. For each process, a working directory is created along with log files and output. This large number of small files and directories is poorly suited to Lustre and can run into quota issues.


Clean up /tmp when finished

Please ensure you remove all your files from /tmp when you are finished. Even if your job fails, make sure you clean up any remaining files afterwards. 
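One way to make the cleanup harder to forget is to register it at the top of your batch script so that it runs even if a later command fails. A minimal sketch using a bash trap, following the per-job directory convention used in the scripts below; note that a trap cannot run if the job is killed outright (for example, on hitting the walltime), so it is a safety net rather than a guarantee:

# Remove the per-job /tmp directory whenever the script exits, successfully or not
TMPDIR_JOB=/tmp/${USER}/${SLURM_JOB_ID}
mkdir -p ${TMPDIR_JOB}
trap 'rm -rf ${TMPDIR_JOB}' EXIT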


Using mksquashfs

The following example shows how a workflow can use /tmp and mksquashfs to alleviate file quota issues and poor performance.

Sbatch script using squashfs
#!/bin/bash
#SBATCH -p <partition>
#SBATCH -A <account>
#SBATCH --nodes=1
#SBATCH --exclusive 
# other resource requests

# load modules as required
module load squashfs/4.6.1

# create a working directory and move to it
# this can be the base working directory for a nextflow job.
mkdir -p /tmp/${USER}/${SLURM_JOB_ID}
cd /tmp/${USER}/${SLURM_JOB_ID}

# run a command with output on /tmp
srun <cmd> <args> 

# run mksquashfs on the output directory
mksquashfs <input_dir> my_squash.sqsh -keep-as-directory

# and move it to /scratch (or copy it to Acacia; an example follows this script)
mv my_squash.sqsh $MYSCRATCH/

cd $MYSCRATCH
rm -rf /tmp/${USER}/${SLURM_JOB_ID}
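If you would rather send the image straight to Acacia instead of (or as well as) keeping it on /scratch, the rclone step shown in Terminal 1 can be appended to the batch script; a sketch, assuming you have already configured an rclone alias for Acacia:

# copy the image to Acacia (requires a pre-configured rclone alias)
module load rclone/1.63.1
rclone copy my_squash.sqsh <acacia_alias>:<bucket>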


Using Singularity Containers and Overlays

Singularity can create a persistent, writable file overlay (see Managing Files with Singularity Overlays for a discussion of files produced by executables within the container). A similar approach can be followed to deploy a conda environment, and the basic idea is the same as the SquashFS method; however, here the size of the overlay is fixed when it is created.

Here we demonstrate using writable overlay images produced by Singularity for running container-based workflows. The example is similar to the one above, but the overlay is created on /scratch before running the container, and all files produced by the container are written to the overlay and so do not impact the file count quota. This may require creating specific output directories in the overlay.

Sbatch script using squashfs
#!/bin/bash
#SBATCH -p <partition>
#SBATCH -A <account>
#SBATCH --nodes=1
#SBATCH --exclusive 
# other resource requests

# load modules as required
# load singularity
module load singularity/4.1.0-nompi

# create overlay 
SIZE="5000"
FILE="my_overlay.sqsh"
singularity overlay create --size $SIZE ${MYSCRATCH}/$FILE
# create an output directory on the overlay
singularity exec --overlay ${MYSCRATCH}/$FILE <container> mkdir -p /data/output

# Note: if running several MPI ranks, you may need to create an overlay for each rank.
# You might also have to ensure that no two ranks write to the same overlay at the same time
# (a sketch of one way to do this follows this script)

# run a command with output on the overlay
srun singularity exec --overlay ${MYSCRATCH}/$FILE <container> <cmd> <args> 
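If each rank needs its own writable overlay, one possible pattern (a sketch only, not a tested recipe; the wrapper name select_overlay.sh is made up for illustration, and it assumes --ntasks is set so that SLURM_NTASKS and SLURM_PROCID are defined) is to create one overlay per rank up front and have a small wrapper pick the right one at run time:

# create one overlay per rank before launching the job step
for rank in $(seq 0 $(( SLURM_NTASKS - 1 ))); do
    singularity overlay create --size $SIZE ${MYSCRATCH}/overlay_rank${rank}.sqsh
done

# hypothetical wrapper: each task mounts only the overlay matching its rank
cat > select_overlay.sh <<'EOF'
#!/bin/bash
exec singularity exec --overlay ${MYSCRATCH}/overlay_rank${SLURM_PROCID}.sqsh "$@"
EOF
chmod +x select_overlay.sh

# every rank writes to its own overlay, so there is no contention on a single file
srun ./select_overlay.sh <container> <cmd> <args>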

Conda (Mamba) software installations using overlay with Singularity or SquashFS

Conda is often used to install software. However, the installation process can produce a large quantity of files and exceed the file quota on /software. To help alleviate this issue, we suggest using the Singularity container engine or SquashFS to encapsulate all of the Conda build files into a single SquashFS image on the Lustre filesystem. This approach can be adapted to other software deployments that contain many files (e.g. Spack).

Squashfs 

This example shows how to generate a SquashFS file in an interactive salloc session, using /tmp to improve installation speed and avoid impacting your file quota. For our example we use Mamba, a fast implementation of Conda that we recommend. It is a drop-in replacement for Conda, so if you know how to use Conda, you know how to use Mamba. If you want to use Conda instead, follow the same steps below but install Miniconda rather than Mamba, and replace 'mamba' with 'conda' where appropriate.

Terminal 3: Installing Conda in Squashfs
# request a single debug node. 
$ salloc --nodes=1 --ntasks-per-node=1 --exclusive --partition=debug

# load the SquashFS module and singularity module. Note that versions may change
$ module load squashfs/4.6.1 singularity/4.1.0-nompi

# Create a directory for the conda environment on the node memory. 
# If you are doing multiple builds, it might be better to use $MYSCRATCH to build software
$ mkdir -p /tmp/${SLURM_JOB_ID}/mamba-build
$ cd /tmp/${SLURM_JOB_ID}/mamba-build

# download and install miniforge's Mamba 
$ wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
$ bash Miniforge3-Linux-x86_64.sh -b -p $(pwd)/opt/mamba
$ rm Miniforge3-Linux-x86_64.sh

# Next, create a wrapper script opt/mamba/env.sh inside the install directory,
# so that it ends up inside the SquashFS image.
# The script references /opt/mamba, which is where the install will appear once the
# image is mounted as an overlay, and will activate your mamba environment
cat > ./opt/mamba/env.sh <<'EOF'
#!/bin/bash
source /opt/mamba/etc/profile.d/conda.sh
export PATH=/opt/mamba/bin:$PATH
EOF

# For the build itself, activate the environment from its current location under /tmp:
$ source ./opt/mamba/etc/profile.d/conda.sh
$ export PATH=$(pwd)/opt/mamba/bin:$PATH

# Now that your environment is activated, you can update and install packages
$ mamba update -n base conda -y
$ mamba install <packages> --yes
$ mamba deactivate 
$ mamba clean --all --yes

# now make the SquashFS file such that it will appear as /opt/mamba/
$ FILE="my_mamba.sqsh"
$ mksquashfs opt ${FILE} -keep-as-directory

# Move the SquashFS image to a safe place and leave the /tmp directory
$ mv $FILE $MYSCRATCH
$ cd $MYSCRATCH

# Clean up /tmp
$ rm -rf /tmp/${SLURM_JOB_ID}/mamba-build

# Exit interactive allocation
$ exit 

To use this image, mount it with Singularity and any appropriate container, as described in Running software contained in SquashFS below.
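As a quick sanity check that the build worked, you can mount the image as a read-only overlay on a basic container (the ubuntu image pulled in Terminal 2 is sufficient) and call a binary from the environment; a sketch, assuming the image was moved to $MYSCRATCH as above:

$ module load singularity/4.1.0-nompi
$ singularity exec --overlay $MYSCRATCH/my_mamba.sqsh ubuntu_latest.sif /opt/mamba/bin/python --version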

Singularity overlay

A similar approach can be followed to deploy a conda environment using overlays; however, here the size of the overlay is fixed when it is created. This example shows how to generate an overlay of a fixed size in an interactive salloc session, using the /tmp filesystem to improve performance.


Terminal 4: Installing Conda in an Overlay
# Request a single debug node. 
$ salloc --nodes=1 --ntasks-per-node=1 --exclusive --partition=debug

# Load singularity. Note versions may change
$ module load singularity/4.1.0-nompi

# Create a directory for the mamba environment in /tmp. Improves the speed.
$ mkdir -p /tmp/${SLURM_JOB_ID}/mamba-build
$ cd /tmp/${SLURM_JOB_ID}/mamba-build

# Create an overlay with Singularity, deciding on the size and the name. 
# Decide on the max size of the overlay in MB. This example is 5000MB. The default is 100MB.
# For conda, it might be necessary to have >= 5GB
$ SIZE="5000"
$ FILE="my_overlay.sqsh"
$ singularity overlay create --size $SIZE $FILE

# Choose a corresponding singularity image. 
# We will use the Pawsey mpich image but other images can be used as well.
$ singularity pull docker://quay.io/pawsey/mpich-lustre-base:3.4.3_ubuntu20.04_lustrerelease

# Start the container, also loading the overlay
$ singularity shell --overlay $FILE:rw <container>

# Prepare for the mamba download
# The writable overlay lets us create and write to directories under / (root) inside the container
$ mkdir -p /opt
$ cd /opt

# Download and install mamba into /opt/mamba on the overlay
$ wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
$ bash Miniforge3-Linux-x86_64.sh -b -p /opt/mamba
$ rm Miniforge3-Linux-x86_64.sh

# Create a wrapper script /opt/mamba/env.sh
# The wrapper script will activate your mamba environment
cat > /opt/mamba/env.sh <<'EOF'
#!/bin/bash
source /opt/mamba/etc/profile.d/conda.sh
export PATH=/opt/mamba/bin:$PATH
EOF
 
# Activate your mamba environment with the following:
$ source /opt/mamba/env.sh

# Now that your environment is activated, you can update and install packages
$ mamba update -n base conda -y
$ mamba install <packages> -y
$ mamba clean --all --yes

# exit the container
$ exit

# Move the overlay to a safe place before cleaning up /tmp
$ mv $FILE $MYSCRATCH/
$ cd $MYSCRATCH
$ rm -rf /tmp/${SLURM_JOB_ID}/mamba-build

This approach can run into issues if you do not specify a large enough overlay size; in that case you will have to create another overlay of a larger size. SquashFS does not require you to specify the size beforehand, which is why we recommend it.

Running software contained in SquashFS 

The image, whether produced directly with mksquashfs or with singularity overlay create, can be loaded into a container for use. The container does not need to be tailored to Setonix if the software does not use MPI. You may first need to set some paths and environment variables, as in the example below:

Job batch script running squashed software deployment
# Load singularity. Note versions may change
# You can also use the -nohost version of singularity if you want to ensure isolation from host libraries.
$ module load singularity/4.1.0-nompi

# Download an appropriate container. We will use the Pawsey mpich image but other images can be used as well.
$ singularity pull docker://quay.io/pawsey/mpich-lustre-base:3.4.3_ubuntu20.04_lustrerelease

# You may need to add executables to your PATH. Here is an example of how you might do that for Mamba
$ export SINGULARITYENV_PREPEND_PATH="/opt/mamba/bin"

# Then you can run the command
$ singularity exec --overlay <squash_file>.sqsh <container> <cmd> <args>
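If you built the env.sh wrapper into the image (as in Terminal 3), an alternative to SINGULARITYENV_PREPEND_PATH is to source it inside the container before running your command; a sketch:

$ singularity exec --overlay <squash_file>.sqsh <container> bash -c "source /opt/mamba/env.sh && <cmd> <args>"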

Using /tmp for Nextflow workflows


During its operation, Nextflow creates symbolic links to files that are carried through the process steps. Depending on the number of files, this can slow down the workflow or even take you over the default file limit on /scratch if they are all written there. For very small files such as symbolic links, you can use /tmp to avoid that issue, bearing in mind that /tmp is very limited in space and needs to be cleaned up after use so as not to affect other users.
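A sketch of how this could look inside a batch script, with the caveat that module names, the pipeline name and the --outdir parameter (a common pipeline option, not a Nextflow flag) are placeholders you will need to adapt:

# earlier in the script: load the nextflow and squashfs modules (names/versions may differ)

# point Nextflow's work directory (and all its small files and symlinks) at /tmp
WORKDIR=/tmp/${USER}/${SLURM_JOB_ID}/work
mkdir -p ${WORKDIR}
nextflow run <pipeline> -work-dir ${WORKDIR} --outdir ${MYSCRATCH}/results

# if you need to keep the work directory, squash it rather than copying it to /scratch
mksquashfs ${WORKDIR} ${MYSCRATCH}/nextflow_work.sqsh -keep-as-directory

# always clean up /tmp before the job ends
rm -rf /tmp/${USER}/${SLURM_JOB_ID}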