Excerpt |
Conda creates |
---|
This page describes how to use SquashFS, overlays, and /tmp to manage the many small files generated by some workflows and tools, including Conda. |
...
A good use case for this approach is a Nextflow workflow. Nextflow is a workflow engine for creating scalable, portable, and reproducible workflows. It is common to use Nextflow to run hundreds or thousands of processes. For each process, a working directory is created along with log files and output. Such a large number of small files and directories is poorly suited to Lustre and can exceed file quotas.
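To judge whether a workflow produces enough small files to cause trouble, it can help to count the entries under its working directory. The following is a minimal sketch; the function name `count_entries` is hypothetical and not part of any Pawsey tooling.

```bash
#!/bin/bash
# Hypothetical helper: count every file, directory, and symbolic link
# below a directory (the directory itself is not counted).
# Note: names containing newlines would be counted more than once.
count_entries() {
    local dir="${1:-.}"
    find "$dir" -mindepth 1 | wc -l
}

# Example (assumed path): count_entries "/tmp/${USER}/${SLURM_JOB_ID}"
```

Each entry counts towards the file quota, so a high number for a single run suggests the workflow is a candidate for the approach described on this page.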
Column |
---|
|
Warning |
---|
title | Clean up /tmp when finished |
---|
| Please ensure you remove all your files from /tmp when you are finished. Even if your job fails, make sure you clean up any remaining files afterwards. |
|
Using mksquashfs
The following example shows how a workflow could use /tmp and mksquashfs to alleviate issues with file quotas and poor performance.
Column |
---|
|
Code Block |
---|
language | bash |
---|
theme | Emacs |
---|
title | Sbatch script using squashfs |
---|
linenumbers | true |
---|
| #!/bin/bash
#SBATCH -p <partition>
#SBATCH -A <account>
#SBATCH --nodes=1
#SBATCH --exclusive
# other resource requests
# load modules as required
module load squashfs/4.6.1
# create a working directory and move to it
# this can be the base working directory for a nextflow job.
mkdir -p /tmp/${USER}/${SLURM_JOB_ID}
cd /tmp/${USER}/${SLURM_JOB_ID}
# run a command with output on /tmp
srun <cmd> <args>
# pack the output directory into a single SquashFS image
mksquashfs <input_dir> my_squash.sqsh -keep-as-directory
# then move the image to /scratch (or copy it to Acacia)
mv my_squash.sqsh $MYSCRATCH/
# leave /tmp and remove the job's working directory
cd $MYSCRATCH
rm -rf /tmp/${USER}/${SLURM_JOB_ID} |
|
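The script above only removes its /tmp directory if every earlier step succeeds. One way to make the cleanup more robust is a shell `trap`, sketched below. The function names `setup_workdir` and `cleanup_workdir` and the variable `JOB_TMPDIR` are hypothetical, and the fallback values are assumptions made so the sketch also runs outside a Slurm job.

```bash
#!/bin/bash
# Hypothetical helpers; adapt the names and paths to your own job script.

# Remove the per-job working directory. Safe to call more than once.
cleanup_workdir() {
    # Step out of the directory before deleting it.
    cd "${MYSCRATCH:-/}" 2>/dev/null || cd /
    rm -rf "${JOB_TMPDIR:?JOB_TMPDIR is not set}"
}

# Create the working directory and register the cleanup to run when
# the batch shell exits, whether the job succeeded or failed.
setup_workdir() {
    JOB_TMPDIR="/tmp/${USER:-$(id -un)}/${SLURM_JOB_ID:-manual_$$}"
    mkdir -p "$JOB_TMPDIR"
    trap cleanup_workdir EXIT
}
```

In a job script, calling `setup_workdir` followed by `cd "$JOB_TMPDIR"` would replace the `mkdir` and `cd` steps. The trap does not run if the shell is killed outright, for example with SIGKILL after a time limit, so it is still worth checking /tmp after a failed job.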
...
In the course of its operation, Nextflow creates links to files that are carried through the process steps. Depending on the number of files, this can slow down the workflow or even take you over the default file limit on '/scratch' if they are all written there. For very small files such as symbolic links, you could use '/tmp' to avoid that issue, bearing in mind that '/tmp' is very limited in space and needs to be cleaned up after use so as not to affect other users. The location where Nextflow creates all links and temporary files is under the 'work' directory and changing the location in a fe