
Conda creates 

This page discusses how to use SquashFS, overlays, and /tmp to manage the many small files generated by some workflows and tools, including Conda.

...

A good use case for this approach is a Nextflow workflow. Nextflow is a workflow engine for creating scalable, portable, and reproducible workflows, and it is not uncommon to use it to manage hundreds or thousands of processes. For each process, Nextflow creates a working directory along with log files and output. Such a large number of small files and directories is poorly suited to Lustre and can cause you to run into quota issues.
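
To get a feel for how many files, directories, and links a workflow has produced, you can count the entries under its working directory. The following is a minimal sketch; the helper name `count_entries` is hypothetical and not part of any tool discussed on this page.

```bash
#!/bin/bash
# count_entries DIR: print the number of files, directories, and symbolic
# links found under DIR (DIR itself is not counted).
# Hypothetical helper, shown for illustration only.
count_entries() {
    local dir="$1"
    # Emit one NUL byte per entry and count the NUL bytes, so that
    # unusual file names (spaces, newlines) do not skew the result.
    find "$dir" -mindepth 1 -print0 | tr -cd '\0' | wc -c
}
```

For example, `count_entries work` reports the number of entries under a directory named `work` in the current location. Comparing that number with your file quota gives a rough idea of whether a workflow is likely to cause problems.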




Warning: Clean up /tmp when finished

Please ensure you remove all your files from /tmp when you are finished. Even if your job fails, make sure you clean up any remaining files afterwards. 
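
One way to make the cleanup less dependent on the job succeeding is a shell `trap` on `EXIT`. The sketch below assumes the same `/tmp/${USER}/${SLURM_JOB_ID}` layout used elsewhere on this page; the `WORKDIR` variable, the `cleanup` function, and the fallback values for running outside Slurm are illustrative assumptions. A trap does not run if the script is killed outright (for example with SIGKILL), so it is still worth checking /tmp manually after a failed job.

```bash
#!/bin/bash
# Assumed layout: one directory per user and job under /tmp.
# The fallbacks only exist so the sketch also runs outside a Slurm job.
WORKDIR="/tmp/${USER:-$(id -un)}/${SLURM_JOB_ID:-manual_$$}"

cleanup() {
    # Step out of the directory first, then remove only this job's files.
    cd / || return
    rm -rf -- "${WORKDIR}"
}

# Run cleanup whenever the script exits, including after a failed command.
trap cleanup EXIT

mkdir -p "${WORKDIR}"
cd "${WORKDIR}" || exit 1

# ... run the job steps here ...
```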



Using mksquashfs

The following example shows how a workflow could use /tmp and mksquashfs to alleviate issues with file quotas and poor performance.



Sbatch script using squashfs
#!/bin/bash
#SBATCH -p <partition>
#SBATCH -A <account>
#SBATCH --nodes=1
#SBATCH --exclusive 
# other resource requests

# load modules as required
module load squashfs/4.6.1

# create a working directory and move to it
# this can be the base working directory for a nextflow job.
mkdir -p /tmp/${USER}/${SLURM_JOB_ID}
cd /tmp/${USER}/${SLURM_JOB_ID}

# run a command with output on /tmp
srun <cmd> <args> 

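# pack the output directory into a single SquashFS image
mksquashfs <input_dir> my_squash.sqsh -keep-as-directory

# and move it to /scratch (or copy it to Acacia)
mv my_squash.sqsh $MYSCRATCH/

# leave /tmp and remove this job's working directory
cd $MYSCRATCH
rm -rf /tmp/${USER}/${SLURM_JOB_ID}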
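
Before relying on the image, it can be useful to confirm that it is readable, and later to unpack it again. The following sketch uses `unsquashfs`, which ships with the same squashfs tools as `mksquashfs`. The function names `verify_image` and `extract_image` are hypothetical helpers introduced only for illustration.

```bash
#!/bin/bash
# verify_image IMAGE: succeed only if unsquashfs can list the image contents.
verify_image() {
    local image="$1"
    # -l lists the files in the image without extracting them
    unsquashfs -l "$image" > /dev/null
}

# extract_image IMAGE DEST: unpack the image into the directory DEST.
extract_image() {
    local image="$1" dest="$2"
    # -d sets the directory the contents are extracted into
    unsquashfs -d "$dest" "$image"
}
```

For example, running `verify_image "$MYSCRATCH/my_squash.sqsh"` before the final `rm -rf` gives some assurance that the image was written correctly before the original files are deleted. Because the image was created with `-keep-as-directory`, the extracted contents appear under a subdirectory named after the input directory.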



...

In the course of its operation, Nextflow creates links to files that are carried through the process steps. Depending on the number of files, this can slow down the workflow or even take you over the default file limit on '/scratch' if they are all written there. For very small files such as symbolic links, you could use '/tmp' to avoid that issue, bearing in mind that '/tmp' is very limited in space and needs to be cleaned up after use so as not to affect other users. The location where Nextflow creates all links and temporary files is under the 'work' directory and changing the location in a fe