Deleting Large Numbers of Files on Lustre Filesystems

To delete a large number of files on /scratch or /software, use munlink, a Lustre-specific command that removes each file without first performing a stat() operation on it.

Deletion of files is a permanent action

Deleting files on a Linux system is a non-recoverable action. Take care before executing any deletion command.

rm may overload the metadata server when deleting a large number of files

Using the standard Linux command rm to delete multiple files on a Lustre filesystem is not recommended. The rm command generates a stat() operation for each file it removes, meaning that all of the attributes of the file (file type, owner, permissions, modification time, etc.) are returned from the metadata server. A large number of stat() operations can place an increased load on the metadata server, resulting in lower performance and instability of the filesystem.

Below is an example of our recommended two-step approach:

0. Set-up step: Open an interactive session on the copy partition or prepare a script to be submitted to the copy partition and make use of the commands described below.

I. First step: use the munlink command to delete all the files and soft links within a directory and its subdirectories (munlink deletes the files and links previously found by the find command):

$ find -P ./processor0 -type f -print0 -o -type l -print0 | xargs -0 munlink

Here is an overview of each step in that command:

  • find 
    This command will search the indicated directory (and subdirectories within). The syntax defines a search for files and soft links.

  • -P
    This option tells find never to follow (dereference) symbolic links, which keeps the search within the indicated directory tree. This guarantees that the find command will not look for files inside the targets of the links.

  • ./processor0
    This argument is the directory on which the search (and deletion) will be performed.

  • -type f -o -type l
    These options indicate that the find command will search for anything that is a file (-type f) or (-o) a soft link (-type l, this is the lowercase letter l). As indicated with the -P option above, the links are not followed, so only the links themselves will be removed, not the objects they point to.

  • -print0
    This option sets the output format of the find command: each name is terminated by a null character instead of a newline. This format copes with unusual file names (for example, names containing spaces or newlines) and ensures that they are read correctly by the following command (xargs), which is connected through the pipe. Note that two -print0 actions are needed, one on each "side" of the "or" operator indicated above.

  • The pipe operator (represented by a single vertical bar: |)
    This operator connects two commands, so that the output of the previous command (find) serves as the input to the following command (xargs).

  • xargs -0
    xargs reads the received list of names and passes them as arguments to whatever command is specified to it (munlink in this case). The -0 flag tells xargs that the names are separated by null characters; if you use -print0 in the find command you must use -0 in the xargs command.

  • munlink
    This command deletes each file and soft link in the list without overloading the metadata server. In this case, the list is the one received by xargs.
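
Before running the real deletion, it may help to rehearse the pipeline described above without removing anything. The following is a minimal sketch, assuming a throwaway directory created with mktemp (the file and directory names inside it are hypothetical); it counts the entries that find selects and then replaces munlink with echo so that xargs only prints the command it would run:

```shell
# Build a small throwaway tree (hypothetical names) to rehearse on.
demo=$(mktemp -d)
mkdir -p "$demo/processor0/sub dir"
touch "$demo/processor0/a.dat" "$demo/processor0/sub dir/b c.dat"
ln -s a.dat "$demo/processor0/link_to_a"

# Count what the deletion command would select, without deleting anything.
# tr keeps only the null separators, so each name is counted exactly once
# even if it contains spaces or newlines.
selected=$(find -P "$demo/processor0" -type f -print0 -o -type l -print0 \
           | tr -dc '\0' | wc -c)
echo "entries selected: $selected"

# Dry run: echo stands in for munlink, so nothing is removed.
find -P "$demo/processor0" -type f -print0 -o -type l -print0 \
  | xargs -0 echo munlink
```

Once the printed list looks right for your own directory, the same find expression can be piped to xargs -0 munlink as shown in the first step.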


II. Second step: remove the empty directories and subdirectories in the tree. Once all of the files and soft links are deleted, you can remove the empty directories with a similar command:

$ find -P ./processor0 -type d -empty -delete

Again, the find command will search the directory processor0 and all its subdirectories for any empty directories (-type d -empty) and delete them with the -delete action. The -delete action implies the -depth option, so -depth does not need to be given explicitly. The -depth option instructs find to process each directory's contents before the directory itself, so the most distant branches of the directory tree are processed first.
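
The depth-first behaviour can be observed on a small test tree. This is a sketch only, assuming a throwaway directory created with mktemp and hypothetical subdirectory names; -print is used first as a harmless preview, and -depth is given explicitly there to mimic the order that -delete uses:

```shell
# Throwaway tree of nested empty directories (hypothetical names).
demo=$(mktemp -d)
mkdir -p "$demo/processor0/a/b" "$demo/processor0/c"

# Preview: list the directories that are empty right now, deepest first.
find -P "$demo/processor0" -depth -type d -empty -print

# Real removal: children are deleted before their parents are tested,
# so a parent that becomes empty along the way is removed as well.
find -P "$demo/processor0" -type d -empty -delete

if [ -e "$demo/processor0" ]; then remaining=yes; else remaining=no; fi
echo "processor0 still exists: $remaining"
```

Note that the preview lists only the directories that are already empty, whereas -delete can also remove parent directories (including the starting directory) once their contents are gone.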


Use the given order of the options

The tests and actions passed to find are evaluated as an expression, from left to right, so if you place -delete before -type d -empty, find will attempt to delete EVERYTHING below your starting directory, not only the empty directories, because -delete is applied before the remaining tests are evaluated.
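
The effect of the ordering can be checked safely by using -print as a stand-in for -delete. The sketch below assumes a throwaway directory created with mktemp and hypothetical file names; it only counts the entries each ordering would act on and deletes nothing:

```shell
# Throwaway tree: one empty directory, one directory holding a file.
demo=$(mktemp -d)
mkdir -p "$demo/processor0/empty_dir" "$demo/processor0/full_dir"
touch "$demo/processor0/full_dir/keep.dat"

# Correct order: tests first, action last, so only empty directories match.
correct=$(find -P "$demo/processor0" -depth -type d -empty -print | wc -l)

# Wrong order: the action comes first and is applied to every entry,
# before the later tests have any chance to filter.
wrong=$(find -P "$demo/processor0" -depth -print -type d -empty | wc -l)

echo "correct order acts on $correct entries, wrong order on $wrong"
```

With -delete in place of -print, the second form would try to remove every entry it visits, which is why the tests must come before the action.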

Always use munlink to delete simulation output

Always use munlink to delete directories with a large number of files. Doing so will not only help keep the filesystem stable, but will also run faster than the standard rm -rf command.